Published in Network World, September 11, 2006
Version 2.01: 2006091101. Copyright 2005-2006 by Network Test Inc. and Opus One. Vendors may comment on this document and any other aspect of test methodology. Network Test and Opus One reserve the right to change test parameters at any time.
This document describes benchmarking procedures for intrusion prevention systems (IPSs). Test results are scheduled for publication in Network World.
These tests assess IPSs along three dimensions:
Performance describes an IPS’s ability to inspect, analyze, alert, and filter traffic at a given rate and/or with a given response time. High-end IPSs currently claim to inspect traffic at multiple gigabits per second; accordingly, a key goal of this test is to determine maximum forwarding rates for benign traffic while concurrently blocking malicious traffic. We also plan to measure latency and/or response times.
IPS “correctness” describes the ability to differentiate between benign and malicious traffic.
Correctness testing will assess the ability of an IPS to block a large number of attacks, some well-known and others less so, offered in the midst of a high-speed data stream. We describe attacks in broad terms in the “Attack Types” section below, but we do not enumerate the specific attacks we plan to use.
IPS "completeness" assesses how completely the IPS fulfills its job. We will test this in two ways. Objectively, we will evaluate whether the IPS blocks the attacks that we send it. Subjectively, we will assess the quality of information
given a network ad ministrator about attempted attacks. An IPS not only should block recognized attacks, but also provide network administrators with useful information about the attack.
Completeness tests, conducted in conjunction with correctness tests, assess:
--the ability of the IPS to continue alerting while under attack
--the quality of attack descriptions for a given attack (such as the type of exploit or flood, its origin(s), and its target node(s))
--the quality of other information provided about the attack, such as recommended workarounds, patches, or preventive actions
Signature-based vs. anomaly detection IPSs
This methodology takes a black-box approach to IPS design, relying only on externally observable phenomena such as measurements of traffic rates and malicious traffic traversal. As such, we do not distinguish between IPSs based on signature-based or anomaly detection designs. For anomaly detection systems, we will stage equipment for a reasonable period by offering “normal” traffic (which may include some percentage of malicious traffic) prior to beginning our tests.
These tests are not appropriate for IPS devices that perform only rate-based intrusion prevention. However, some of our tests will evaluate an IPS’s ability to block volume-based attacks, such as SYN floods.
Vendor on-site participation
We generally discourage on-site visits from vendors’ engineers during these tests. A visit for initial device configuration is acceptable, but only if it is standard practice for all customers. As noted, we are not disclosing the majority of attacks to be used, especially for the completeness and correctness tests. Vendors are not permitted on-site while we conduct the completeness and correctness tests.
This document is organized as follows. This section introduces the tests to be conducted. Section 2 describes the test bed. Section 3 describes the tests to be performed. Section 4 provides a change log.
Vendors should submit devices that have at least three gigabit Ethernet interfaces (two for monitoring traffic and one for management). Performance numbers will be reported with pricing, so there is an advantage to submitting a product suited to this configuration.
We also can test products with more interfaces. Vendors who wish to submit devices with more interfaces should contact us to discuss this.
Devices must have the ability to transparently bridge (not route) traffic between monitoring interfaces.
There are more details about device configuration later in this section.
Also, as described separately in the test invitation, vendors MUST supply the official name of the product, the version tested, and the price as tested; for any additional hardware or software options, vendors MUST also supply the nomenclature and pricing of those options. Vendors MUST supply this information before testing begins.
The primary attack generation tool for this project is the ThreatEx appliance from Imperfect Networks. We also use the Avalanche and Reflector traffic generators from Spirent Communications to emulate HTTP and HTTPS clients and servers.
We may also augment these tools with a collection of homegrown attack and vulnerability assessment packages.
More information about ThreatEx is available here:
More information about Avalanche and Reflector is available here:
The test bed models the complex mix of device types found in enterprise networks. While this test focuses on network-based intrusion prevention systems, IPS vendors should configure their devices to protect a mixture of network and host operating systems, including those from Microsoft, Red Hat, Sun Microsystems, Apple Computer, FreeBSD, Cisco Systems, Juniper Networks, and Check Point Software Technologies.
The logical test bed consists of a single untagged VLAN with the device under test (DUT) operating in bridging mode. (Note: IPSs that cannot operate in bridge mode will not be tested.) A table at the end of this section lists the IP subnets on the monitored network.
On one side of the DUT are clients emulated by the ThreatEx and Avalanche appliances, while on the other side the Reflector and ThreatEx appliances emulate servers. A single Ethernet switch with copper gigabit Ethernet ports ties together all devices, including the DUT. The switch is capable of switching fully meshed traffic patterns at line rate.
The Avalanche/2500 is capable of simulating millions of concurrent web clients. However, not all DUTs will be able to handle the same amount of TCP connection concurrency. Thus, the volume of simulated traffic should be scaled to the maximum number of concurrent transactions a given IPS can analyze without dropping frames or missing a security breach.
While the Avalanche/2500 streams traffic to the simulated web servers on the Reflector/2500, the ThreatEx/2500 will inject malicious traffic into the network in a stepped fashion in order to gauge the effects of various traffic ratios (malicious:benign). The goal is to identify how effective the DUT is at identifying attacks in the presence of legitimate traffic.
Given that the DUT operates in bridging mode on this network, no IP address assignment for IPS monitoring interfaces should be necessary; a separate management interface should be provided, with addressing instructions given in the next section. Vendors should not enable spanning tree on the monitoring ports, as both should be active.
The following table lists the IP subnets allocated on the monitored networks. The section on “IP access control” below covers the same information and gives access control rules for these subnets. Note that we do not enumerate specific host addresses; vendors may safely assume we will use any and all addresses available within each netblock.
172.20.0.0/24     POP3, IMAP, IMAPS, SMTP servers
172.20.1.0/24     HTTP/HTTPS IIS servers
172.20.2.0/24     HTTP/HTTPS Apache servers
172.20.3.0/24     Windows Media streaming servers
172.20.4.0/25     FTP servers
172.20.4.128/26   SIP gateways
172.20.4.192/26   DNS servers
Vendors should configure one interface for IPS management using these parameters:
IP address: 172.16.1.X
(send email to email@example.com for the value of X)
Subnet mask: 255.255.255.0
Default gateway: 172.16.1.254
Please disable support for IP routing, IP multicast, spanning tree, and any other extraneous traffic on the management interface.
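As a sanity check, the management addressing above can be validated with Python's standard ipaddress module. Note that the host portion X is assigned per vendor; the value 10 below is only an illustrative placeholder, not an assigned address.

```python
import ipaddress

# Placeholder management address: X is assigned per vendor; 10 is illustrative.
mgmt_net = ipaddress.ip_network("172.16.1.0/24")
mgmt_addr = ipaddress.ip_interface("172.16.1.10/24")   # hypothetical X = 10
gateway = ipaddress.ip_address("172.16.1.254")

# Both the management address and the default gateway must sit in the same /24.
assert mgmt_addr.ip in mgmt_net and gateway in mgmt_net
print(mgmt_addr.network)   # 172.16.1.0/24
```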
While no IP addresses should be necessary on the (bridged) monitoring interfaces of the IPS, the device nonetheless needs to be told which networks and hosts it monitors. This section provides IP address access control information.
The “outside” interface of the IPS should be configured as though installed directly inside a firewall facing the public Internet. The “inside” interface should be configured to protect the internal network as defined by the firewall rulebase. The firewall rulebase is:
Permit any-outside 172.20.0.0/24 POP3, IMAP, IMAPS, SMTP servers (TCP ports 25, 110, 143, 993)
Permit any-outside 172.20.1.0/24 HTTP/HTTPS IIS servers (TCP ports 80, 443)
Permit any-outside 172.20.2.0/24 HTTP/HTTPS Apache servers (TCP ports 80, 443)
Permit any-outside 172.20.3.0/24 Windows Media streaming servers (UDP port 1755)
Permit any-outside 172.20.4.0/25 FTP servers (TCP ports 20, 21)
Permit any-outside 172.20.4.128/26 SIP gateways (UDP port 5060)
Permit any-outside 172.20.4.192/26 DNS servers (TCP and UDP port 53)
Permit any-inside any-outside ANY service
Deny any any any
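For clarity, the rulebase above can be expressed as ordered, first-match-wins rules. The following is a minimal Python sketch, not any vendor's configuration syntax; the function and rule names are ours, and only the subnets, protocols, and port numbers come from the rulebase.

```python
import ipaddress

# Ordered first-match-wins permit rules from the rulebase above.
RULES = [
    ("outside", "172.20.0.0/24",   "tcp", {25, 110, 143, 993}),  # mail servers
    ("outside", "172.20.1.0/24",   "tcp", {80, 443}),            # IIS web
    ("outside", "172.20.2.0/24",   "tcp", {80, 443}),            # Apache web
    ("outside", "172.20.3.0/24",   "udp", {1755}),               # Windows Media
    ("outside", "172.20.4.0/25",   "tcp", {20, 21}),             # FTP
    ("outside", "172.20.4.128/26", "udp", {5060}),               # SIP
    ("outside", "172.20.4.192/26", "tcp", {53}),                 # DNS over TCP
    ("outside", "172.20.4.192/26", "udp", {53}),                 # DNS over UDP
    ("inside",  "0.0.0.0/0",       "any", None),                 # inside -> any
]

def action(src_side, dst_ip, proto, dport):
    """Return 'permit' or 'deny'; the trailing Deny any any any is the fallback."""
    for side, net, p, ports in RULES:
        if side != src_side:
            continue
        if ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(net):
            continue
        if p != "any" and p != proto:
            continue
        if ports is not None and dport not in ports:
            continue
        return "permit"
    return "deny"

print(action("outside", "172.20.1.5", "tcp", 443))   # permit
print(action("outside", "172.20.0.9", "tcp", 22))    # deny
print(action("inside", "192.0.2.1", "udp", 9999))    # permit
```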
This section describes in general terms the type of monitoring (be it signature-based or anomaly-based) that an IPS should perform.
Vendors should assume that servers are unpatched and thus vulnerable to well-known host OS exploits. Further, IPSs SHOULD block packets that would result in any of the following:
- system denial of service
- system compromise (root or other access)
- elevated access (i.e., directory traversal that gives access to unintended files)
IPSs SHOULD NOT block spurious or low-risk attack attempts. Examples include reconnaissance traffic (port scans, traceroute, ping) and attack attempts with no known security history.
We will launch just three exploits against each DUT for performance testing:
- SQL Slammer (CVE-2002-0649)
- Witty worm (CVE-2004-0362)
- Cisco malformed SNMP (CVE-2004-0714)
We may also offer other well-known stateful attacks such as Code Red v2, but these are merely to populate layer-2 address tables of our switches, and not part of formal testing.
Note that we will NOT change DUT configuration between the performance and completeness/correctness portions of this test. Thus, DUTs must be configured to block both the exploits listed above, and those in the next section.
On the theory that “attackers don’t make appointments,” we do not list most attacks we will launch against IPSs for the completeness/correctness portion of tests. Some will be well-known exploits, such as those described in the Mitre CVE database. Others will be variations on these and may possibly include zero-day exploits.
Network Test has divided attacks into two categories: the “80 percent” and “20 percent” groups, described below.
Approximately 80 percent of the attacks we launch are well known. We would expect all IPSs to identify and stop these attacks. Attacks in this group include:
1. Attacks that exploit vulnerabilities in a well-known service such as HTTP, SMTP, IMAP, or POP3.
2. Viruses, worms, or DoS attacks created at least two weeks prior to the timestamp of the IPS’s attack signature database.
3. Attacks that result in root compromise of hosts or network elements protected by the IPS.
Approximately 20 percent of our attacks are not widely known, and thus might not be detected or blocked by the IPS. These include:
1. Attacks that compromise, circumvent, or interrupt the service of the IPS.
2. Attacks that traverse to a “protected” network by obfuscating packet contents (for example, by using fragmentation or variations in exploit payload)
3. Attacks that exploit vulnerabilities in security applications, such as sniffers and traffic analyzers
4. Attacks that exploit vulnerabilities in outdated or incorrectly implemented protocol stacks (such as malformed IP packets, spoofed ARP requests or responses, or PUSH/ACK floods)
5. Any monitoring that attempts to go undetected.
To determine the maximum forwarding rate (RFC 2285, 2889) at which an IPS can inspect, analyze, alert, and filter stateful traffic
To determine application response time while forwarding stateful traffic at the maximum forwarding rate
To determine throughput (RFC 1242, 2544) at which an IPS can inspect, analyze, alert, and filter stateless traffic
To determine the latency while forwarding stateless traffic at the throughput rate
The DUT must be configured to bridge traffic between interfaces. The DUT must be configured to perform real-time alerts. The DUT should be configured to disable all extraneous management traffic, such as spanning tree messages, OSPF HELLO messages, IGMP messages, or any other traffic that might degrade the DUT’s forwarding rate. The DUT may NOT be configured to optimize performance at the expense of “best practices” logging and auditing, for example by disabling logging.
For stateful tests, Avalanche 2500 appliances are configured to request HTTP, FTP, SMTP, POP3, and DNS traffic. Some 1500 simulated clients on Avalanche request 11-kbyte objects over HTTP, 5-Mbyte objects over FTP, and 50-kbyte objects over POP3 and SMTP from 16 virtual IIS servers running on Reflector 2500. Two pairs of Avalanche and Reflector appliances can achieve sustained goodput of 3.8 Gbit/s. This is nearly double the capacity of IPS devices with two monitoring interfaces, as described in Section 2.
For stateless tests, a Spirent SmartBits running TRT Interactive will offer UDP traffic in a “port-pair” topology between each pair of DUT interfaces. Frames are 64, 512, and 1518 bytes long, offered in separate tests. The traffic uses 254 source IP addresses, all targeting a single destination host. The UDP traffic uses source port 1025 and destination port 1026 in all cases. The payload of each packet is random. The SmartBits runs at line rate with measured latency of 100 nanoseconds or less.
As noted, we also can test devices with more monitoring interfaces; PLEASE ADVISE ASAP IF YOUR DEVICE ACHIEVES GOODPUT ABOVE 3.8 GBIT/S.
1. For stateful traffic, Avalanche (emulated clients) and Reflector (emulated servers) appliances begin with a baseline consisting solely of benign traffic: a mixture of HTTP, FTP, POP3, SMTP, and DNS. We measure goodput during a steady-state period of at least 60 seconds. This test is run on a single port-pair.
2. Once the baseline test has been run, we calculate the DUT’s maximum forwarding rate by adding together the average incoming and outgoing packet-per-second rates during the steady-state phase of testing.
3. We repeat the same test as in the first step, but this time offer attack traffic (as described in section 2.7) at 1 percent of the aggregate pps rate described in step 2. We clear all attack alerts on the DUT between iterations, if the DUT allows this.
4. We repeat the test twice more with attack traffic offered at 4 percent and 16 percent of the aggregate pps rate described in step 2.
5. We record alerts and timestamps for all attacks seen by the DUT. We also determine whether the DUT forwarded attack(s), and if so which ones.
6. We repeat steps 1-5 for DUTs supporting more than one port pair (for example, two-pair and four-pair testing).
7. For stateless testing, we configure TRT on SmartBits to determine the throughput level for 64-, 512-, and 1,518-byte frames. TRT also records average latency at the throughput level. Test duration is 300 seconds. We conduct this test on a single port pair.
8. For 512-byte frames, we calculate the DUT’s aggregate forwarding rate by adding together the incoming and outgoing packet-per-second rates at the throughput level.
9. We offer 512-byte UDP frames from SmartBits at the throughput rate minus 1 percent, as determined in the previous two steps. In place of the “missing” 1 percent, we offer attack traffic from ThreatEx at 1 percent of the aggregate forwarding rate described in step 8.
10. We repeat the previous step with benign traffic rates reduced by 4 percent and 16 percent, respectively, and attack traffic offered at those rates.
11. For each iteration, we record aggregate forwarding rate, latency, and whether the DUT forwarded any attack traffic.
12. We repeat steps 7-11 for DUTs supporting more than one port pair (for example, two-pair and four-pair testing).
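The rate arithmetic used in the stateful and stateless procedures above can be sketched as follows. All packet-per-second figures here are illustrative placeholders, not measured results.

```python
def max_forwarding_rate(avg_in_pps, avg_out_pps):
    """Step 2: aggregate maximum forwarding rate = incoming + outgoing
    packet-per-second rates averaged over the steady-state phase."""
    return avg_in_pps + avg_out_pps

def stateful_attack_rates(max_fwd_pps, fractions=(0.01, 0.04, 0.16)):
    """Steps 3-4: attack traffic offered at 1, 4, and 16 percent of the
    aggregate rate from step 2."""
    return [round(max_fwd_pps * f) for f in fractions]

def stateless_offered_load(throughput_pps, attack_pct):
    """Stateless procedure: benign load is reduced by attack_pct, and attack
    traffic replaces the missing share, so the aggregate offered rate stays
    at the measured throughput level."""
    attack = throughput_pps * attack_pct / 100.0
    return throughput_pps - attack, attack

# Hypothetical numbers for illustration only:
peak = max_forwarding_rate(450_000, 440_000)   # 890000 pps aggregate
print(stateful_attack_rates(peak))             # [8900, 35600, 142400]
print(stateless_offered_load(200_000, 4))      # (192000.0, 8000.0)
```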
1. Maximum goodput rate
2. Page response time
3. Throughput (64-, 512-, 1518-byte frames) of benign UDP traffic
4. Latency (64-, 512-, 1518-byte frames) of benign UDP traffic
5. Forwarding rate (512-byte frames) of UDP traffic + attack traffic at 1, 4, 16 percent of throughput rate
6. Latency (512-byte frames) of UDP traffic + attack traffic at 1, 4, 16 percent of throughput rate
Note: In addition to correlating DUT alerts with the attack quantities configured on the ThreatEx appliance, we also analyze packet captures to determine whether malicious traffic was not only recognized but also prevented from traversing the DUT.
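The correlation described in this note reduces to simple set arithmetic. In the sketch below, the attack identifiers are hypothetical; in practice, the three input sets would be derived from ThreatEx logs, DUT alerts, and the packet capture taken on the protected side of the DUT.

```python
def correlate(sent, alerted, forwarded):
    """sent: attack IDs offered by ThreatEx; alerted: IDs the DUT reported;
    forwarded: IDs observed in the capture behind the DUT."""
    return {
        "blocked_and_alerted":   (sent & alerted) - forwarded,
        "blocked_silently":      sent - alerted - forwarded,
        "alerted_but_forwarded": sent & alerted & forwarded,
        "missed_entirely":       (sent & forwarded) - alerted,
    }

# Hypothetical attack IDs:
result = correlate(sent={"a1", "a2", "a3", "a4"},
                   alerted={"a1", "a3"},
                   forwarded={"a3", "a4"})
print(result["blocked_and_alerted"])   # {'a1'}
```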
Correctness: To assess an IPS’s ability to identify and block attacks using various proportions of benign and malicious traffic. Not every IPS is expected to block every attack. However, for any attack that the IPS does block, we expect it to block every instance of that attack that it sees. This test will validate whether the IPS can be relied on to block all supported attacks at all traffic levels.
Completeness: To assess the ability of the IPS to block attacks from a wide-ranging database of signatures. To assess the quality of reporting provided by the IPS while under attack. To assess an IPS’s ability to provide timely and useful reporting on the malicious traffic it sees.
DUT configuration is the same as the “Performance” test and may not be changed for this test.
The “background traffic” generated by the Avalanche/Reflector appliances is also the same as the “Performance” test.
HTTP traffic is sustained for at least 60 seconds before the ThreatEx appliances stream attacks, with the attack share of maximum goodput (as determined from the performance test) set by the ratio under test. We repeat these correctness tests with several benign:attack traffic ratios -- 80:20, 70:30, 60:40, and 50:50.
1. Avalanche and Reflector offer HTTP traffic at 80 percent of the maximum goodput rate (as determined in the performance test) for a steady-state period of at least 60 seconds.
2. Once simulated HTTP traffic reaches 80 percent of the maximum goodput rate, the ThreatEx appliances offer a mix of various attacks at 20 percent of the maximum goodput rate for at least 60 seconds.
3. All attacks belong to the “80 percent” group, as described in Section 1 of this document.
4. We record alerts and timestamps for all attacks seen by the DUT. We also determine whether the DUT halts transmission of attack(s), and if so which ones.
5. Avalanche and Reflector continue to offer HTTP traffic at 80 percent of the maximum goodput rate for at least 28 seconds after ThreatEx stops transmitting attack traffic.
6. We repeat all preceding steps with different ratios of benign and attack traffic – 70:30, 60:40, and 50:50, respectively.
7. We repeat all preceding steps using attacks from the “20 percent” group.
1. Correctness: For each traffic ratio, for each attack in the “80 percent” and “20 percent” group that is blocked at least once at any traffic level, the percent of the time that the attack is blocked and whether this varies based on traffic ratio. An attack is defined as “blocked” when the IPS deletes the datagram that would cause the attack to succeed from the data stream passed through the DUT.
2. Correctness: For each traffic ratio, variance in latency and packet loss of non-attack traffic compared with the baseline performance test.
3. Completeness: For the 80:20 traffic ratio, for each attack in the “80 percent” and “20 percent” groups, whether the IPS blocks the attack at least once.
4. For each traffic ratio, when an attack is actually blocked, whether the IPS provides an alert (or aggregated alert) indicating that an attack was blocked.
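The first correctness metric can be computed as sketched below; the ratios and observations are illustrative only. As the metric specifies, an attack that is never blocked at any traffic level falls outside this calculation.

```python
from collections import defaultdict

def block_rates(observations):
    """observations: (ratio, blocked) pairs for one attack across all runs.
    Returns per-ratio block percentages, or None if the attack was never
    blocked at any traffic level (such attacks are excluded from the metric)."""
    if not any(blocked for _, blocked in observations):
        return None
    offered = defaultdict(int)
    stopped = defaultdict(int)
    for ratio, blocked in observations:
        offered[ratio] += 1
        stopped[ratio] += int(blocked)
    return {r: 100.0 * stopped[r] / offered[r] for r in offered}

# Illustrative data: one attack blocked reliably at 80:20 but not at 50:50.
print(block_rates([("80:20", True), ("80:20", True),
                   ("50:50", True), ("50:50", False)]))
# {'80:20': 100.0, '50:50': 50.0}
```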
1. For the alerts provided by the IPS, how well the information available in the management system would enable a trained security staff member to perform forensics analysis on the blocked attacks.
2. Ease of system configuration, such as how long it takes to implement a given security policy (see the firewall rulebase and specific signatures given in Section 2)
11 September 2006
Changed title to indicate publication in Network World
30 May 2006
Section 2.7: Added description of exploits used in performance tests
Section 3.1: Added description of stateful and stateless test procedure
14 February 2006
Section 2.4.1: Listed monitored netblocks
Section 2.4.2: Changed default gateway from 172.16.1.1 to 172.16.1.254
2 February 2006
Initial public release
30 January 2006
25 January 2006
6 December 2005
7 November 2005
Initial internal release
 RFC 2647 defines goodput as the number of bits per unit of time forwarded to the correct destination interface of the DUT, minus any bits lost or retransmitted. In this context, goodput is a layer-7 measurement of HTTP bytes received, as distinguished from “throughput,” which is defined in RFC 1242 as a layer-2 measurement.
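Under that definition, goodput can be computed as in this sketch; the byte counts below are illustrative placeholders, not measured results.

```python
def goodput_bps(bytes_received, bytes_retransmitted, seconds):
    """RFC 2647-style goodput: layer-7 bytes successfully received, minus
    retransmitted bytes, expressed in bits per second."""
    return (bytes_received - bytes_retransmitted) * 8 / seconds

# Illustrative: 28.5 Gbytes of HTTP payload over a 60-second steady state
# with no retransmissions corresponds to 3.8 Gbit/s of goodput.
print(goodput_bps(28_500_000_000, 0, 60))   # 3800000000.0
```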