Published in Network World, September 11, 2006
Version 2.01: 2006091101. Copyright 2005-2006 by Network Test Inc. and Opus One. Vendors may comment on this document and any other aspect of test methodology. Network Test and Opus One reserve the right to change test parameters at any time.
This document describes benchmarking procedures for intrusion prevention systems (IPSs). Test results are scheduled for publication in Network World.
These tests assess IPSs along three dimensions:
Performance describes an IPS’s ability to inspect, analyze, alert, and filter traffic at a given rate and/or with a given response time. High-end IPSs currently claim to inspect traffic at multiple gigabits per second; accordingly, a key goal of this test is to determine maximum forwarding rates for benign traffic while concurrently blocking malicious traffic. We also plan to measure latency and/or response times.
IPS “correctness” describes the ability to differentiate between benign and malicious traffic.
Correctness testing will assess the ability of an IPS to block a large number of attacks, some well-known and others less so, offered in the midst of a high-speed data stream. We describe attacks in broad terms in the “Attack Types” section below, but we do not enumerate the specific attacks we plan to use.
IPS "completeness" assesses how completely the IPS fulfills its job. We will test this in two ways. Objectively, we will evaluate whether the IPS blocks the attacks that we send it. Subjectively, we will assess the quality of information
given a network ad ministrator about attempted attacks. An IPS not only should block recognized attacks, but also provide network administrators with useful information about the attack.
Completeness tests, conducted in conjunction with correctness tests, assess:
--the ability of the IPS to continue alerting while under attack
--the quality of attack descriptions for a given attack (such as the type of exploit or flood, its origin(s), and its target node(s))
--the quality of other information provided about the attack, such as recommended workarounds, patches, or preventive actions
Signature-based vs. anomaly detection IPSs
This methodology takes a black-box approach to IPS design, relying only on externally observable phenomena such as measurements of traffic rates and malicious traffic traversal. As such, we do not distinguish between IPSs based on signature-based or anomaly detection designs. For anomaly detection systems, we will stage equipment for a reasonable period by offering “normal” traffic (which may include some percentage of malicious traffic) prior to beginning our tests.
These tests are not appropriate for IPS devices that perform only rate-based intrusion prevention. However, some of our tests will evaluate an IPS’s ability to block volume-based attacks, such as SYN floods.
Vendor on-site participation
We generally discourage on-site visits from vendors’ engineers during these tests. A visit for initial device configuration is acceptable, but only if it is standard practice for all customers. As noted, we are not disclosing the majority of attacks to be used, especially for the completeness and correctness tests. Vendors are not permitted on-site while we conduct the completeness and correctness tests.
This document is organized as follows. This section introduces the tests to be conducted. Section 2 describes the test bed. Section 3 describes the tests to be performed. Section 4 provides a change log.
Vendors should submit devices that have at least three gigabit Ethernet interfaces (two for monitoring traffic and one for management). Performance numbers will be reported with pricing, so there is an advantage to submitting a product suited to this configuration.
We also can test products with more interfaces. Vendors who wish to submit devices with more interfaces should contact us to discuss this.
Devices must have the ability to transparently bridge (not route) traffic between monitoring interfaces.
There are more details about device configuration later in this section.
Also, as described separately in the test invitation, vendors MUST supply the official name of the product, the version tested, and the price as tested; for any additional hardware or software options, vendors MUST also supply the nomenclature and pricing of those options. Vendors MUST supply this information before testing begins.
The primary attack generation tool for this project is the ThreatEx appliance from Imperfect Networks. We also use the Avalanche and Reflector traffic generators from Spirent Communications to emulate HTTP and HTTPS clients and servers.
We may also augment these tools with a collection of homegrown attack and vulnerability assessment packages.
More information about ThreatEx is available here:
More information about Avalanche and Reflector is available here:
The test bed models the complex mix of device types found in enterprise networks. While this test focuses on network-based intrusion prevention systems, IPS vendors should configure their devices to protect a mixture of network and host operating systems, including those from Microsoft, Red Hat, Sun Microsystems, Apple Computer, FreeBSD, Cisco Systems, Juniper Networks, and Check Point Software Technologies.
The logical test bed consists of a single untagged VLAN with the device under test (DUT) operating in bridging mode. (Note: IPSs that cannot operate in bridge mode will not be tested.) A table at the end of this section lists the IP subnets on the monitored network.
On one side of the DUT are clients emulated by the ThreatEx and Avalanche appliances, while on the other side the Reflector and ThreatEx appliances emulate servers. A single Ethernet switch with copper gigabit Ethernet ports ties together all devices, including the DUT. The switch is capable of switching fully meshed traffic patterns at line rate.
The Avalanche/2500 is capable of simulating millions of concurrent web clients. However, not all DUTs will be able to handle the same amount of TCP connection concurrency. Thus, the volume of simulated traffic should be scaled to the maximum number of concurrent transactions a given IPS can analyze without dropping frames or missing a security breach.
While the Avalanche/2500 streams traffic to the simulated web servers on the Reflector/2500, the ThreatEx/2500 will inject malicious traffic into the network in a stepped fashion in order to gauge the effects of various traffic ratios (malicious:benign). The goal is to identify how effective the DUT is at identifying attacks in the presence of legitimate traffic.
Given that the DUT operates in bridging mode on this network, no IP address assignment for IPS monitoring interfaces should be necessary; a separate management interface should be provided, with addressing instructions given in the next section. Vendors should not enable spanning tree on the monitoring ports, as both should be active.
The following table lists the IP subnets allocated on the monitored networks. The section on “IP access control” below covers the same information and gives access control rules for these subnets. Note that we do not enumerate specific host addresses; vendors may safely assume we will use any and all addresses available within each netblock.
172.20.0.0/24     POP3, IMAP, IMAPS, SMTP servers
172.20.1.0/24     HTTP/HTTPS IIS servers
172.20.2.0/24     HTTP/HTTPS Apache servers
172.20.3.0/24     Windows Media streaming servers
172.20.4.0/25     FTP servers
172.20.4.128/26   SIP gateways
172.20.4.192/26   DNS servers
Vendors should configure one interface for IPS management using these parameters:
IP address: 172.16.1.X
(send email to email@example.com for the value of X)
Subnet mask: 255.255.255.0
Default gateway: 172.16.1.254
Please disable support for IP routing, IP multicast, spanning tree, and any other extraneous traffic on the management interface.
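As a sanity check, the management addressing above can be validated with Python's standard ipaddress module. Note that the host portion X is assigned per vendor; the value 10 below is only an illustrative placeholder, not an assigned address.

```python
import ipaddress

# Placeholder management address: X is assigned per vendor; 10 is illustrative.
mgmt_net = ipaddress.ip_network("172.16.1.0/24")
mgmt_addr = ipaddress.ip_interface("172.16.1.10/24")   # hypothetical X = 10
gateway = ipaddress.ip_address("172.16.1.254")

# Both the management address and the default gateway must sit in the same /24.
assert mgmt_addr.ip in mgmt_net and gateway in mgmt_net
print(mgmt_addr.network)   # 172.16.1.0/24
```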
While no IP addresses should be necessary on the (bridged) monitoring interfaces of the IPS, the device nonetheless needs to be told which networks and hosts it monitors. This section provides IP address access control information.
The “outside” interface of the IPS should be configured as though installed directly inside a firewall facing the public Internet. The “inside” interface should be configured to protect the internal network as defined by the firewall rulebase. The firewall rulebase is:
Permit any-outside 172.20.0.0/24 POP3, IMAP, IMAPS, SMTP servers (TCP ports 25, 110, 143, 993)
Permit any-outside 172.20.1.0/24 HTTP/HTTPS IIS servers (TCP ports 80, 443)
Permit any-outside 172.20.2.0/24 HTTP/HTTPS Apache servers (TCP ports 80, 443)
Permit any-outside 172.20.3.0/24 Windows Media streaming servers (UDP port 1755)
Permit any-outside 172.20.4.0/25 FTP servers (TCP ports 20, 21)
Permit any-outside 172.20.4.128/26 SIP gateways (UDP port 5060)
Permit any-outside 172.20.4.192/26 DNS servers (TCP and UDP port 53)
Permit any-inside any-outside ANY service
Deny any any any
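For clarity, the rulebase above can be expressed as ordered, first-match-wins rules. The following is a minimal Python sketch, not any vendor's configuration syntax; the function and rule names are ours, and only the subnets, protocols, and port numbers come from the rulebase.

```python
import ipaddress

# Ordered first-match-wins permit rules from the rulebase above.
RULES = [
    ("outside", "172.20.0.0/24",   "tcp", {25, 110, 143, 993}),  # mail servers
    ("outside", "172.20.1.0/24",   "tcp", {80, 443}),            # IIS web
    ("outside", "172.20.2.0/24",   "tcp", {80, 443}),            # Apache web
    ("outside", "172.20.3.0/24",   "udp", {1755}),               # Windows Media
    ("outside", "172.20.4.0/25",   "tcp", {20, 21}),             # FTP
    ("outside", "172.20.4.128/26", "udp", {5060}),               # SIP
    ("outside", "172.20.4.192/26", "tcp", {53}),                 # DNS over TCP
    ("outside", "172.20.4.192/26", "udp", {53}),                 # DNS over UDP
    ("inside",  "0.0.0.0/0",       "any", None),                 # inside -> any
]

def action(src_side, dst_ip, proto, dport):
    """Return 'permit' or 'deny'; the trailing Deny any any any is the fallback."""
    for side, net, p, ports in RULES:
        if side != src_side:
            continue
        if ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(net):
            continue
        if p != "any" and p != proto:
            continue
        if ports is not None and dport not in ports:
            continue
        return "permit"
    return "deny"

print(action("outside", "172.20.1.5", "tcp", 443))   # permit
print(action("outside", "172.20.0.9", "tcp", 22))    # deny
print(action("inside", "192.0.2.1", "udp", 9999))    # permit
```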
This section describes in general terms the type of monitoring (be it signature-based or anomaly-based) that an IPS should perform.
Vendors should assume that servers are unpatched and thus vulnerable to well-known host OS exploits. Further, IPSs SHOULD block packets that would result in any of the following:
- system denial of service
- system compromise (root or other access)
- elevated access (i.e., directory traversal that gives access to unintended files)
IPSs SHOULD NOT block spurious or low-risk attack attempts. Examples include reconnaissance traffic (port scans, traceroute, ping) and attack attempts with no known security history.
We will launch just three exploits against each DUT for performance testing:
- SQL Slammer (CVE-2002-0649)
- Witty worm (CVE-2004-0362)
- Cisco malformed SNMP (CVE-2004-0714)
We may also offer other well-known stateful attacks such as Code Red v2, but these are merely to populate layer-2 address tables of our switches, and not part of formal testing.
Note that we will NOT change DUT configuration between the performance and completeness/correctness portions of this test. Thus, DUTs must be configured to block both the exploits listed above, and those in the next section.
On the theory that “attackers don’t make appointments,” we do not list most attacks we will launch against IPSs for the completeness/correctness portion of tests. Some will be well-known exploits, such as those described in the Mitre CVE database. Others will be variations on these and may possibly include zero-day exploits.
Network Test has divided attacks into two categories: the “80 percent” and “20 percent” groups, described below.
Approximately 80 percent of the attacks we launch are well known. We would expect all IPSs to identify and stop these attacks. Attacks in this group include:
1. Attacks that exploit vulnerabilities in a well-known service such as HTTP, SMTP, IMAP, or POP3.
2. Viruses, worms, or DoS attacks created at least two weeks prior to the timestamp of the IPS’s attack signature database.
3. Attacks that result in root compromise of hosts or network elements protected by the IPS.
Approximately 20 percent of our attacks are not widely known, and thus might not be detected or blocked by the IPS. These include:
1. Attacks that compromise, circumvent, or interrupt the service of the IPS.
2. Attacks that traverse to a “protected” network by obfuscating packet contents (for example, by using fragmentation or variations in exploit payload)
3. Attacks that exploit vulnerabilities in security applications, such as sniffers and traffic analyzers
4. Attacks that exploit vulnerabilities in outdated or incorrectly implemented protocol stacks (such as malformed IP packets, spoofed ARP requests or responses, or PUSH/ACK floods)
5. Any monitoring that attempts to go undetected.
To determine the maximum forwarding rate (RFC 2285, 2889) at which an IPS can inspect, analyze, alert, and filter stateful traffic
To determine application response time while forwarding stateful traffic at the maximum forwarding rate
To determine throughput (RFC 1242, 2544) at which an IPS can inspect, analyze, alert, and filter stateless traffic
To determine the latency while forwarding stateless traffic at the throughput rate
The DUT must be configured to bridge traffic between interfaces. The DUT must be configured to perform real-time alerts. The DUT should be configured to disable all extraneous management traffic, such as spanning tree messages, OSPF HELLO messages, IGMP messages, or any other traffic that might degrade the DUT’s forwarding rate. The DUT may NOT be configured to optimize performance at the expense of “best practices” logging and auditing, for example by disabling logging.
For stateful tests, Avalanche 2500 appliances are configured to request HTTP, FTP, SMTP, POP3, and DNS traffic. Some 1500 simulated clients on Avalanche request 11-kbyte objects over HTTP, 5-Mbyte objects over FTP, and 50-kbyte objects over POP3 and SMTP from 16 virtual IIS servers running on Reflector 2500. Two pairs of Avalanche and Reflector appliances can achieve sustained goodput of 3.8 Gbit/s. This is nearly double the capacity of IPS devices with two monitoring interfaces, as described in Section 2.
For stateless tests, a Spirent SmartBits running TRT Interactive will offer UDP traffic in a “port-pair” topology between each pair of DUT interfaces. Frames are 64, 512, and 1518 bytes long, offered in separate tests. The traffic uses 254 source IP addresses, all targeting a single destination host. The UDP traffic uses source port 1025 and destination port 1026 in all cases. The payload of each packet is random. The SmartBits runs at line rate with measured latency of 100 nanoseconds or less.
As noted, we also can test devices with more monitoring interfaces; PLEASE ADVISE ASAP IF YOUR DEVICE ACHIEVES GOODPUT ABOVE 3.8 GBIT/S.
1. For stateful traffic, Avalanche (emulated clients) and Reflector (emulated servers) appliances begin with a baseline consisting solely of benign traffic: a mixture of HTTP, FTP, POP3, SMTP, and DNS. We measure goodput during a steady-state period of at least 60 seconds. This test is run on a single port-pair.
2. Once the baseline test has been run, we calculate the DUT’s maximum forwarding rate by adding together the average incoming and outgoing packet-per-second rates during the steady-state phase of testing.
3. We repeat the same test as in the first step, but this time offer attack traffic (as described in section 2.7) at 1 percent of the aggregate pps rate described in step 2. We clear all attack alerts on the DUT between iterations, if the DUT allows this.
4. We repeat the test twice more with attack traffic offered at 4 percent and 16 percent of the aggregate pps rate described in step 2.
5. We record alerts and timestamps for all attacks seen by the DUT. We also determine whether the DUT forwarded attack(s), and if so which ones.
6. We repeat steps 1-5 for DUTs supporting more than one port pair (for example, two-pair and four-pair testing).
7. For stateless testing, we configure TRT on SmartBits to determine the throughput level for 64-, 512-, and 1,518-byte frames. TRT also records average latency at the throughput level. Test duration is 300 seconds. We conduct this test on a single port pair.
8. For 512-byte frames, we calculate the DUT’s aggregate forwarding rate by adding together the incoming and outgoing packet-per-second rates at the throughput level.
9. We offer 512-byte UDP frames from SmartBits at the throughput rate minus 1 percent, as determined in the previous two steps. In place of the “missing” 1 percent, we offer attack traffic from ThreatEx at 1 percent of the aggregate forwarding rate described in step 8.
10. We repeat the previous step with benign traffic rates reduced by 4 percent and 16 percent, respectively, and attack traffic offered at those rates.
11. For each iteration, we record aggregate forwarding rate, latency, and whether the DUT forwarded any attack traffic.
12. We repeat steps 7-11 for DUTs supporting more than one port pair (for example, two-pair and four-pair testing).
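The rate arithmetic used in the stateful and stateless procedures above can be sketched as follows. All packet-per-second figures here are illustrative placeholders, not measured results.

```python
def max_forwarding_rate(avg_in_pps, avg_out_pps):
    """Step 2: aggregate maximum forwarding rate = incoming + outgoing
    packet-per-second rates averaged over the steady-state phase."""
    return avg_in_pps + avg_out_pps

def stateful_attack_rates(max_fwd_pps, fractions=(0.01, 0.04, 0.16)):
    """Steps 3-4: attack traffic offered at 1, 4, and 16 percent of the
    aggregate rate from step 2."""
    return [round(max_fwd_pps * f) for f in fractions]

def stateless_offered_load(throughput_pps, attack_pct):
    """Stateless procedure: benign load is reduced by attack_pct, and attack
    traffic replaces the missing share, so the aggregate offered rate stays
    at the measured throughput level."""
    attack = throughput_pps * attack_pct / 100.0
    return throughput_pps - attack, attack

# Hypothetical numbers for illustration only:
peak = max_forwarding_rate(450_000, 440_000)   # 890000 pps aggregate
print(stateful_attack_rates(peak))             # [8900, 35600, 142400]
print(stateless_offered_load(200_000, 4))      # (192000.0, 8000.0)
```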
1. Maximum goodput rate
2. Page response time
3. Throughput (64-, 512-, 1518-byte frames) of benign UDP traffic
4. Latency (64-, 512-, 1518-byte frames) of benign UDP traffic
5. Forwarding rate (512-byte frames) of UDP traffic + attack traffic at 1, 4, 16 percent of throughput rate
6. Latency (512-byte frames) of UDP traffic + attack traffic at 1, 4, 16 percent of throughput rate
Note: In addition to correlating DUT alerts with the attack quantities configured on the ThreatEx appliance, we also analyze packet captures to determine whether malicious traffic was not only recognized but also prevented from traversing the DUT.
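The correlation described in this note reduces to simple set arithmetic. In the sketch below, the attack identifiers are hypothetical; in practice, the three input sets would be derived from ThreatEx logs, DUT alerts, and the packet capture taken on the protected side of the DUT.

```python
def correlate(sent, alerted, forwarded):
    """sent: attack IDs offered by ThreatEx; alerted: IDs the DUT reported;
    forwarded: IDs observed in the capture behind the DUT."""
    return {
        "blocked_and_alerted":   (sent & alerted) - forwarded,
        "blocked_silently":      sent - alerted - forwarded,
        "alerted_but_forwarded": sent & alerted & forwarded,
        "missed_entirely":       (sent & forwarded) - alerted,
    }

# Hypothetical attack IDs:
result = correlate(sent={"a1", "a2", "a3", "a4"},
                   alerted={"a1", "a3"},
                   forwarded={"a3", "a4"})
print(result["blocked_and_alerted"])   # {'a1'}
```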
Correctness: To assess an IPS’s ability to identify and block attacks using various proportions of benign and malicious traffic. Not every IPS is expected to block every attack. However, for any attack that the IPS does block, we expect it to block every instance of that attack that it sees. This test will validate whether the IPS can be relied on to block all supported attacks at all traffic levels.
Completeness: To assess the ability of the IPS to block attacks from a wide-ranging database of signatures. To assess the quality of reporting provided by the IPS while under attack. To assess an IPS’s ability to provide timely and useful reporting on the malicious traffic it sees.
DUT configuration is the same as the “Performance” test and may not be changed for this test.
The “background traffic” generated by the Avalanche/Reflector appliances is also the same as the “Performance” test.
HTTP traffic is sustained for at least 60 seconds before the ThreatEx appliances stream attacks, with the attack share of maximum goodput (as determined from the performance test) set by the ratio under test. We repeat these correctness tests with several benign:attack traffic ratios -- 80:20, 70:30, 60:40, and 50:50.
1. Avalanche and Reflector offer HTTP traffic at 80 percent of the maximum goodput rate (as determined in the performance test) for a steady-state period of at least 60 seconds.
2. Once simulated HTTP traffic reaches 80 percent of the maximum goodput rate, the ThreatEx appliances offer a mix of various attacks at 20 percent of the maximum goodput rate for at least 60 seconds.
3. All attacks belong to the “80 percent” group, as described in Section 1 of this document.
4. We record alerts and timestamps for all attacks seen by the DUT. We also determine whether the DUT halts transmission of attack(s), and if so which ones.
5. Avalanche and Reflector continue to offer HTTP traffic at 80 percent of the maximum goodput rate for at least 28 seconds after ThreatEx stops transmitting attack traffic.
6. We repeat all preceding steps with different ratios of benign and attack traffic – 70:30, 60:40, and 50:50, respectively.
7. We repeat all preceding steps using attacks from the “20 percent” group.
1. Correctness: For each traffic ratio, for each attack in the “80 percent” and “20 percent” group that is blocked at least once at any traffic level, the percent of the time that the attack is blocked and whether this varies based on traffic ratio. An attack is defined as “blocked” when the IPS deletes the datagram that would cause the attack to succeed from the data stream passed through the DUT.
2. Correctness: For each traffic ratio, variance in latency and packet loss of non-attack traffic compared with the baseline performance test.
3. Completeness: For the 80:20 traffic ratio, for each attack in the “80 percent” and “20 percent” groups, whether the IPS blocks the attack at least once.
4. For each traffic ratio, when an attack is actually blocked, whether the IPS provides an alert (or aggregated alert) indicating that an attack was blocked.
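The first correctness metric can be computed as sketched below; the ratios and observations are illustrative only. As the metric specifies, an attack that is never blocked at any traffic level falls outside this calculation.

```python
from collections import defaultdict

def block_rates(observations):
    """observations: (ratio, blocked) pairs for one attack across all runs.
    Returns per-ratio block percentages, or None if the attack was never
    blocked at any traffic level (such attacks are excluded from the metric)."""
    if not any(blocked for _, blocked in observations):
        return None
    offered = defaultdict(int)
    stopped = defaultdict(int)
    for ratio, blocked in observations:
        offered[ratio] += 1
        stopped[ratio] += int(blocked)
    return {r: 100.0 * stopped[r] / offered[r] for r in offered}

# Illustrative data: one attack blocked reliably at 80:20 but not at 50:50.
print(block_rates([("80:20", True), ("80:20", True),
                   ("50:50", True), ("50:50", False)]))
# {'80:20': 100.0, '50:50': 50.0}
```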
1. For the alerts provided by the IPS, how well the information available in the management system would enable a trained security staff member to perform forensics analysis on the blocked attacks.
2. Ease of system configuration, such as how long it takes to implement a given security policy (see the firewall rulebase and specific signatures given in Section 2)
11 September 2006
Changed title to indicate publication in Network World
30 May 2006
Section 2.7: Added description of exploits used in performance tests
Section 3.1: Added description of stateful and stateless test procedure
14 February 2006
Section 2.4.1: Listed monitored netblocks
Section 2.4.2: Changed default gateway from 172.16.1.1 to 172.16.1.254
2 February 2006
Initial public release
30 January 2006
25 January 2006
6 December 2005
7 November 2005
Initial internal release
 RFC 2647 defines goodput as the number of bits per unit of time forwarded to the correct destination interface of the DUT, minus any bits lost or retransmitted. In this context, goodput is a layer-7 measurement of HTTP bytes received, as distinguished from “throughput,” which is defined in RFC 1242 as a layer-2 measurement.
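Under that definition, goodput can be computed as in this sketch; the byte counts below are illustrative placeholders, not measured results.

```python
def goodput_bps(bytes_received, bytes_retransmitted, seconds):
    """RFC 2647-style goodput: layer-7 bytes successfully received, minus
    retransmitted bytes, expressed in bits per second."""
    return (bytes_received - bytes_retransmitted) * 8 / seconds

# Illustrative: 28.5 Gbytes of HTTP payload over a 60-second steady state
# with no retransmissions corresponds to 3.8 Gbit/s of goodput.
print(goodput_bps(28_500_000_000, 0, 60))   # 3800000000.0
```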