Network World Clear Choice Test: IPS Performance
Published in Network World, September 11, 2006
Test Methodology
Version 2.01: 2006091101.
Copyright 2005-2006 by Network Test Inc. and Opus One. Vendors may comment on
this document and any other aspect of test methodology. Network Test and Opus
One reserve the right to change test parameters at any time.
This document describes
benchmarking procedures for intrusion prevention systems (IPSs). Test results
are scheduled for publication in Network
World.
These tests assess IPSs
along three dimensions:
Performance
Performance describes an
IPS's ability to inspect, analyze, alert, and filter traffic at a given rate
and/or with a given response time. High-end IPSs currently claim to inspect
traffic at multiple gigabits per second; accordingly, a key goal of this test
is to determine maximum forwarding rates for benign traffic while concurrently
blocking malicious traffic. We also plan to measure latency and/or response
times.
Correctness
IPS "correctness" describes
the ability to differentiate between benign and malicious traffic.
Correctness testing will
assess the ability of an IPS to block a large number of attacks, some
well-known and others less so, offered in the midst of a high-speed data
stream. We describe attacks in broad terms in the "Attack Types" section below,
but we do not enumerate the specific attacks we plan to use.
Completeness
IPS
"completeness" assesses how completely the IPS fulfills its job. We will test this in two ways. Objectively, we will evaluate whether
the IPS blocks the attacks that we send it. Subjectively, we will assess the
quality of information given to a network administrator about attempted
attacks. An IPS should not only block recognized attacks but also provide
network administrators with useful information about them.
Completeness tests,
conducted in conjunction with correctness tests, assess:
--the ability of the IPS to
continue alerting while under attack
--the quality of attack
descriptions for a given attack (such as the type of exploit or flood and its
origin(s) and target node(s))
--the quality of other
information provided about the attack, such as recommended workarounds,
patches, or preventive actions
Signature-based vs.
anomaly detection IPSs
This methodology takes a
black-box approach to IPS design, relying only on externally observable
phenomena such as measurements of traffic rates and malicious traffic
traversal. As such, we do not distinguish between IPSs based on signature-based
or anomaly detection designs. For anomaly detection systems, we will stage
equipment for a reasonable period by offering "normal" traffic (which may
include some percentage of malicious traffic) prior to beginning our tests.
These tests are not appropriate
for IPS devices that perform only rate-based intrusion prevention. However, some of our tests will
evaluate an IPS's ability to block attacks based on volume, such as SYN floods.
Vendor on-site
participation
We generally discourage
on-site visits from vendors' engineers during these tests. A visit for initial
device configuration is acceptable, but only if it is standard practice for all
customers. As noted, we are not disclosing the majority of attacks to be used,
especially for the completeness and correctness tests. Vendors are not
permitted on-site while we conduct the completeness and correctness tests.
This document is organized
as follows. This section introduces the tests to be conducted. Section 2
describes the test bed. Section 3 describes the tests to be performed. Section
4 provides a change log.
Vendors should submit
devices that have at least three gigabit Ethernet interfaces (two for
monitoring traffic and one for management). Performance numbers will be
reported with pricing, so there is an advantage to submitting a product suited
to this configuration.
We also can test products
with more interfaces. Vendors who wish to submit devices with more interfaces
should contact us to discuss this.
Devices must have the
ability to transparently bridge (not route) traffic between monitoring
interfaces.
There are more details about
device configuration later in this section.
Also, as described
separately in the test
invitation, vendors MUST supply the official name of the product, the
version tested, the price as tested, and, if any additional hardware
or software is required, the nomenclature and
pricing for those options. Vendors MUST supply this additional information
before testing begins.
The primary attack
generation tool for this project is the ThreatEx appliance from Imperfect Networks. We also use the
Avalanche and Reflector traffic generators from Spirent Communications to emulate HTTP and
HTTPS clients and servers.
We may also augment these
tools with a collection of homegrown attack and vulnerability assessment
packages.
More information about
ThreatEx is available here:
http://www.imperfectnetworks.com/?page=solutions
More information about
Avalanche and Reflector is available here:
http://www.spirentcom.com/analysis/product_line.cfm?pl=32&wt=2
The test bed models the
complex mix of device types found in enterprise networks. While this test focuses
on network-based intrusion prevention systems, IPS vendors should configure
their devices to protect a mixture of network and host operating systems,
including those from Microsoft, Red Hat, Sun Microsystems, Apple Computer,
FreeBSD, Cisco Systems, Juniper Networks, and Check Point Software
Technologies.
The logical test bed
consists of a single untagged VLAN with the device under test (DUT) operating
in bridging mode. (Note: IPSs
that cannot operate in bridge mode will not be tested.) A table at the end of this section lists the IP
subnets on the monitored network.
On one side of the DUT are
clients emulated by the ThreatEx and Avalanche appliances, while on the other
side the Reflector and ThreatEx appliances emulate servers. A single Ethernet
switch with copper gigabit Ethernet ports ties together all devices, including
the DUT. The switch is capable of switching fully meshed traffic patterns at
line rate.
The Avalanche/2500 is
capable of simulating millions of concurrent web clients. However, not all DUTs
will be able to handle the same degree of TCP connection concurrency. Thus, the
volume of simulated traffic should be scaled to the maximum number of
concurrent transactions a given IPS can analyze without dropping frames or
missing a security breach.
While the Avalanche/2500
streams traffic to the simulated web servers on the Reflector/2500, the
ThreatEx/2500 will inject malicious traffic into the network in a stepped
fashion in order to gauge the effects of various traffic ratios
(malicious:benign). The goal is to determine how effectively the DUT
identifies attacks in the presence of legitimate traffic.
Given that the DUT operates
in bridging mode on this network, no IP address assignment for IPS monitoring
interfaces should be necessary; a separate management interface should be
provided, with addressing instructions given in the next section. Vendors
should not enable spanning tree
on the monitoring ports, as both should be active.
The following table lists
the IP subnets allocated on the monitored networks. The section on "IP access
control" below covers the same information and gives access control rules for
these subnets. Note that we do not enumerate specific host addresses; vendors may
safely assume we will use any and all addresses available within each netblock.
IPv4 netblock   | Service(s)
172.20.0.0/24   | POP3, IMAP, IMAPS, SMTP
172.20.1.0/24   | HTTP/HTTPS IIS servers
172.20.2.0/24   | HTTP/HTTPS Apache servers
172.20.3.0/24   | Windows Media streaming servers
172.20.4.0/25   | FTP servers
172.20.4.128/26 | SIP gateways
172.20.4.192/26 | DNS servers
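A minimal Python sketch (illustration only, not part of the methodology) shows how the "any and all addresses" assumption can be enumerated with the standard ipaddress module; the netblocks mirror the table above:

    import ipaddress

    # Monitored netblocks and their services, as listed in the table above.
    MONITORED = {
        "172.20.0.0/24": "POP3, IMAP, IMAPS, SMTP",
        "172.20.1.0/24": "HTTP/HTTPS IIS servers",
        "172.20.2.0/24": "HTTP/HTTPS Apache servers",
        "172.20.3.0/24": "Windows Media streaming servers",
        "172.20.4.0/25": "FTP servers",
        "172.20.4.128/26": "SIP gateways",
        "172.20.4.192/26": "DNS servers",
    }

    for block, services in MONITORED.items():
        net = ipaddress.ip_network(block)
        # .hosts() yields every usable address; any of them may appear
        # in test traffic as a source or destination.
        print(f"{block}: {len(list(net.hosts()))} usable addresses ({services})")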
Vendors should configure one
interface for IPS management using these parameters:
IP address: 172.16.1.X
(send email to dnewman@networktest.com for the value
of X)
Subnet mask: 255.255.255.0
Default gateway:
172.16.1.254
Please disable support for
IP routing, IP multicast, spanning tree, and any other extraneous traffic on
the management interface.
While no IP addresses should
be necessary on the (bridged) monitoring interfaces of the IPS, the device
nonetheless needs to be told which networks and hosts it monitors. This section
provides IP address access control information.
The "outside" interface of the
IPS should be configured as if installed directly inside a firewall facing
the public Internet. The "inside"
interface of the IPS should be configured to protect the inside of the network as
defined by the firewall rulebase.
The firewall rulebase is:
Permit any-outside 172.20.0.0/24 POP3, IMAP, IMAPS, SMTP (TCP ports 25, 110, 143, 993)
Permit any-outside 172.20.1.0/24 HTTP/HTTPS IIS servers (TCP ports 80, 443)
Permit any-outside 172.20.2.0/24 HTTP/HTTPS Apache servers (TCP ports 80, 443)
Permit any-outside 172.20.3.0/24 Windows Media streaming servers (UDP port 1755)
Permit any-outside 172.20.4.0/25 FTP servers (TCP ports 20, 21)
Permit any-outside 172.20.4.128/26 SIP gateways (UDP port 5060)
Permit any-outside 172.20.4.192/26 DNS servers (TCP and UDP port 53)
Permit any-inside any-outside ANY service
Deny any any any
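The rulebase's first-match semantics can be modeled in a few lines of Python. This is an illustrative sketch only, not the firewall deployed on the test bed; the rule encoding and helper name are our own:

    import ipaddress

    # (action, direction, destination netblock, protocol, destination ports);
    # None means "match anything". Order matters: first match wins.
    RULES = [
        ("permit", "outside-in", "172.20.0.0/24",   "tcp", {25, 110, 143, 993}),
        ("permit", "outside-in", "172.20.1.0/24",   "tcp", {80, 443}),
        ("permit", "outside-in", "172.20.2.0/24",   "tcp", {80, 443}),
        ("permit", "outside-in", "172.20.3.0/24",   "udp", {1755}),
        ("permit", "outside-in", "172.20.4.0/25",   "tcp", {20, 21}),
        ("permit", "outside-in", "172.20.4.128/26", "udp", {5060}),
        ("permit", "outside-in", "172.20.4.192/26", "tcp", {53}),
        ("permit", "outside-in", "172.20.4.192/26", "udp", {53}),
        ("permit", "inside-out", None, None, None),
        ("deny",   None,         None, None, None),
    ]

    def decide(direction, dst_ip, proto, dport):
        """Return the action of the first rule matching the packet."""
        for action, rule_dir, block, rule_proto, ports in RULES:
            if rule_dir not in (None, direction):
                continue
            if block and ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(block):
                continue
            if rule_proto not in (None, proto):
                continue
            if ports and dport not in ports:
                continue
            return action

    print(decide("outside-in", "172.20.1.10", "tcp", 443))  # permit
    print(decide("outside-in", "172.20.1.10", "tcp", 22))   # deny (final rule)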
This section describes in
general terms the type of monitoring (be it signature-based or anomaly-based)
that an IPS should perform.
Vendors should assume that
servers are unpatched and thus vulnerable to well-known host OS exploits.
Further, IPSs SHOULD block packets that would result in any of the following:
- system denial of service
- system compromise (root or other access)
- elevated access (i.e., directory traversal that gives
access to unintended files)
IPSs SHOULD NOT block
spurious attack attempts. Examples include recon attacks (port scans,
traceroute, ping) and attack attempts with no known security history.
We will launch just three
exploits against each DUT for performance testing:
- SQL Slammer (CVE-2002-0649)
- Witty worm (CVE-2004-0362)
- Cisco malformed SNMP (CVE-2004-0714)
We may also offer other
well-known stateful attacks such as Code Red v2, but these are merely to
populate layer-2 address tables of our switches, and not part of formal
testing.
Note that we will NOT change
DUT configuration between the performance and completeness/correctness portions
of this test. Thus, DUTs must be configured to block both the exploits listed
above, and those in the next section.
On the theory that
"attackers don't make appointments," we do not list most attacks we will launch
against IPSs for the completeness/correctness portion of the tests. Some will be
well-known exploits, such as those described in the Mitre CVE database. Others will be
variations on these and may include zero-day exploits.
Network Test has divided
attacks into two categories: the "80 percent" and "20 percent" groups,
described below.
Approximately 80 percent of
the attacks we launch are well known. We would expect all IPSs to identify and
stop these attacks. Attacks in this group include:
1. Attacks that exploit
vulnerabilities in a well-known service such as HTTP, SMTP, IMAP, or POP3.
2. Viruses, worms, or DoS
attacks created at least two weeks prior to the timestamp of the IPS's attack
recognition library.
3. Attacks that result in
root compromise of hosts or network elements protected by the IPS.
Approximately 20 percent of
our attacks are not widely known, and thus might not be detected or blocked by
the IPS. These include:
1. Attacks that compromise, circumvent, or interrupt the service of the IPS.
2. Attacks that traverse to
a "protected" network by
obfuscating packet contents (for example by using fragmentation or variations
in exploit payload)
3. Attacks that exploit vulnerabilities
in security applications, such as sniffers and traffic analyzers
4. Attacks that exploit vulnerabilities in outdated or incorrectly implemented
protocol stacks (such as malformed IP packets, spoofed ARP requests or
responses, or PUSH/ACK floods)
5. Any monitoring that attempts to go undetected.
To determine the maximum
forwarding rate (RFC 2285, 2889) at which an IPS can inspect, analyze, alert,
and filter stateful traffic
To determine application
response time while forwarding stateful traffic at the maximum forwarding rate
To determine throughput (RFC
1242, 2544) at which an IPS can inspect, analyze, alert, and filter stateless
traffic (a sketch of the underlying search procedure appears after this list)
To determine the latency
while forwarding stateless traffic at the throughput rate
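For reference, an RFC 2544 throughput search is conventionally a binary search on offered load for the highest zero-loss rate; the SmartBits/TRT tool automates this. A hedged Python sketch of the idea, where measure_loss() is a hypothetical stand-in for a real traffic-generator API:

    def find_throughput(measure_loss, line_rate_pps, resolution_pps=100):
        """Binary-search the highest offered rate with zero frame loss."""
        lo, hi = 0, line_rate_pps
        best = 0
        while hi - lo > resolution_pps:
            rate = (lo + hi) // 2
            if measure_loss(rate) == 0:  # no frames lost at this load
                best, lo = rate, rate    # zero loss: try a higher rate
            else:
                hi = rate                # loss observed: back off
        return best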
The DUT must be configured
to bridge traffic between interfaces. The DUT must be configured to perform
real-time alerts. The DUT should be configured to disable all extraneous
management traffic, such as spanning tree messages, OSPF HELLO messages, IGMP
messages, or any other traffic that might degrade the DUT's forwarding
rate. The DUT may NOT be
configured to optimize performance at the expense of "best practices" logging
and auditing, for example by disabling logging.
For stateful tests,
Avalanche 2500 appliances are configured to request HTTP, FTP, SMTP, POP3, and
DNS traffic. Some 1500 simulated clients on Avalanche request 11-kbyte objects
over HTTP, 5-Mbyte objects over FTP, and 50-kbyte objects over POP3 and SMTP
from 16 virtual IIS servers running on Reflector 2500. Two pairs of Avalanche
and Reflector appliances can achieve sustained goodput[1]
of 3.8 Gbit/s. This is nearly double the capacity of IPS devices with two
monitoring interfaces, as described in Section 2.
For stateless tests, a
Spirent SmartBits running TRT Interactive will offer UDP traffic in a
"port-pair" topology between each pair of DUT interfaces. Frames are 64, 512,
and 1518 bytes long, offered in separate tests. The traffic uses 254 source IP
addresses, all targeting a single destination host. The UDP traffic uses source
port 1025 and destination port 1026 in all cases. The payload of each packet is
random. The SmartBits runs at line rate with measured latency of 100
nanoseconds or less.
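For readers without SmartBits hardware, a rough software approximation of one such stream can be built with Scapy. This is an illustrative sketch only; the destination address, source range, and interface are our own assumptions, and software generation cannot match hardware rates or timestamp precision:

    import os
    from scapy.all import Ether, IP, UDP, Raw, sendp

    DST = "172.20.4.200"             # illustrative destination host
    PAYLOAD = 512 - 14 - 20 - 8 - 4  # 512-byte frame minus Ethernet/IP/UDP headers and FCS

    # 254 source addresses, one destination, UDP 1025 -> 1026, random payloads.
    frames = [
        Ether() / IP(src=f"10.0.0.{i}", dst=DST) /
        UDP(sport=1025, dport=1026) / Raw(os.urandom(PAYLOAD))
        for i in range(1, 255)
    ]
    # sendp(frames, iface="eth0", loop=1)  # transmit continuously on a chosen port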
As noted, we also can test
devices with more monitoring interfaces; PLEASE ADVISE ASAP IF YOUR DEVICE
ACHIEVES GOODPUT ABOVE 3.8 GBIT/S.
1. For stateful traffic
Avalanche (emulated clients) and Reflector (emulated servers) appliances begin
with a baseline consisting solely of benign traffic -- a mixture of HTTP,
FTP, POP3, SMTP, and DNS. We measure goodput during a steady-state period of at
least 60 seconds. This test is run on a single port-pair.
2. Once the baseline test
has been run, we calculate the DUT's maximum forwarding rate by adding together
the average incoming and outgoing packet-per-second rates during the
steady-state phase of testing. (A worked example of this arithmetic appears after this list.)
3. We repeat the same test
as in the first step, but this time offer attack traffic (as described in
section 2.7) at 1 percent of the aggregate pps rate described in step 2. We
clear all attack alerts on the DUT between iterations, if the DUT allows this.
4. We repeat the test twice
more with attack traffic offered at 4 percent and 16 percent of the aggregate
pps rate described in step 2.
5. We record alerts and
timestamps for all attacks seen by the DUT. We also determine whether the DUT forwarded attack(s), and
if so which ones.
6. We repeat steps 1-5 for
DUTs supporting more than one port pair (for example, two-pair and four-pair
testing).
7. For stateless testing, we
configure TRT on SmartBits to determine the throughput level for 64-, 512-, and
1,518-byte frames. TRT also records average latency at the throughput level.
Test duration is 300 seconds. We conduct this test on a single port pair.
8. For 512-byte frames, we
calculate the DUT's aggregate forwarding rate by adding together the incoming
and outgoing packet-per-second rates at the throughput level.
9. We offer 512-byte UDP
frames from SmartBits at the throughput rate minus 1 percent, as determined in the previous two steps.
In place of the "missing" 1 percent, we offer attack traffic from ThreatEx at 1
percent of the aggregate forwarding rate described in step 8.
10. We repeat the previous
steps with benign traffic rates reduced by 4 percent and 16 percent,
respectively. We offer attack traffic at these rates.
11. For each iteration, we
record aggregate forwarding rate, latency, and whether the DUT forwarded any
attack traffic.
12. We repeat steps 7-11 for
DUTs supporting more than one port pair (for example, two-pair and four-pair
testing).
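The rate arithmetic in steps 2-4 and 8-10 reduces to sums and percentages. A worked example with hypothetical packet rates:

    # Hypothetical steady-state averages from the baseline run (step 2).
    incoming_pps = 180_000
    outgoing_pps = 175_000
    aggregate_pps = incoming_pps + outgoing_pps  # 355,000 pps aggregate rate

    # Attack traffic is offered at 1, 4, and 16 percent of the aggregate
    # rate (steps 3-4); in the stateless case the benign load is reduced
    # by the same amount (steps 9-10).
    for pct in (1, 4, 16):
        attack_pps = aggregate_pps * pct // 100
        benign_pps = aggregate_pps - attack_pps
        print(f"{pct}%: attack {attack_pps} pps, benign {benign_pps} pps")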
1. Maximum goodput rate
2. Page response time
3. Throughput (64-, 512-,
and 1,518-byte frames) of benign UDP traffic
4. Latency (64-, 512-, and 1,518-byte
frames) of benign UDP traffic
5. Forwarding rate (512-byte
frames) of UDP traffic + attack traffic at 1, 4, 16 percent of throughput rate
6. Latency (512-byte frames)
of UDP traffic + attack traffic at 1, 4, 16 percent of throughput rate
Note: In addition to
correlating DUT alerts with the attack quantities configured on the ThreatEx
appliance, we also analyze packet captures to determine whether malicious
traffic was not only recognized but also prevented from traversing the DUT.
Correctness: To assess an IPS's ability to identify and block
attacks using various proportions of benign and malicious traffic. Not every IPS is expected to block
every attack. However, for any
attack that the IPS does block, we expect it to block every instance of that
attack that it sees. This test
will validate whether the IPS can be relied on to block all supported attacks
at all traffic levels.
Completeness: To assess the ability of the IPS to block attacks
from a wide-ranging database of signatures. To assess the quality of reporting provided by the IPS while
under attack. To assess an IPS's
ability to provide timely and useful reporting on the malicious traffic it
sees.
DUT configuration is the
same as the "Performance" test and may not be changed for this test.
The "background traffic"
generated by the Avalanche/Reflector appliances is also the same as in the
"Performance" test.
HTTP traffic is sustained
for at least 60 seconds before the ThreatEx appliances stream attacks at a set
percentage of maximum goodput, as determined from the performance test. We run
these correctness tests with four combinations of benign and attack
traffic -- 80:20, 70:30, 60:40, and 50:50.
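The offered loads for each ratio follow directly from the maximum goodput. A small sketch with an illustrative (not measured) goodput figure:

    MAX_GOODPUT_MBPS = 1_900  # hypothetical single-pair maximum goodput

    # Benign:attack ratios used in the correctness tests.
    for benign, attack in ((80, 20), (70, 30), (60, 40), (50, 50)):
        http_mbps = MAX_GOODPUT_MBPS * benign / 100
        attack_mbps = MAX_GOODPUT_MBPS * attack / 100
        print(f"{benign}:{attack} -> HTTP {http_mbps:.0f} Mbit/s, "
              f"attacks {attack_mbps:.0f} Mbit/s")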
1. Avalanche and Reflector
offer HTTP traffic at 80 percent of the maximum goodput rate (as determined in
the performance test) for a steady-state period of at least 60 seconds.
2. Once simulated HTTP
traffic reaches 80 percent of the maximum goodput rate, the ThreatEx
appliances offer a mix of various attacks at 20 percent of the maximum goodput
rate for at least 60 seconds.
3. All attacks belong to the
"80 percent" group, as described in section 1 of this document.
4. We record alerts and
timestamps for all attacks seen by the DUT. We also determine whether the DUT halts transmission of
attack(s), and if so which ones.
5. Avalanche and Reflector
continue to offer HTTP traffic at 80 percent of the maximum goodput rate for at
least 28 seconds after ThreatEx stops transmitting attack traffic.
6. We repeat all preceding
steps with different ratios of benign and attack traffic -- 70:30, 60:40,
and 50:50, respectively.
7. We repeat all preceding
steps using attacks from the "20 percent" group.
1. Correctness: For each
traffic ratio, for each attack in the "80 percent" and "20 percent" groups that
is blocked at least once at any traffic level, the percentage of the time the
attack is blocked and whether this varies with the traffic ratio. An attack is
defined as "blocked" when the IPS deletes the datagram that would cause the
attack to succeed from the data stream passed through the DUT. (A sketch of this
calculation appears after this list.)
2. Correctness: For each
traffic ratio, variance in latency and packet loss of non-attack traffic
compared with the baseline performance test.
3. Completeness: For the
80:20 traffic ratio, for each attack in the "80 percent" and "20 percent"
groups, whether the IPS blocks the attack at least once.
4. For each traffic ratio,
when an attack is actually blocked, whether the IPS provides an alert (or
aggregated alert) indicating that an attack was blocked.
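The correctness metric in item 1 is a ratio of blocked to offered attack instances, per attack and traffic ratio. A hedged sketch with hypothetical counts; in practice the inputs come from ThreatEx logs and packet captures:

    # Hypothetical counts: (attack name, benign:attack ratio) -> instances.
    offered = {("slammer", "80:20"): 100, ("slammer", "50:50"): 100}
    blocked = {("slammer", "80:20"): 100, ("slammer", "50:50"): 97}

    for (attack, ratio), sent in offered.items():
        pct = 100.0 * blocked.get((attack, ratio), 0) / sent
        print(f"{attack} at {ratio}: {pct:.1f}% blocked")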
1. For the alerts provided
by the IPS, how well the information available in the management system would
enable a trained security staff member to perform forensics analysis on the
blocked attacks.
2. Ease of system
configuration, such as how long it takes to implement a given security policy (see the
firewall rulebase and specific signatures given in Section 2)
Version 2.01
11 September 2006
Changed title to indicate publication in Network World
Version 2.0
30 May 2006
Section 2.7: Added
description of exploits used in performance tests
Section 3.1: Added
description of stateful and stateless test procedure
Version 1.1
14 February 2006
Section 2.4.1: Listed
monitored netblocks
Section 2.4.2: Changed
default gateway from 172.16.1.1 to 172.16.1.254
Version 1.0
2 February 2006
Initial public release
Version 0.5
2 February 2006
Internal release
Version 0.4
30 January 2006
Internal release
Version 0.3
25 January 2006
Internal release
Version 0.2
6 December 2005
Internal release
Version 0.1
7 November 2005
Initial internal release
[1] RFC 2647 defines goodput as the number of bits per unit of time forwarded to the correct destination interface of the DUT, minus any bits lost or retransmitted. In this context, goodput is a layer-7 measurement of HTTP bytes received, as distinguished from "throughput," which is defined in RFC 1242 as a layer-2 measurement.