Check Point R81 Firewall Security Gateway performance tests

In this article I will carry out empirical network throughput performance tests of the Check Point R81 Open Server Security Gateway with different configurations of Threat Prevention Software Blades, realized in a VMware ESXi Testlab. Only the four hosts from the prerequisites section are up and running in this Testlab, on one dedicated hardware server with 20 CPU cores. The purpose of this test is to show what throughput difference the combination of active blades can make, how selected Check Point features affect performance, and to provide a guide for testing the performance of your own configuration before installing it in a production environment. This test does not represent any exact maximum throughput limit of the Check Point Open Server Firewall system, because the Testlab runs fully virtualized in VMware ESXi. I will be using the iperf3 tool for active measurements of the maximum achievable bandwidth between the two test end systems.

 

Prerequisites:

 

1. Testlab Setup

Hostname | CPU Cores | RAM  | Hard Disk | Network Adapter Type | Interface1 IP / Sec. Zone | Interface2 IP / Sec. Zone
sg1      | 6         | 8 GB | 60 GB     | testcase dependent   | 10.0.1.2/24 / Internal    | 10.0.2.2/24 / Internal
ubuntu2  | 2         | 4 GB | 50 GB     | E1000e               | 10.0.1.14/24              | –
centos2  | 2         | 4 GB | 50 GB     | E1000e               | 10.0.2.12/24              | –

The policy is configured to Accept All for test purposes. The Check Point Gateway sg1 is running with the default configuration after installation.

2. Initial speed test without Firewall with VMXNET3

To obtain a throughput estimate for the VMware VMXNET3 driver in my VMware ESXi Testlab server, I will carry out the performance test without the firewall, with 50 simultaneous connections for 300 sec (~5 min), to establish the maximum transfer speed between the Linux systems using the VMXNET3 network driver. The longer test window smooths out variations in the measured values and makes the result more accurate. The network interface on ubuntu2 is reconfigured only for this purpose (it will of course be disabled again after this iperf run) with the IP address 10.0.2.14, which resides in the same network as the network interface on centos2 (10.0.2.12).
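The measurement below assumes that an iperf3 server is already listening on ubuntu2 (iperf3 uses TCP port 5201 by default); a minimal sketch of the receiving side, with an illustrative prompt:

martin@ubuntu2:~$ iperf3 -s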

[root@centos2 martin]# iperf3 -c 10.0.2.14 -t 300 -P 50 | grep SUM
[SUM]   0.00-300.00 sec  1.87 TBytes  54.8 Gbits/sec    0             sender
[SUM]   0.00-300.03 sec  1.87 TBytes  54.8 Gbits/sec                  receiver
[root@centos2 martin]#

This test has proven that the end systems (centos2 and ubuntu2) using the VMXNET3 network adapter driver are capable of a maximum link speed of approximately 54.8 Gbit/s in my Testlab.

3. Initial speed test without Firewall with E1000e

To empirically obtain the maximum real throughput for the E1000e driver in my Testlab (which will be used in specific test scenarios), I will repeat the test between the two end systems ubuntu2 and centos2 with the E1000e network adapter driver on both systems. The performance test will be carried out with 50 simultaneous connections for 300 sec (~5 min).

 

[root@centos2 martin]# iperf3 -c 10.0.2.14 -t 300 -P 50 | grep SUM
[SUM]   0.00-300.00 sec   590 GBytes  16.8 Gbits/sec  8628270         sender
[SUM]   0.00-300.09 sec   590 GBytes  16.8 Gbits/sec                  receiver
[root@centos2 martin]#

This test has proven that the end systems using the E1000e network adapter driver in our Testlab are capable of a maximum link speed of approximately 16.8 Gbit/s.

 

Hint:

Why is “Retr” – “8628270” present in the results?

In iperf3 the Retr column stands for retransmitted TCP packets and indicates the number of TCP segments that had to be sent again (retransmitted). A retransmission means a dropped TCP segment: some buffer (possibly the NIC ring buffer) was overwhelmed and packets got dropped. This is not unexpected when the interface is saturated by a bandwidth test. An interface can only send one packet at a time; when you try to send more packets than it can handle, the buffers fill up, and once they are full, packets are dropped.
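If you want to take a closer look at such drops on the Linux end systems, the standard tools are sufficient; a minimal sketch (the interface name ens192 is an assumption, adjust it to your system):

[root@centos2 martin]# netstat -s | grep -i retrans
[root@centos2 martin]# ethtool -g ens192
[root@centos2 martin]# ethtool -G ens192 rx 4096 tx 4096

The first command prints the TCP retransmission counters of the system, the second shows the current and maximum NIC ring buffer sizes, and the third raises the ring buffers (the values must not exceed the maximum reported by ethtool -g).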

 
4. Test cases

I will perform the bandwidth performance test series with different numbers of simultaneous TCP connections (1, 10, 50, 100), each test run with a duration of 60 min, between the end systems with the following blades enabled:

 

a) FW

b) FW + Anti-Bot

c) FW + Anti-Virus

d) FW + Anti-Bot + Anti-Virus

e) FW + IPS

f) FW + IPS + Anti-Bot

g) FW + IPS + Anti-Virus

h) FW + Anti-Bot + Anti-Virus

i) FW + IPS + Anti-bot + Anti-virus

 

Those blades are running with their default out-of-the-box configuration. For each simultaneous connection parameter (sim conn = [1, 10, 50, 100]) I will run five separate measurements (seq = [1, 2, 3, 4, 5], one sequence = 60 min) to increase the accuracy of the measurement by reducing the possibility of error.
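A single measurement run could then look like this; a minimal sketch under the assumption that the traffic is sent from centos2 through sg1 to ubuntu2 on 10.0.1.14 (here with 50 simultaneous connections and a duration of 60 min = 3600 sec):

[root@centos2 martin]# iperf3 -c 10.0.1.14 -t 3600 -P 50 | grep SUM

For the other runs only the -P parameter (1, 10, 50, 100) is changed.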

 

This test series will be repeated three times with the following changed variables:

Testcase | CoreXL Instances | sg1 Network Adapter Type | Multi-Queue
X        | 6                | VMXNET3                  | enabled
Y        | 4                | VMXNET3                  | enabled
Z        | 6                | E1000e                   | disabled

We have 6 CPU cores in every test case, with 6, 4 and 6 CoreXL instances configured for the Testcases X, Y and Z respectively.

 

CoreXL is a performance-enhancing technology for Security Gateways on multi-core processing platforms. CoreXL enhances Security Gateway performance by enabling the CPU processing cores to concurrently perform multiple tasks. (more about CoreXL)
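On the gateway itself, the number and status of the configured CoreXL Firewall instances can be verified directly; a minimal sketch (output omitted, prompt as in the outputs below):

sg1> fw ctl multik stat

The number of instances is changed via the cpconfig menu (Check Point CoreXL) and becomes active after a reboot.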

  

Assigned Affinities:

Testcase “X”:

sg1> fw ctl affinity -l -r
CPU 0: fw_5
CPU 1: fw_3
CPU 2: fw_1
CPU 3: fw_4
CPU 4: fw_2
CPU 5: fw_0
All: mpdaemon fwd in.asessiond cprid cpd
Interface eth0: has multi queue enabled
Interface eth1: has multi queue enabled
Interface eth2: has multi queue enabled
sg1>

Testcase “Y”:

sg1> fw ctl affinity -l -r
CPU 0:
CPU 1:  fw_3
        mpdaemon fwd cprid cpd
CPU 2:  fw_1
        mpdaemon fwd cprid cpd
CPU 3:
CPU 4:  fw_2
        mpdaemon fwd cprid cpd
CPU 5:  fw_0
        mpdaemon fwd cprid cpd
All:
Interface eth0: has multi queue enabled
Interface eth1: has multi queue enabled
Interface eth2: has multi queue enabled
sg1>

Testcase “Z”:

sg1> fw ctl affinity -l -r
CPU 0:  eth0
        fw_5
CPU 1:  eth1
        fw_3
CPU 2:  fw_1
CPU 3:  eth2
        fw_4
CPU 4:  fw_2
CPU 5:  fw_0
All:    mpdaemon fwd cprid cpd
sg1>

Learn more about CPU allocation and performance tuning:

Check Point – Allocation of Processing CPU Cores

Check Point – Performance Tuning

5. Test results
5.1 Testcase “X”
5.2 Testcase “Y”
5.3 Testcase “Z”
6. Results Interpretation
  1. The first big throughput difference between the results in graphs X, Y and graph Z can be seen when only the Firewall Blade is active. This is caused by the Multi-Queue feature enabled in Testcases X and Y, which is designed to improve network performance by letting you configure more than one traffic queue for each network interface. By default, each network interface has one traffic queue handled by one CPU. (more about Multi-Queue)
  2. The Testcase Z results are lower in comparison to Testcases X and Y. Testcase Z uses the E1000e adapter type, which does not deliver as much throughput as VMXNET3.
  3. Generally, we can see the difference between Multi-Queue enabled in Testcases X and Y and disabled in Testcase Z. The more simultaneous connections we have, the bigger the performance advantage of Multi-Queue with every combination of active blades.
  4. The throughput difference between X and Y is caused by the different CoreXL configuration. In Testcase X there are 6 CoreXL instances present and in Testcase Y there are 4. Therefore Testcase X has slightly better results in comparison to Testcase Y.
  5. Only the Firewall Blade active always delivers the best throughput, because the Firewall Blade (and VPN) uses the Accelerated Path (SecureXL). (more about SecureXL)
  6. The Anti-Virus and Anti-Bot Blades use the Medium Path (PSLXL, CPASXL). (more about Security Gateway Packet Flow and Acceleration)
  7. Since we are using the out-of-the-box IPS profile, the IPS Blade in our case uses the Slow Path. Therefore the biggest throughput difference appears after enabling the IPS Blade. The actual path distribution can be verified as shown in the sketch below.
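Which path the traffic really takes can be checked on the gateway during a test run; a minimal sketch (output omitted, prompt as in the examples above):

sg1> fwaccel stats -s

The summary shows the share of accelerated packets (SecureXL), PSLXL/CPASXL packets (Medium Path) and F2F packets (Slow Path), which should match the interpretation above.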
7. Conclusion

In this throughput performance test we have demonstrated the performance impact of different Check Point features as well as of the active blades on the same Check Point R81 Security Gateway system. We are now familiar with technologies such as SecureXL, CoreXL and Multi-Queue.

 

CoreXL is a performance-enhancing technology for Security Gateways on multi-core processing platforms. It enhances Security Gateway performance by enabling the CPU processing cores to concurrently perform multiple tasks. 

 

The SecureXL feature accelerates traffic that passes through a Security Gateway. When SecureXL is enabled, all packets should be accelerated, except packets that match special conditions.
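The current SecureXL status on the gateway, including the reasons why acceleration or templating may be disabled, can be displayed with one command; a minimal sketch (output omitted):

sg1> fwaccel stat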

 

Multi-Queue is an acceleration feature that lets you assign more than one packet queue and CPU to an interface.
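Whether Multi-Queue is active on an interface is already visible in the fw ctl affinity -l -r outputs in section 4 (“has multi queue enabled”); on R80.40 and higher the configuration can also be inspected with the mq_mng tool in expert mode. A minimal sketch, assuming that tool is available on your build (output omitted):

[Expert@sg1:0]# mq_mng --show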

 

All these techniques improve the performance of the Security Gateway. The performance of the Security Gateway depends on CPU, memory, network interfaces and storage device/speed. Throughput also depends on many other variables, such as the packet rate in combination with the packet size. There is potential for performance optimization on every Security Gateway, because the standard configuration is sized for the majority of customers. Before any performance optimization activities, I always recommend my customers to first take a deeper look into the structure of the traffic passing through the firewalls, and only then to identify the bottlenecks and choose a proper solution for performance optimization. (more about Performance Tuning)