TSO, GRO, RSS, and Blocklist Feature on Avi Vantage

Overview

This guide explains the TSO, GRO, RSS and blocklist features on Avi Vantage.

TCP Segmentation Offload (TSO)

TCP segmentation offload is used to reduce the CPU overhead of TCP/IP on fast networks. A host with TSO-enabled hardware sends TCP data to the NIC (network interface card) without segmenting the data in software. This type of offload relies on the NIC to segment the data and then add the TCP, IP, and data link layer headers to each segment.

When an Avi Service Engine (SE) is running in DPDK mode, TSO can be enabled on the following NICs:

  • ixgbe, vmxnet3, i40e
  • Mellanox ConnectX-4
  • Mellanox ConnectX-5 (introduced in the 20.1.6 and 21.1.1 releases)
  • Broadcom BCM574XX (validated on BCM57414) family (starting with Avi Vantage 18.2.8)
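
To check which driver a host interface is using, and hence whether it matches one of the families listed above, you can run ethtool -i on the interface from the host before the SE claims it. This is an illustrative example only; the interface name eth2 and the reported driver are placeholders:

root@10.1.1.1:~# ethtool -i eth2 | grep driver
driver: i40e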

TSO Support in Routing

When routing support is enabled on the SE, the GRO (Generic Receive Offload) feature could not previously be used, because routing is stateless and the SE cannot segment a large GRO-coalesced packet if the packets are not allowed to be IP fragmented. With TSO support in routing, GRO can be used for routed traffic: the SE segments the larger packets into smaller TCP segments either through TSO, if the interface supports it, or through the routing layer in the SE.

During the three-way handshake, both the client and the server advertise their respective MSS values so that the peers do not send TCP segments larger than the MSS. This feature is enabled by default.

Generic Receive Offload (GRO)

Generic Receive Offload (GRO) is a software technique for increasing inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single flow into a larger packet chain before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed.

Note: The benefits of GRO are only seen if multiple packets for the same flow are received in a short span of time. If the incoming packets belong to different flows, then the benefits of having GRO enabled might not be seen.

The following are the interfaces on which GRO is supported by Avi Vantage in DPDK mode:

  • ixgbe, i40e, virtio, vmxnet3
  • Mellanox ConnectX-4
  • Broadcom BCM574XX (validated on BCM57414) family (starting with Avi Vantage 18.2.8)
  • Mellanox Technologies MT27800 Family [ConnectX-5] (introduced in the 21.1.1 release)

Enabling GRO and TSO on an Avi SE

The TSO feature is enabled by default on an Avi SE.

Upgrading from a prior version carries forward the GRO configuration for an SE group. However, if an SE group is newly created in 22.1.2, the GRO configuration is in auto mode: if the SE group has SEs with 8 or more vCPUs, GRO is enabled.

Note: Enabling TSO/GRO is non-disruptive and it does not require an SE restart.

Log in to the Avi CLI and use the configure serviceenginegroup command to enable the TSO and GRO features.


[admin:cntrl]: > configure serviceenginegroup Default-Group

Updating an existing object. Currently, the object is:

| disable_gro                           | True    |

| disable_tso                           | True    |


[admin:cntrl]: serviceenginegroup> no disable_gro

Overwriting the previously entered value for disable_gro

[admin:cntrl]: serviceenginegroup> no disable_tso

Overwriting the previously entered value for disable_tso

[admin:cntrl]: serviceenginegroup> save

| disable_gro                           | False    |

| disable_tso                           | False    |
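
To confirm the saved values without re-entering configuration mode, you can also run the show serviceenginegroup command and filter its output, similar to the other show examples in this guide. The grep pattern below assumes the field names appear exactly as shown above:

[admin:cntrl]: > show serviceenginegroup Default-Group | grep -E "disable_gro|disable_tso"

| disable_gro                           | False    |

| disable_tso                           | False    |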

To verify if the features have been correctly turned ON in the SE, you can check the following statistics on the Avi Controller CLI.

GRO statistics are part of interface statistics. For GRO, check the statistics for gro_mbufs_coalesced.

TSO statistics are part of mbuf statistics. For TSO, check the statistics for the following parameters:

  • num_tso_bytes
  • num_tso_chains

Execute the show serviceengine <interface IP address> interface command and filter the output using the grep command, as shown below:


[admin:cntrl]: > show serviceengine 10.1.1.1 interface  | grep gro

|       gro_mbufs_coalesced          | 1157967  |

|     gro_mbufs_coalesced            | 1157967  |

Note: The sample output mentioned above is for 1-queue (No RSS).

For an RSS-enabled configuration, refer to the 4-queue RSS output below.

Note: In the case of a port-channel interface, provide the relevant physical interface name as the filter in the intfname option. For reference, refer to the output below for the eth4 (Ethernet 4) interface.


show serviceengine 10.1.1.1 interface filter intfname eth4 | grep gro

|       gro_mbufs_coalesced          | 320370   |

|       gro_mbufs_coalesced          | 283307    |

|       gro_mbufs_coalesced          | 343143    |

|       gro_mbufs_coalesced          | 217442    |

|     gro_mbufs_coalesced            | 1164262   |

Note: The statistics for a NIC are the sum of the statistics for each queue on that interface.
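For example, in the 4-queue output above, 320370 + 283307 + 343143 + 217442 = 1164262, which matches the combined counter reported for the NIC.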


[admin:cntrl]: > show serviceengine 10.1.1.1 mbufstats | grep tso

| num_tso_bytes                    | 4262518516                          |

| num_tso_chains                   | 959426                              |

If the features are enabled, the output above shows non-zero values for the TSO parameters.

Multi-queue Support

The dispatcher on Avi Vantage is responsible for fetching incoming packets from a NIC, sending them to the appropriate core for proxy work, and sending the outgoing packets back to the NIC. A 40G NIC, or even a 10G NIC receiving traffic at a high packets-per-second (PPS) rate (for instance, small UDP packets), might not be processed efficiently by a single-core dispatcher. This problem can be solved by distributing traffic from a single physical NIC across multiple queues, where each queue is processed by a dispatcher on a different core. Receive Side Scaling (RSS) enables the use of multiple queues on a single physical NIC.

The rest of this section is structured as follows:

  • The RSS feature and how to enable it on Avi Vantage.
  • The configurable dispatchers feature, which lets you set the number of dispatchers and thus effectively the number of receive and transmit queues.

Enabling Multi-Queue Property for SE Image in OpenStack Cloud

You can configure multi-queue in OpenStack. You need to enable the hw_vif_multiqueue_enabled flag in the OpenStack cloud configuration. For more details on configuring multi-queue in OpenStack, refer to the OpenStack Cloud Advanced Configuration Options guide.
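
For example, if you manage the SE image with the OpenStack CLI, the property can be set on the image as shown below. This is an illustrative sketch; the image name avi-se is a placeholder and your deployment workflow may set the property differently:

openstack image set --property hw_vif_multiqueue_enabled=true avi-se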

Receive Side Scaling (RSS)

When RSS is enabled on Avi Vantage, the NICs use multiple queues in the receive path. The NIC pins flows to queues, placing packets that belong to the same flow in the same queue. This helps the driver spread packet processing across multiple CPUs, thereby improving efficiency. On an Avi SE, the multi-queue feature is also enabled on the transmit side, i.e., different flows are pinned to different queues (packets belonging to the same flow stay in the same queue) to distribute packet processing among CPUs.

Note: RSS support is limited only to TCP traffic.

Avi Vantage supports the multi-queue feature on the following Ethernet adapters:

  • Intel — 82599, X520, X540, X550, X552, XL710, X710
  • Mellanox — ConnectX-4
  • Broadcom BCM574XX (validated on BCM57414) family (starting with Avi Vantage 18.2.8)
  • VMXNET3 — Interfaces on the VMware cloud. VMXNET3 RSS support is limited only to TCP traffic

Note: The multi-queue feature (RSS) is not supported along with IPv6 addresses. If RSS is enabled, an IPv6 address cannot be configured on any of the supported interfaces listed above. Similarly, if an IPv6 address is already configured on any of the supported interfaces, the multi-queue feature (RSS) cannot be enabled on those interfaces.

Enabling RSS on an Avi SE

The distribute_queues knob in the SE group properties enables and disables RSS. Log in to the Avi CLI and use the distribute_queues command to enable the RSS feature.

Note: Any change in the distribute_queues parameter requires an SE restart.


| distribute_queues | False  |

[admin:cntrl]: serviceenginegroup> distribute_queues

Overwriting the previously entered value for distribute_queues

[admin:cntrl]: serviceenginegroup> save

| distribute_queues | True   |

When RSS is turned ON, all the NICs in the SE configure and use an optimum number of queue pairs as calculated by the SE. The calculation of this optimum number is described in the section on configurable dispatchers.

For instance, the output of a 4-queue RSS-supported interface will be as follows:


[admin:cntrl]: > show serviceengine 10.1.1.1 interface filter intfname bond1 | grep ifq

|     ifq_stats[1]                   |

|     ifq_stats[2]                   |

|     ifq_stats[3]                   |

|     ifq_stats[4]                   |

Each ifq_stats entry in the output corresponds to one queue on the interface.

The counters for ipackets (input packets) and opackets (output packets) for each interface queue show non-zero values, as shown below:


[admin:cntrl]: > show serviceengine 10.1.1.1 interface filter intfname bond1 | grep pack

|     ipackets                       | 40424864                            |

|     opackets                       | 42002516                            |

|       ipackets                     | 10108559                            |

|       opackets                     | 11017612                            |

|       ipackets                     | 10191660                            |

|       opackets                     | 10503881                            |

|       ipackets                     | 9873611                             |

|       opackets                     | 10272103                            |

|       ipackets                     | 10251034                            |

|       opackets                     | 10208920                            |

The output shows the combined statistics for the NIC followed by the statistics for each individual queue.

Setting Dispatchers on Service Engines

With the configurable-dispatcher feature, you can configure the number of dispatchers that can be used in the Service Engine.

Note: Starting with Avi Vantage version 18.2.8, the distribute_queues parameter used to enable the RSS mode of operation for multiple dispatchers has been deprecated. Refer to the section Configuring Number of Queues per vNIC for the relevant command in 18.2.8 and later.

The number of dispatcher cores that you can configure is limited to powers of two, with a maximum of 16 dispatcher cores. In other words, you can configure only values from the set {0, 1, 2, 4, 8, 16}. If the value is set to 0 (the default), an optimum number of dispatcher cores is deduced automatically. The restriction to values of the form 2^n comes from NICs that allow the number of RSS queues to be only a power of 2.

Refer to the mlx4 PMD Known Issues section of the Mellanox DPDK Release Notes for details on why the number of configured RSS queues must be a power of 2.

Use the num_dispatcher_cores command in the SE group properties to configure the number of dispatcher cores. By default, num_dispatcher_cores is set to 0.
Log in to the Avi CLI and set num_dispatcher_cores to the desired value. When RSS is enabled, this effectively sets the number of RSS queues, and hence the number of dispatchers, to the configured value.

Notes:

  • Any change in the num_dispatcher_cores parameter requires an SE restart for the configuration to take effect.

  • num_dispatcher_cores can be set to zero only for LSC (Linux server cloud) in DPDK mode. In non-DPDK mode, and for DPDK in other cloud setups, the value has to be set explicitly.

Configuration Samples

The following example shows the configuration on a bare-metal machine with 24 vCPUs, two 10G NICs, one bonded interface of two 10G NICs, and distribute_queues enabled.

  • Set the value of the num_dispatcher_cores parameter to 8.

 [admin:cntrl]: serviceenginegroup> num_dispatcher_cores 8
 Overwriting the previously entered value for num_dispatcher_cores
 [admin:cntrl]: serviceenginegroup> save 

  [admin:cntrl]:> show serviceengine 10.1.1.1 seagent | grep -E "dispatcher|queues"
  |num_dispatcher_cpu                   | 8
  |num_queues                           | 8 
  • Set the value of the num_dispatcher_cores parameter to 0 (the default value).
    After restarting the SE, even though the configured value for dispatchers is 0, the number of queues, and hence the number of dispatchers, changes to 4 as shown below.

 [admin:cntrl]:> show serviceengine 10.1.1.1 seagent | grep -E "dispatcher|queues"
  |num_dispatcher_cpu                   | 4
  |num_queues                           | 4 

To further optimize the system performance, the Avi Controller’s configuration is overridden in the following two scenarios:

  • For a bare-metal machine with more than 4 vCPUs, the dedicated dispatcher is turned ON automatically.
  • For a system with a sufficient number of cores and only 10G interfaces, if the number of dispatcher cores configured is 0, RSS is turned OFF even though you have turned it ON.

Note: A single dispatcher core is capable of processing about 10 Gbps of I/O. This, combined with other parameters such as the total NIC capacity and the number of cores available, is used for the automatic calculation of the optimum number of dispatchers.
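
As a rough illustration of this sizing logic (this is only a sketch based on the 10 Gbps-per-dispatcher guideline above, not the SE's actual algorithm), the number of dispatchers can be estimated from the total NIC capacity, rounded up to a power of two and capped at 16:

# Illustrative sketch only -- not the SE's actual calculation
total_gbps=40                                   # example: 2 x 10G NICs + 1 x 20G bond
needed=$(( (total_gbps + 9) / 10 ))             # ceil(total_gbps / 10 Gbps per dispatcher)
d=1; while (( d < needed )); do d=$(( d * 2 )); done   # round up to a power of 2
(( d > 16 )) && d=16                            # cap at the 16-dispatcher maximum
echo "estimated dispatcher cores: $d"           # prints 4 for 40 Gbps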

Configuring Number of Queues per vNIC

Starting with Avi Vantage version 18.2.8, you can configure the number of queues per vNIC. This feature can be used in DPDK mode of operation on supported environments to enable the utilisation of more than one queue per dispatcher. This can be achieved by configuring the number of queues per vNIC greater than the number of dispatchers.

This feature is intended to provide network performance advantage in the environments with shallow interface ring sizes.

The following are the currently supported environments:

  • OpenStack

  • Amazon Web Services (AWS)

  • KVM

  • Baremetal (Linux Server Cloud)

You can configure the maximum number of queues per vNIC using the max_queues_per_vnic parameter in the SE group properties.

The max_queues_per_vnic parameter supports the following values:

  • Zero (Reserved) — Auto (deduces optimal number of queues per dispatcher based on the NIC and operating environment)

  • One (Reserved) — One Queue per NIC (Default)

  • Integer Value — Power of 2; maximum limit is 16.

Note: You should set max_queues_per_vnic to 0 (auto) for non-DPDK mode of operation to utilise multiple dispatchers.

The max_queues_per_vnic parameter deprecates the distribute_queues parameter, which was used to enable the RSS mode of operation in which the number of queues was equal to the number of dispatchers.

Starting with Avi Vantage version 18.2.8, the migration routine sets the max_queues_per_vnic parameter to num_dispatcher_cores if distribute_queues is enabled; otherwise, max_queues_per_vnic is set to 1.
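
The migration behaviour can be summarised with the following shell-style pseudologic (illustrative only; the variables simply mirror the SE group fields named above):

# Illustrative pseudologic of the 18.2.8 migration, not actual product code
if [ "$distribute_queues" = "true" ]; then
    max_queues_per_vnic=$num_dispatcher_cores
else
    max_queues_per_vnic=1
fi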

The following is the environment-specific behaviour when the max_queues_per_vnic value is set to 0 (auto):

  • OpenStack, AWS, KVM (DPDK mode) — The number of queues can be more than the number of dispatchers, allowing more than one queue per dispatcher to be utilised.

  • Baremetal (DPDK mode) — The number of queues is the same as the number of dispatchers. One queue is utilised per dispatcher.

  • Azure, AWS (Non-DPDK mode) — The number of queues is the same as the number of dispatchers. One queue is utilised per dispatcher.

Note: You need to enable the hw_vif_multiqueue_enabled parameter in the SE image properties (se_image_property) in OpenStack to utilise max_queues_per_vnic. This ensures that the number of queues is equal to the number of vCPUs.

You can use the following command to configure max_queues_per_vnic:


[admin:admin-controller-1]: serviceenginegroup> max_queues_per_vnic

INTEGER 0,1,2,4,8,16    Maximum number of queues per vnic Setting to '0' utilises all queues that are distributed across dispatcher cores.

[admin:admin-controller-1]: > configure serviceenginegroup Default-Group
Updating an existing object. Currently, the object is:
+-----------------------------------------+---------------------------------------------------------+
| Field                                   | Value                                                   |
+-----------------------------------------+---------------------------------------------------------+
[output truncated]
| se_rum_sampling_nav_percent             | 1                                                       |
| se_rum_sampling_res_percent             | 100                                                     |
| se_rum_sampling_nav_interval            | 1 sec                                                   |
| se_rum_sampling_res_interval            | 2 sec                                                   |
| se_kni_burst_factor                     | 2                                                       |
| max_queues_per_vnic                     | 1                                                       |
| core_shm_app_learning                   | False                                                   |
| core_shm_app_cache                      | False                                                   |
| pcap_tx_mode                            | PCAP_TX_AUTO                                            |
+-----------------------------------------+---------------------------------------------------------+
[admin:admin-controller-1]: serviceenginegroup> max_queues_per_vnic 2
Overwriting the previously entered value for max_queues_per_vnic
[admin:admin-controller-1]: serviceenginegroup> save

The show serviceengine [se] seagent command displays the number of queues per dispatcher and the total number of queues per interface.


show serviceengine [se] seagent

| num_dp_heartbeat_miss                | 0                                      |
| se_registration_count                | 2                                      |
| se_registration_fail_count           | 0                                      |
| num_dispatcher_cpu                   | 1                                      |
| ------------------ truncated output--------------------|
| num_flow_cpu                         | 1                                      |
| num_queues                           | 1                                      |
| num_queues_per_dispatcher            | 1               |

RSS Scale Out

  • Both Layer 2 and Layer 3 scale out are supported.
  • No asymmetric combinations are supported.
    • RSS enabled on one SE and disabled on the other SE is not supported.
    • In a pre-existing scale-out setup, any configuration change that changes the RSS state on either of the SEs is not supported. For instance, changing the RSS-supported interface or the distribute_queues parameter (as discussed in the previous section) is not supported.
TCP/UDP Virtual Service Profile | Auto Gateway | RSS Scale-Out Notes
TCP | Yes | NA
TCP per packet | Yes | Inefficient for Layer 2 scale out, since all packets coming to the secondary SE are handled by one dispatcher core. For efficiency, disable auto gateway in the virtual service configuration from the Avi user interface.
UDP fast path | Yes/No | Layer 2 scale out is not supported. All incoming packets are handled by the primary SE.

GCP RSS Support in DPDK Mode

By default, the SE is deployed in DPDK mode in GCP.

Starting with Avi Vantage version 20.1.3, DPDK RSS for GCP is supported.

To enable RSS, configure max_queues_per_vnic and num_dispatcher_cores in the Service Engine group properties.

The default value of max_queues_per_vnic is 1. Set this value to specify the number of queues per vNIC, or set it to 0 to automatically determine the requisite number of queues in accordance with num_dispatcher_cores.

The default value of num_dispatcher_cores is 0. Set this value to configure the desired number of dispatchers.

For GCP instances, the queue depth is close to 2048 and one queue can be used by every dispatcher.

GCP can support an odd (non-power-of-two) number of queue pairs per interface, which is determined by the following formula:

min(max(floor(num_vcpus/ num_vnics), 1), 32)

For instance, an n1-standard-16 instance with 3 vNICs and 16 vCPUs supports 5 queue pairs on each interface.
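
The calculation can be reproduced with simple shell arithmetic. This is only a worked example of the formula above; the instance values are placeholders:

# Queue pairs per interface = min(max(floor(num_vcpus / num_vnics), 1), 32)
num_vcpus=16; num_vnics=3
qp=$(( num_vcpus / num_vnics ))         # floor division: 5
(( qp < 1 )) && qp=1                    # lower bound of 1
(( qp > 32 )) && qp=32                  # upper bound of 32
echo "queue pairs per interface: $qp"   # prints 5 for n1-standard-16 with 3 vNICs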

To use as many dispatchers as there are queues, set max_queues_per_vnic to 0 and num_dispatcher_cores to <#cpus>.

You can configure num_dispatcher_cores only as a power of 2. To utilise all 5 available queues across 5 dispatchers, set num_dispatcher_cores to 8 and max_queues_per_vnic to 0. In this case, the available number of queues per vNIC is 5 and the configured number of dispatchers is 8; since num_dispatchers > num_queues, num_dispatchers is floored to num_queues, which is 5, thus allowing the utilisation of 5 dispatchers.
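
For instance, on an n1-standard-16 instance with 3 vNICs, the following transcript applies these values. It is illustrative and simply reuses the serviceenginegroup commands shown earlier in this guide; remember that changing num_dispatcher_cores requires an SE restart:

[admin:cntrl]: > configure serviceenginegroup Default-Group
[admin:cntrl]: serviceenginegroup> max_queues_per_vnic 0
Overwriting the previously entered value for max_queues_per_vnic
[admin:cntrl]: serviceenginegroup> num_dispatcher_cores 8
Overwriting the previously entered value for num_dispatcher_cores
[admin:cntrl]: serviceenginegroup> save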

The following example shows the recommended max_queues_per_vnic and num_dispatcher_cores for each type of instance:

Type of Instance | vCPU | NICs  | max_queues_per_vnic | num_dispatcher_cores
n1-standard-4    | 4    | 2^n   | 0                   | number of NICs
n1-standard-4    | 4    | 3     | 0                   | 4
n1-standard-8    | 8    | 2^n   | 0                   | number of NICs
n1-standard-8    | 8    | 3     | 0                   | 4
n1-standard-8    | 8    | 5,6,7 | 0                   | 8
n1-standard-16   | 16   | 2^n   | 0                   | number of NICs
n1-standard-16   | 16   | 3     | 0                   | 4
n1-standard-16   | 16   | 5,6,7 | 0                   | 8

In the GCP environment, when the multi-queue feature is enabled, the packets of a particular flow are not distributed to a single queue based on the 5-tuple. In the SE, the flow table is maintained on a per-interface, per-core basis. If the interface has multiple queues and each queue is owned by a distinct core, the flow table entries have to be created on all the dispatcher cores. This increases the number of flow table entries across the dispatcher cores, which in turn increases memory usage.

Blocklisting Feature

In a Linux server cloud environment, if NICs have to be blocklisted (left unclaimed by SE/DPDK), specify the PCI BDFs (domain:bus:device.function) of the NICs in the /etc/se-bdf-ignore file on the host (outside the SE container). Update the file before logging in to the Avi SE. If you are already logged in to the SE, restart the SE for the configuration to take effect.


FAQ on Blocklisting

Q. How to find out the BDF of a NIC?

If the eth9 (Ethernet 9) interface has to be blocklisted, specify the string listed against bus-info in the output of ethtool -i eth9 in /etc/se-bdf-ignore.

If /etc/se-bdf-ignore is not present, it has to be created. An example is shown below.


root@10.1.1.1:~# ethtool -i eth9
driver: vmxnet3
version: 1.4.7.0-k-NAPI
firmware-version:
bus-info: 0000:1c:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10-1-1-1:~# cat /etc/se-bdf-ignore
0000:1c:00.0
root@10-1-1-1:~#
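
For instance, the file can be created on the host with a single command, using the BDF noted above:

root@10-1-1-1:~# echo "0000:1c:00.0" > /etc/se-bdf-ignore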

Q. How to blocklist multiple NICs?

Specify BDFs separated by a comma with no spaces in between. An example is shown below.


root@10.1.1.1:~# ethtool -i eth8
driver: vmxnet3
version: 1.4.7.0-k-NAPI
firmware-version:
bus-info: 0000:1b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10-1-1-1:~# ethtool -i eth9
driver: vmxnet3
version: 1.4.7.0-k-NAPI
firmware-version:
bus-info: 0000:1c:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10.1.1.1:~# cat /etc/se-bdf-ignore
0000:1c:00.0,0000:1b:00.0
root@10.1.1.1:~#

Q. Will the VLAN interfaces of a blocklisted NIC be claimed by the SE?

No, the VLAN interfaces associated with a blocklisted NIC remain unclaimed.

Q. What is the expected behavior when a blocklisted NIC is part of a port-channel?

Blocklisted NICs do not take part in the port-channels claimed by an SE.

Q. How many NICs per host could be blocklisted?

Up to 39 NICs can be blocklisted per host.

Document Revision History

Date | Change Summary
December 23, 2020 | Added the 'GCP RSS Support in DPDK Mode' section