Tips and Tricks Running Network Throughput Tests on OCI

Oracle Cloud Infrastructure (OCI) offers a broad catalog of compute shapes in different sizes and types. Each shape determines the number of OCPUs, the amount of memory, and the network bandwidth allocated to an instance.

For the following tests, we will use the new E3 instances, which are flexible shapes. This means you can customize the number of OCPUs; the amount of memory, network bandwidth, and number of VNICs scale proportionately with the OCPU count.

Our goal is to measure the network throughput (bandwidth) and check the performance and quality of the Oracle Cloud Infrastructure network between two Availability Domains (ADs). The architecture used in the example consists of two compute instances, one acting as server and the other as client, each placed in a different AD of the same region.

What is the Expected Network Throughput?

We will use the flexible E3 shape (VM.Standard.E3.Flex) configured with a single OCPU.

With this configuration, the network bandwidth is set to 1 Gbps. Note that the allocated bandwidth is per compute instance, regardless of how many network interfaces are attached to the virtual machine. There are a few things we need to consider:

  • Number of network interfaces: the bandwidth limit is cumulative over all outbound traffic from the compute instance, no matter how many VNICs are attached
  • Traffic destination: all destinations count towards the outbound limit
  • Protocol: all outbound traffic, over all protocols, counts towards the rate limit

For these tests, we will use iperf3 and nuttcp. iperf3 is an open-source tool for running reliable bandwidth tests; nuttcp is a similar network test tool that is more accurate for UDP traffic.
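
Both tools are available as packages on most distributions. Here is a sketch assuming an Oracle Linux instance; package names and repositories may differ elsewhere (nuttcp may need the EPEL repository):

    # Install the test tools (Oracle Linux / RHEL family)
    $ sudo yum install -y iperf3 nuttcp

Also remember to allow the test ports (5201 for iperf3 by default) in both the instance firewall and the subnet's security list, or the client will not be able to reach the server.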

Running TCP Tests

Let’s start a simple test as shown below:
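
Here is a sketch of the commands, where 10.0.1.2 is a placeholder for the server's private IP (not the address from the original run):

    # On the server instance: listen for incoming tests (TCP port 5201 by default)
    $ iperf3 -s

    # On the client instance: run a 10-second TCP test against the server
    $ iperf3 -c 10.0.1.2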

From the above output, you can see that we get a bandwidth in line with our shape's limit (1 Gbps). You should always consider the results reported by the client machine; there is a small variation in the values between the server and the client.

If you want to run the test the other way around, where the server sends and the client receives, add the -R parameter:
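
For example, with the same placeholder address:

    # Reverse mode: the server sends and the client receives
    $ iperf3 -c 10.0.1.2 -R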

So far the results are good. Let's move on to the next transport-layer protocol.

Running UDP Tests

We will now run tests using the UDP protocol. You will need to pass the -u option on the client side.
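
The server side stays the same; on the client, -u switches the test to UDP (without -b, iperf3 uses a modest default send rate of about 1 Mbps):

    # UDP test at iperf3's default send rate (~1 Mbps)
    $ iperf3 -c 10.0.1.2 -u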

Once the test is completed, it will show the following info:

  • ID, Interval, Transfer, Bandwidth (same as in TCP tests)
  • Jitter: the variation in packet delay across the network
  • Lost/Total Datagrams: the number of lost datagrams over the total number sent to the server, and the corresponding percentage

We see that the total amount of transferred data is 1.25 MBytes and the maximum achieved bandwidth is 1.05 Mbps over a time period of 10 seconds. The variation of the time delay between packets over the network (jitter) shows a value of 0.068 ms. No datagrams were lost (0%), and the total number of datagrams received by the server is 146. So far so good.

Let’s see what happens when we increase the size of the packets.
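
A sketch of such a run, using -l to send 8000-byte datagrams and -b to push the rate toward the shape's limit (both values here are assumptions, not necessarily the ones from the original run):

    # 8000-byte UDP datagrams at a 1 Gbps target rate
    $ iperf3 -c 10.0.1.2 -u -l 8000 -b 1G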

When large packet sizes are used, the UDP drop rate increases from around 1% to over 50%. Why is this happening? A few things can cause this problem: it could be a tuning issue, a host NIC problem, or a SmartNIC issue.

Let’s cross-check with nuttcp. We will send packets of 8000-byte length over a period of 10 minutes (600 seconds) at a rate of 2000 Mbps.
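
The run looks roughly like this; 10.0.1.2 again stands in for the server's address, and it's worth checking your nuttcp version's man page for the exact flag semantics:

    # On the server instance
    $ nuttcp -S

    # On the client: -u for UDP, -l8000 for 8000-byte packets,
    # -T600 for a 600-second run, -R2000m for a 2000 Mbps rate limit
    $ nuttcp -u -l8000 -T600 -R2000m 10.0.1.2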

We see a packet loss of 21.8%.

Same behavior. We are running a Linux distribution, and one of the most common causes of UDP datagram loss on Linux is an undersized receive buffer on the socket. Let's check the socket's buffer size:
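
On Linux the limits live in the net.core sysctl parameters; for example (the values shown are typical kernel defaults):

    # Default and maximum receive-socket buffer sizes, in bytes
    $ sysctl net.core.rmem_default net.core.rmem_max
    net.core.rmem_default = 212992
    net.core.rmem_max = 212992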

212992 bytes is only enough for about 26 UDP packets of 8000 bytes each (212992 / 8000 ≈ 26). If the application doesn't pull from its receive buffer fast enough, or if the NIC batches more than 26 packets before interrupting the kernel, packets will be dropped.

How can we solve this?

There are a couple of things that can help.

1. Increase the socket buffer depth to accommodate the NIC batching packets before interrupting the OS. This is the preferred method. We will make an OS change to increase the maximum socket buffer memory size:
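
A sketch of the change; the 25 MB value below is an illustrative choice, not a requirement:

    # Raise the maximum receive-socket buffer (25 MB here) at runtime
    $ sudo sysctl -w net.core.rmem_max=26214400

    # Make the change persistent across reboots
    $ echo 'net.core.rmem_max=26214400' | sudo tee -a /etc/sysctl.conf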

2. Modify the interrupt coalescing settings. This is not preferred, due to the performance penalty of servicing more interrupts.
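
For completeness, coalescing settings can be inspected and tuned with ethtool; eth0 and the rx-usecs value below are placeholders, and the supported parameters depend on the NIC driver:

    # Show the current interrupt coalescing settings
    $ ethtool -c eth0

    # Example: make the NIC interrupt the kernel sooner (illustrative value)
    $ sudo ethtool -C eth0 rx-usecs 8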

Testing

Let’s change the buffer size and run iperf3 again, this time with the receiver creating a socket with a bigger buffer:
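
iperf3's -w option sets the socket buffer size, and the client passes it to the server side as well, so the receiver benefits too. A sketch with an assumed 2 MB buffer:

    # 8000-byte UDP datagrams at a 1 Gbps target rate, with a 2 MB
    # socket buffer; -w is forwarded to the server and applied there
    $ iperf3 -c 10.0.1.2 -u -l 8000 -b 1G -w 2M

Note that this only works because we raised net.core.rmem_max in step 1; the kernel caps socket buffers requested by applications at that limit.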

When the test is rerun, packet loss drops to less than 1%.

Conclusion

Linux distributions limit a socket's buffer to a fairly small size by default, so take this into account when running network throughput tests!

Hope that helps!

Links:

  • nuttcp: http://nuttcp.net/Welcome%20Page.html
  • iperf3: https://iperf.fr/iperf-download.php
  • E3 Flexible shapes: https://docs.cloud.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#flex-shapes