In the previous blog, we talked about the new AMD EPYC shapes and the freedom to configure your instances without having a fixed instance catalog.
The new E3 shapes that we announced bring a level of flexibility not seen before. This is a huge step in the cloud space, but it has an impact on our story around OCI networking. What does it mean to have flexible instances in the infrastructure? There are several factors you need to take into account, but I will mainly talk about two of them: capacity planning and network considerations.
When we look at capacity planning, our engineers measure the systems that are in place, look at the different components and their performance, and establish usage patterns that help us predict demand.
Processors, memory, storage, and network capacity are the main components, but not the only ones. We need to understand the type of workload and its impact (think of a scale-up or scale-out process), and how existing resources can be optimized and improved.
This is an iterative process which, at a high level, looks as follows:
What we mainly do is capture multiple models, including the worst-case scenario, which is also the riskiest. Each model has its pros and cons and helps us make the decisions for our next improvement.
When you run this process with a fixed catalog (e.g. an oCPU is mapped to 16GB RAM and 1Gbps of network bandwidth) and you scale in the traditional way (in powers of two: 2, 4, 8, 16, 32, 64...), it is much easier to predict capacity. With this new model, where you have true flexibility, it turns into a more complex process.
Oracle Cloud was the first major cloud provider to implement “off-box” virtualization in its cloud regions (it’s a design principle). We took the network and I/O virtualization out of the server stack and placed it directly in the network. By doing so, customers can provision dedicated hosts with no hypervisor overhead, no shared compute resources, and no noisy neighbors. All this within a fully software-defined Layer 3 topology.
Let’s take a look at how OCI networking works. You might be wondering what “off-box” virtualization is. I define it as a little supercomputer embedded in the network. It’s an intelligent card (a SmartNIC) where you can offload services like virtual switching, software-defined storage, and data/network encryption. Having this in place frees up valuable CPU cores on the host and provides higher performance.
As you can see, all the network and I/O virtualization is done within the SmartNIC. Each SmartNIC connects on one side to the host’s physical NIC and on the other side to the Top of Rack (ToR) switches.
We have gone over the description of a single rack. However, in large-scale datacenters you will have thousands of racks per pod and thousands of pods per datacenter. That makes the network an interesting place to live.
When we look at the Datacenter implementation we find different areas where we can have bottlenecks. At a high level, you might find bandwidth constraints in a physical port, a switch itself or the network:
Network oversubscription is defined as the point where ingress bandwidth > egress bandwidth. If we translate this into the fabric world, it is the ratio between the bandwidth the ToR switches offer to the servers and the bandwidth available from the ToR into the network fabric.
Let’s pick an example. We will assume that we have 18 servers per rack (2RU each server), with 2x 50Gbps links between each server and the ToR.
- 18 servers x 2 NICs = 36 NICs, each with 50Gbps = 1.8 Tbps of traffic arriving at the ToR.
- ToR uplinks: N x 100Gbps. We will assume 8 x 100Gbps uplinks.
- Oversubscription ratio: 1800 Gbps / 800 Gbps = 2.25:1
We get approximately a 2:1 oversubscription on the substrate. With the previous SmartNIC that we had (2x25Gbps interfaces), the ratio was approximately 1:1 (1.125:1). That’s like having a dedicated highway for your traffic.
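The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are hypothetical, and the 2x25Gbps case assumes the same 18 servers and 8 x 100Gbps uplinks as the example, which is what the 1.125:1 figure implies.

```python
def tor_oversubscription(servers, nics_per_server, nic_gbps, uplinks, uplink_gbps):
    """Return the ToR oversubscription ratio: server-facing / fabric-facing bandwidth."""
    downlink_gbps = servers * nics_per_server * nic_gbps  # traffic arriving at the ToR
    uplink_total_gbps = uplinks * uplink_gbps             # capacity from the ToR into the fabric
    return downlink_gbps / uplink_total_gbps


# Values taken from the example in the text (uplink count is the text's assumption).
current = tor_oversubscription(servers=18, nics_per_server=2, nic_gbps=50,
                               uplinks=8, uplink_gbps=100)
previous = tor_oversubscription(servers=18, nics_per_server=2, nic_gbps=25,
                                uplinks=8, uplink_gbps=100)

print(f"2x50Gbps SmartNIC: {current:.3f}:1")   # 2.250:1
print(f"2x25Gbps SmartNIC: {previous:.3f}:1")  # 1.125:1
```

Changing the number of uplinks is the quickest way to see how the ratio moves: with 16 x 100Gbps uplinks, the 2x50Gbps case would drop to the same 1.125:1.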
Let’s dive into the E3 topic. With the release of the E3 AMD shapes, how much bandwidth do we have per oCPU? The following are the specs for the AMD architecture:
For each 64-core processor, we have 50Gbps of network bandwidth. In an ideal scenario with no oversubscription, we divide that bandwidth between the oCPUs, which gives 781.25Mbps per core. However, Oracle is offering 1Gbps of network bandwidth per oCPU. That means that if we assign 1Gbps per oCPU (up to 64, so 64Gbps in total) and divide it by the total bandwidth (50Gbps), we get 1.28, or 28% oversubscription. This is one of the trade-offs of using the new E3 shapes versus the Intel and E2 shapes.
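To make the per-core numbers easier to check, here is a minimal Python sketch. The function names are hypothetical, and the inputs are simply the figures quoted above (50Gbps per 64-core processor, 1Gbps offered per oCPU).

```python
def fair_share_mbps(nic_gbps, cores):
    """Bandwidth per core, in Mbps, if the NIC is split evenly with no oversubscription."""
    return nic_gbps * 1000 / cores


def oversubscription_pct(offered_gbps_per_ocpu, ocpus, nic_gbps):
    """Percentage by which the bandwidth promised to all oCPUs exceeds the NIC bandwidth."""
    total_offered_gbps = offered_gbps_per_ocpu * ocpus
    return (total_offered_gbps / nic_gbps - 1) * 100


print(f"Fair share: {fair_share_mbps(50, 64):.2f} Mbps per core")
print(f"Oversubscription at 1Gbps per oCPU: {oversubscription_pct(1.0, 64, 50):.0f}%")
```

A negative result from `oversubscription_pct` would mean there is headroom left, that is, less bandwidth is promised than the NIC can deliver.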
Wait…there’s something weird here. The maximum bandwidth allocated is 40Gbps!!!
Yes, that’s right. You need to consider that VMs can burst traffic and that the hypervisor also requires network bandwidth (usually not much, but it still needs some). To ensure we offer the best experience, network bandwidth is rate limited at 40Gbps.
How does it compare with the previous AMD shapes (E2 instances)? The math is the same as in our previous example (781.25Mbps per core). However, Oracle’s E2 offering provides 700Mbps of network bandwidth per core, which is below that fair share (64 x 700Mbps = 44.8Gbps against 50Gbps), so you will not be oversubscribing.
As you can see, there are some network considerations you have to take into account with the new model. If you want to check some of the performance tests comparing AMD EPYC against existing x86 standard alternatives, please visit Oracle’s official blog.
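One way to picture the effect of the cap is the small Python model below. It is an assumption-laden sketch, not OCI’s actual rate limiter: the function name is hypothetical, and it assumes bandwidth grows linearly at 1Gbps per oCPU until it hits the 40Gbps limit mentioned above.

```python
def vm_bandwidth_gbps(ocpus, gbps_per_ocpu=1.0, cap_gbps=40.0):
    """Modelled maximum bandwidth for a VM: linear per oCPU, clamped at the cap."""
    if ocpus < 1:
        raise ValueError("a VM needs at least one oCPU")
    return min(ocpus * gbps_per_ocpu, cap_gbps)


for ocpus in (1, 8, 40, 64):
    print(f"{ocpus:>2} oCPUs -> {vm_bandwidth_gbps(ocpus):.0f} Gbps")
```

Under this model, any instance with more than 40 oCPUs sees the same 40Gbps ceiling, which is why the largest shape does not reach 64Gbps.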
The last thing I would like to highlight is that Oracle offers SLAs for network performance, so you are covered.
With flexible shapes, you have to balance between performance and flexibility. It is not an easy decision, but I think it’s the right one if you want to give customers freedom when configuring their environments. On the other hand, if you want to avoid network oversubscription, you have a wide variety of Intel and E2 shapes available. For me, it’s all about giving options to customers.
You may have some questions about the future roadmap of the E3 shapes. Will you be able to burst network traffic to a higher number of Gbps per oCPU? And if so, will it be something that customers can configure, or will it just be a default setting (a static boundary)?
We will see how the service evolves, but there is no doubt that Oracle is evolving its infrastructure into a truly elastic platform.