Hell Freezes Over: Cisco And Nvidia Cross-Pollinate AI Networking

UPDATED  Networking giant Cisco Systems and AI platform provider Nvidia have hammered out a deal to mix and match each other’s technologies to create a broader set of AI networking options for their respective and – importantly, prospective – customers.

Nvidia has been a competitor of Cisco’s in the datacenter since striking its deal to acquire Mellanox Technologies in March 2019 for $6.9 billion, but they have generally kept to their patches. Cisco is the enterprise and service provider giant that dabbled a bit in AI and HPC as it relates mostly to financial services firms, while Nvidia had expertise in HPC and among certain hyperscalers and cloud builders trying to build low latency networks, often with InfiniBand but increasingly with Ethernet that has been gussied up with RDMA memory technologies and other techniques borrowed from InfiniBand.

But the minute Cisco entered the merchant silicon market with its Silicon One ASICs, which sport switch and router functions that have a common underlying architecture and which in recent months have had packet spraying and other congestion control techniques added to them to herd the massive “elephant flows” on the backend networks used in AI training clusters, a collision between Nvidia and Cisco was imminent. But instead of bumper cars, what the market will now have is a covalent bond, a sharing of technologies and more choices.
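To see why packet spraying matters for elephant flows, here is a toy sketch of our own, not anything taken from Silicon One or Spectrum-X. It compares classic per-flow hashing, where every packet of a flow is pinned to one link, against a naive round-robin spray across all links. The function names, the CRC32 hash, and the flow sizes are assumptions made purely for illustration; real switches spray adaptively and have to deal with the packet reordering that spraying causes.

```python
import zlib


def load_per_flow_hashing(flows: dict, n_links: int) -> list:
    """Per-flow hashing: every packet of a flow lands on one link.

    CRC32 stands in for whatever hash a real switch uses (an assumption).
    """
    load = [0] * n_links
    for flow_id, packets in flows.items():
        link = zlib.crc32(flow_id.encode()) % n_links
        load[link] += packets
    return load


def load_packet_spraying(flows: dict, n_links: int) -> list:
    """Naive round-robin spraying: each packet goes to the next link.

    Real implementations are adaptive and must handle reordering;
    this only illustrates the load-balancing effect.
    """
    load = [0] * n_links
    next_link = 0
    for packets in flows.values():
        for _ in range(packets):
            load[next_link] += 1
            next_link = (next_link + 1) % n_links
    return load


# Four hypothetical elephant flows of 1,000 packets each over eight links.
flows = {f"gpu{i}->gpu{i + 4}": 1000 for i in range(4)}
hashed = load_per_flow_hashing(flows, 8)
sprayed = load_packet_spraying(flows, 8)
print("per-flow hashing:", hashed)
print("packet spraying: ", sprayed)
```

With only a handful of very large flows, hashing leaves at least half of the eight links idle and can stack several flows on one link, while spraying spreads the same 4,000 packets evenly at 500 per link.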

Where there were once three options – the Cumulus Linux or SONiC network operating systems on Spectrum 4 switch ASICs paired with BlueField 3 DPUs from Nvidia or the NX-OS network operating system on Silicon One switch ASICs from Cisco – now there are many options, perhaps more than a half dozen depending on how you count them and what the two companies do under their new AI networking partnership.

Under that partnership announced today, a bunch of things are happening. First, Cisco is porting its NX-OS networking operating system to run on Nvidia’s Spectrum 4 ASICs, which are at the heart of the Spectrum-X networking platform that includes BlueField 3 DPUs for all kinds of congestion control and adaptive routing that is offloaded from servers and switches to these auxiliary processors.

These congestion control and adaptive routing features are part of the InfiniBand switch ASICs and MLNX-OS network operating system that Mellanox created for InfiniBand decades ago. We strongly suspect that in the wake of Nvidia buying Cumulus Networks in May 2020, all kinds of stuff from MLNX-OS (and possibly the Onyx NOS that goes end of life this year) was moved over to the Cumulus Linux NOS and has also been ported to the BlueField 3 DPUs to make up the Spectrum-X software stack. (It is hard to prove, but logical.) Suffice it to say, Nvidia has lots and lots of NOS expertise in its own right.

Of course, so does Cisco with its venerable Internetwork Operating System (IOS), which predates even commercial-grade Unixes and is based on its own kernel. The more modern NX-OS for the Nexus line of switches that date from the late 2000s is based on a Linux kernel, as is the Extensible Operating System (EOS) from Arista Networks.

Under the new partnership between Cisco and Nvidia, companies will be able to buy Cisco Nexus switches based on the Spectrum 4 switch ASICs and run the familiar NX-OS on them and still get the benefits that come with the Spectrum-X stack because really a lot of the smart stuff for shaping traffic and controlling security for Spectrum-X happens at the DPU level.

In the past, before Silicon One was launched way back in 2019, Cisco went out to merchant silicon makers and built Nexus switches around their ASICs because they had capabilities that filled in niches with specific customers. We detailed such diverse switches way back in 2018 when Cisco was using ASICs from Broadcom, Innovium, and Barefoot Networks as well as its own homegrown chips. To our knowledge, Cisco never made a switch using Mellanox or Nvidia ASICs, and this will be the first time. Gilad Shainer, senior vice-president of marketing for Mellanox networking at Nvidia, tells The Next Platform that the deal between Nvidia and Cisco covers Spectrum 4 and multiple generations of ASICs out beyond that in the Spectrum family.

Nvidia will not be reselling these Cisco Nexus switches based on NX-OS, but will promote them as being part of the Spectrum-X family since interconnects using them will include DPUs for the reasons outlined above. Nvidia will continue to make and sell its own Spectrum-4 switches and will make switches for the foreseeable future. This is not about Nvidia getting out of the switch manufacturing business.

It is about creating a Cisco-flavored Spectrum-X offering, which Nvidia and Cisco both believe is necessary to make a more familiar and palatable AI cluster offering to customers used to having Cisco as their networking vendor. It doesn’t hurt that Cisco has over 90,000 customers for its Unified Computing System (UCS) server platforms, which sport integrated Nexus networking and a Nexus Dashboard for managing it. If you want to sell Cisco enterprise and service provider customers AI clusters – as Nvidia does – you can’t expect them to give up on Nexus. They won’t do it, because the only thing stickier in the datacenter than a NOS is a database.

Cisco will also be creating Nexus switches that are based on its own Silicon One ASICs that can also in theory run SONiC or proprietary NOSes, but which will be considered a peer in the Spectrum-X lineup even though it is not running a port of Nvidia’s Cumulus Linux. (We had originally thought that Cumulus Linux was being ported to the Silicon One ASICs, which made sense to us, but that is not what is happening.)

The new Nexus switch that will be paired with Nvidia BlueField 3 DPUs will be based on the Silicon One G200 ASIC. We did a deep drilldown into the G200 back in June 2023 when it was launched, and this 51.2 Tb/sec ASIC is the first chip equipped with the packet spraying techniques that are necessary for managing AI elephant flows. The Spectrum-4 ASIC from Nvidia and the Tomahawk 5 ASIC from Broadcom also deliver 51.2 Tb/sec of aggregate throughput.
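For a sense of scale, the back-of-the-envelope arithmetic below divides 51.2 Tb/sec of aggregate throughput by common Ethernet port speeds. This is our own illustration: the helper name `max_ports` is hypothetical, and the port counts any real switch ships with depend on SerDes lanes and system design, not just on this division.

```python
ASIC_GBPS = 51_200  # 51.2 Tb/sec expressed in Gb/sec


def max_ports(asic_gbps: int, port_gbps: int) -> int:
    """Upper bound on full-rate ports for a given aggregate bandwidth."""
    return asic_gbps // port_gbps


for speed in (800, 400, 200, 100):
    print(f"{speed} Gb/sec ports: {max_ports(ASIC_GBPS, speed)}")
# 800 Gb/sec ports: 64
# 400 Gb/sec ports: 128
# 200 Gb/sec ports: 256
# 100 Gb/sec ports: 512
```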

As far as we know, Nvidia has no intention of selling any Silicon One switches itself, just like it will not resell Cisco Nexus switches using its Spectrum 4 ASIC.

Cisco is the first such networking partner to have this Spectrum-X partnership with Nvidia, but we can envision that Arista Networks (really just a proxy for Broadcom ASICs at this point) might also want a similar deal. Shainer does not think this is a particularly useful idea. But he didn’t put the kibosh on the idea, either.

“I’m very frank, and I’m telling you that at this point that we’re focusing on what we announced with Cisco,” Shainer explains. “There is a good amount of work that we are going to do between us. There are a lot of things we want to do together. We want to go to enterprises with AI. We want to do a lot of activities together. What’s going to happen in far future, who knows?”

It will be interesting to see if either Nvidia or Cisco creates a commercial distribution of SONiC for these switches. Both companies have software development kits to support proprietary NOSes, and further they have support for the Switch Abstraction Interface (SAI) layer that rides on top of those SDKs and shims underneath SONiC, which is itself based on a Linux kernel even though it was created by Microsoft, which would seemingly be allergic to Linux but is not. (You cannot be a cloud and not support Linux, and enthusiastically at that.) Microsoft donated SONiC to the open source community in March 2016.

We wonder how much pressure certain enterprises and service providers were putting on Nvidia and Cisco to work together to give them more options on how to deploy Cisco NOSes and Spectrum-X add-ons on either Nvidia or Cisco switch ASICs. Shainer downplayed this as the motivation for the partnership, saying that this is just a practical approach to getting Cisco customers fired up about AI clusters and reducing the friction of adoption into Cisco shops for Nvidia iron. We suppose it all comes to the same thing, whether it is a push or a pull. The important thing is that Cisco and Nvidia are working together.

The first Spectrum-4 switches running NX-OS and the Spectrum-X add-ons as well as the first Silicon One switches bearing the Spectrum-X seal of approval and paired with Nvidia DPUs and Nvidia NOS SDKs are expected later this year – very likely during the summer.
