Comments on: The Year Ahead In Datacenter Compute https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Fri, 14 Jan 2022 05:13:51 +0000 hourly 1 https://wordpress.org/?v=6.7.1 By: emerth https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173773 Sun, 09 Jan 2022 23:49:07 +0000 https://www.nextplatform.com/?p=139831#comment-173773 In reply to Timothy Prickett Morgan.

I certainly hope the MI210 is a dual-chip card with slightly reduced performance from the MI250/250X. An Aldebaran card with a single chip and ~22 TF FP32/FP64 would be pointless; people would just buy Nvidia.

By: Paul Berry https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173749 Sat, 08 Jan 2022 01:32:43 +0000 https://www.nextplatform.com/?p=139831#comment-173749 In reply to Timothy Prickett Morgan.

You’re right, and 150k transistors seemed like a heck of a lot when the prior system had 50k.

By: Timothy Prickett Morgan https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173742 Fri, 07 Jan 2022 16:04:43 +0000 https://www.nextplatform.com/?p=139831#comment-173742 In reply to Paul Berry.

Geez, Paul. I think systems were pretty diverse back then, and doing a lot of interesting stuff. Or rather, as much as people could think of at the time. When I look at the long evolution of systems, from Hollerith punch card machines in 1890 all the way up to today, I am amazed at the incremental innovation every step of the way, and each step was always difficult. We have always had 10 percent less than what we need, and it is how we cope that makes this all interesting. I liked the diversity of systems because many different ideas for systems, not just CPUs, were tested out in the field, and there was competition between these ideas in the market. Same as today. This is also what makes it interesting. I ain’t bored yet, and every time has its challenges.

By: Timothy Prickett Morgan https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173740 Fri, 07 Jan 2022 15:58:04 +0000 https://www.nextplatform.com/?p=139831#comment-173740 In reply to Adriano.

For exactly the reason I said. A64FX is in and done. We have covered SiPearl and it will be just for a few machines at best. Qualcomm has no server part that I am aware of, although it is apparently working on some accelerator for HPC. We have talked about the Chinese and Korean chips, and they are again special cases for one-off and two-off machines. I’m talking mainstream datacenter compute — hyperscalers, cloud builders, and large enterprise. HPC centers can build an exascale machine out of squirrels on wheels with an abacus in each hand if they want to. . . as you well know.

By: Timothy Prickett Morgan https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173737 Fri, 07 Jan 2022 15:48:51 +0000 https://www.nextplatform.com/?p=139831#comment-173737 In reply to Ziple.

You assume that the MI210 is not a double whammy, but I do not. I think the PCI-Express version will be almost the same performance as the MI250, as I showed here: https://www.nextplatform.com/2021/11/09/the-aldebaran-amd-gpu-that-won-exascale/

Slower clock, smaller memory, and lower memory bandwidth, so those who don’t want to use OAM can get a reasonable GPU from AMD for their systems. If I am right, then there is room for something that is half or a little more than half the MI210 and still a lot better than the MI100.

By: Ziple https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173731 Thu, 06 Jan 2022 20:01:10 +0000 https://www.nextplatform.com/?p=139831#comment-173731 The MI200 cutdown you are talking about was already announced: it is the MI210.

By: Matt https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173730 Thu, 06 Jan 2022 20:00:02 +0000 https://www.nextplatform.com/?p=139831#comment-173730 Please don’t use the old Intel naming terminology (“10 nm”). It will only confuse things more. Let’s have some consistency. Intel has renamed the process “Intel 7”, and the Alder Lake CPUs suggest that it really is best compared with TSMC’s “7 nm” process.

I’ll be surprised if AMD’s Zen4 server chips are available in any volume at all soon after the launch of Intel’s Sapphire Rapids.

Regarding the NVIDIA GPU, I’m skeptical NVIDIA will come out with a 5 nm GPU in any volume in 2022. If they do, it would be a bit of a change from how early they have historically produced on a new node. Perhaps if it’s a late 2022 product. For the A100, they already had lots of them floating around when they formally launched it at GTC in May 2020. If they use 5 nm for the successor and announce it at GTC 2022 in March, I doubt many will be floating around until much later in 2022.

I doubt NVIDIA will focus too much on FP64 in their new part unless they can use those transistors for lower-precision compute as well. As the years go on, lower-precision compute is becoming more and more important relative to FP64 for NVIDIA’s datacenter business. AMD’s GPU business, in contrast, consists mostly of supercomputing facilities that rely a lot on FP64. I doubt NVIDIA will sacrifice performance in the 95%+ of their revenue stream in order to defend the 5% from AMD. If expanding FP64 comes at the cost of lower-precision performance, NVIDIA is likely to either bifurcate their product line, as Intel originally planned to do with its HP and HPC lines, or allow AMD to have an advantage in supercomputing. I think they have the money to do the former. Even if they don’t make much money on them, those supercomputers are a high-visibility “halo” market. Even though NVIDIA hates to do such a thing, if they can’t repurpose extra FP64 transistors efficiently, it might be worth it to bifurcate the product line to compete better with AMD and Intel in supercomputing.

Regarding supercomputing, reports now say Frontier’s early access has been pushed back to June 2022, with full user access pushed to Jan. 1, 2023. This is despite assurances otherwise in October 2021, when Aurora was (again) being pushed back: https://executivegov.com/2021/12/installation-of-supercomputer-frontier-at-oak-ridge-national-lab-now-underway/ So Frontier no longer seems to arrive that much before Aurora.
If Aurora really does hit 2.43 exaflops peak, it will have a peak efficiency of about 24 MW/exaflops, whereas the Frontier machine will have a peak efficiency of about 19 MW/exaflops. Intel is promising 45 TFLOPs of FP64 vector performance per GPU. That means, with 54,000 GPUs in Aurora, we have 2.43 exaflops of peak FP64 vector performance. So that checks out. Suppose the 650-watt rumor is true. AMD, on its web site, is promising 45.3 TFLOPs of FP64 vector at 560 W peak. So the Intel GPU uses 16% more power for the same performance as the AMD GPU. If we instead use the 500 W number on AMD’s site, we get the Intel GPU using 30% more power than the AMD GPU. Frontier’s 19 MW/exaflops plus 16% is 22 MW/exaflops, and if instead we add 30% to it we end up with 24.7 MW/exaflops. Aurora’s power usage should be at most around 24 MW/exaflops, putting it in that range. So that all checks out. All indications are that, from a peak theoretical standpoint, Ponte Vecchio uses 16% to 30% more power for the same performance as Aldebaran. It’s not that much more power hungry.

However, supercomputers using the A100 seem to use the A100’s FP64 matrix operations for their Linpack numbers, so I would have expected Frontier to do the same. It seems it is not, because 1.5 exaflops divided by 36,000 GPUs is about 42 TFLOPs per GPU, around what Aldebaran can get from FP64 vector and half what it gets from FP64 matrix. There could be something else going on there, though. I have to say these accelerators and supercomputers seem to be shrouded in much more mystery and intrigue than is normal. Between the large number of execution units, the various power consumption figures quoted, vector instructions versus matrix instructions, and strange transistor-to-die-size quotes, it isn’t easy to determine how best to compare the accelerators. We will have to wait for some real-world experience.
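The back-of-envelope arithmetic above is easy to check in a few lines. This is just a sketch using the numbers quoted in this comment (the rumored 650 W for Ponte Vecchio, AMD’s published 45.3 TFLOPs at 560 W or 500 W for Aldebaran); none of these are confirmed specifications.

```python
# Sanity-check the Aurora vs. Frontier efficiency figures quoted above.
# All inputs are rumored/quoted numbers from this comment, not official specs.

# Aurora peak: 54,000 Ponte Vecchio GPUs at a promised 45 TFLOPs FP64 vector each
aurora_gpus = 54_000
pvc_tflops = 45.0
aurora_peak_ef = aurora_gpus * pvc_tflops / 1e6   # exaflops
print(f"Aurora peak: {aurora_peak_ef:.2f} EF")    # 2.43 EF

# Watts per TFLOPs: rumored 650 W Ponte Vecchio vs. AMD's 560 W / 500 W figures
pvc_w_per_tf = 650 / pvc_tflops
mi_w_per_tf_560 = 560 / 45.3
mi_w_per_tf_500 = 500 / 45.3
print(f"PVC vs. Aldebaran @560 W: +{pvc_w_per_tf / mi_w_per_tf_560 - 1:.0%}")  # roughly the 16% above
print(f"PVC vs. Aldebaran @500 W: +{pvc_w_per_tf / mi_w_per_tf_500 - 1:.0%}")  # roughly the 30% above

# Scaling Frontier's ~19 MW/EF by those ratios brackets Aurora's expected ~24 MW/EF
frontier_mw_per_ef = 19
print(f"Range: {frontier_mw_per_ef * 1.16:.1f} to {frontier_mw_per_ef * 1.30:.1f} MW/EF")

# Frontier Linpack target: 1.5 EF over 36,000 GPUs, per GPU
print(f"Per GPU: {1.5e6 / 36_000:.0f} TFLOPs")    # ~42, near Aldebaran's FP64 vector peak
```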

Finally, I have to say this has seemingly become a very AMD-gung-ho site, whereas it used to have a lot of faith in Intel.

By: Paul Berry https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173729 Thu, 06 Jan 2022 19:23:19 +0000 https://www.nextplatform.com/?p=139831#comment-173729 “late 1980s… (Those were the days)”
An interesting time, full of diverse innovation, but also a kind of Great British Baking Show kind of innovation:
‘Given a limit of 150,000 transistors and 200 I/O pins, how fast can you make a processor run? Oh, and get useful work done with 4 MB of RAM. And open source software doesn’t exist, so your operating system needs to be written by fewer than 20 people, or be based on AT&T’s.’

So those were the days for tech journalists (and for monetizing a print magazine versus web journalism), but the computers really sucked. The pace of innovation and the diversity of ideas may have fallen off, but mostly because we have an embarrassment of riches: transistors are basically free if you can figure out how to connect them. RAM capacity is free in order to get the bandwidth. Embedded graphics will drive half a dozen HD displays with real-time 3D graphics. Free operating systems include robust networking, encryption, security, dozens of programming languages, and built-in compilers.

I kind of agree with the love of the workstation and datacenter systems of that period, but all that innovation went into very expensive, bespoke systems that did a couple of very basic tasks which are now all taken for granted.

By: Adriano https://www.nextplatform.com/2022/01/05/the-year-ahead-in-datacenter-compute/#comment-173727 Thu, 06 Jan 2022 17:32:29 +0000 https://www.nextplatform.com/?p=139831#comment-173727 Happy New Year!
While this is about mainstream compute, why no comment on Qualcomm, SiPearl, or processors made in China, South Korea, or Japan?
