I certainly hope the MI210 is a dual-chip card with slightly reduced performance from the MI250/MI250X. An Aldebaran with a single chip and ~22 teraflops FP32/FP64 would be pointless; people would just buy Nvidia.
You’re right, and 150k transistors seemed like a heck of a lot when the prior system had 50k.
Geez, Paul. I think systems were pretty diverse back then, and doing a lot of interesting stuff. Or rather, as much as people could think of at the time. When I look at the long evolution of systems, from Hollerith punch card machines in 1890 all the way up to today, I am amazed at the incremental innovation every step of the way, and each step is always difficult. We have always had 10 percent less than what we need, and it is how we cope that makes this all interesting. I liked the diversity of systems because many different ideas for systems, not just CPUs, were tested out in the field and there was competition between these ideas in the market. Same as today. This is also what makes it interesting. I ain’t bored yet, and every time has its challenges.
For exactly the reason I said. A64FX is in and done. We have covered SiPearl, and it will be just for a few machines at best. Qualcomm has no server part that I am aware of, although it is apparently working on some accelerator for HPC. We have talked about the Chinese and Korean chips, and they are again special cases for one-off and two-off machines. I’m talking mainstream datacenter compute: hyperscalers, cloud builders, and large enterprise. HPC centers can build an exascale machine out of squirrels on wheels with an abacus in each hand if they want to. . . as you well know.
You assume that the MI210 is not a double whammy, but I do not. I think the PCI-Express version will have almost the same performance as the MI250, as I showed here: https://www.nextplatform.com/2021/11/09/the-aldebaran-amd-gpu-that-won-exascale/
Slower clock, smaller memory, lower memory bandwidth, so those who don’t want to use OAM can get a reasonable GPU from AMD for their systems. If I am right, then there is room for something that is half or a little more than half the MI210 and still a lot better than the MI100.
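To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. The MI210 figure is my assumption of a modest haircut from the MI250’s 45.3 teraflops FP64 vector peak, and the MI100’s 11.5 teraflops FP64 peak is AMD’s published spec; treat the whole thing as a sketch, not a product spec.

    # Rough FP64 vector peaks, in teraflops. The MI210 number is an assumption
    # (a PCI-Express card coming in a bit under the MI250); the MI100 figure
    # is AMD's published spec.
    mi100_fp64 = 11.5
    mi210_fp64_assumed = 45.3 * 0.9        # assume ~10 percent haircut for PCI-Express
    half_mi210 = mi210_fp64_assumed / 2    # the hypothetical half-Aldebaran part

    print(f"MI100:          {mi100_fp64:.1f} TF")
    print(f"Assumed MI210:  {mi210_fp64_assumed:.1f} TF")
    print(f"Half an MI210:  {half_mi210:.1f} TF ({half_mi210 / mi100_fp64:.1f}X the MI100)")

Even a half-Aldebaran card on those assumptions lands around 20 teraflops FP64, which is still well ahead of the MI100.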
I’ll be surprised if AMD’s Zen 4 server chips are available in any volume at all soon after the launch of Intel’s Sapphire Rapids.
Regarding the NVIDIA GPU, I’m skeptical NVIDIA will come out with a 5 nm GPU in any volume in 2022. If they do, it would be a bit of a change from how early in a node’s life they have historically shipped, unless it is a late 2022 product. For the A100, they already had lots of them floating around when they formally launched it at GTC in May 2020. If they use 5 nm for the successor and announce it at GTC 2022 in March, I doubt many will be floating around until much later in 2022.

I also doubt NVIDIA will focus too much on FP64 in the new part unless they can use those transistors for lower precision compute as well. As the years go on, lower precision compute is becoming more and more important relative to FP64 for NVIDIA’s datacenter business. AMD’s GPU business, in contrast, consists mostly of supercomputing facilities that rely heavily on FP64. I doubt NVIDIA will sacrifice performance in 95%+ of their revenue stream in order to defend the 5% from AMD. If expanding FP64 comes at the cost of lower precision performance, NVIDIA is likely to either bifurcate their product line, as Intel originally planned to do with its Xe HP and Xe HPC lines, or allow AMD to have an advantage in supercomputing. I think they should have the money to do the former. Even if they don’t make much money on them, those supercomputers are a high-visibility “halo” market. So even though NVIDIA hates to do such a thing, if they can’t repurpose the extra FP64 transistors efficiently, it might be worth bifurcating the product line to compete better with AMD and Intel in supercomputing.
Regarding supercomputing, reports now say that Frontier’s early access has been pushed back to June 2022, with full user access pushed to January 1, 2023, despite assurances otherwise in October 2021, when Aurora was (again) being pushed back: https://executivegov.com/2021/12/installation-of-supercomputer-frontier-at-oak-ridge-national-lab-now-underway/ So Frontier no longer seems to arrive all that much before Aurora.
If Aurora really does hit 2.43 exaflops peak, it will come in at about 24 MW per exaflops, whereas the Frontier machine will come in at about 19 MW per exaflops. Intel is promising 45 TFLOPs of FP64 vector performance per GPU, which means that, with 54,000 GPUs in Aurora, we get 2.43 exaflops of peak FP64 vector performance. So that checks out. Suppose the 650 watt rumor is true. AMD, on its web site, is promising 45.3 TFLOPs of FP64 vector at 560 watts peak, so the Intel GPU uses 16% more power for the same performance as the AMD GPU. If we instead use the 500 watt number on AMD’s site, the Intel GPU uses 30% more power than the AMD GPU. Frontier’s 19 MW per exaflops plus 16% is 22 MW per exaflops, and if we instead add 30%, we end up with 24.7 MW per exaflops. Aurora’s power usage should be at most around 24 MW per exaflops, putting it in that range. So that all checks out, too. All indications are that, from a peak theoretical standpoint, Ponte Vecchio uses 16% to 30% more power for the same performance as Aldebaran. It’s not that much more power hungry.

However, supercomputers using the A100 seem to use the A100’s FP64 matrix operations for their Linpack numbers, so I would have expected Frontier to do the same. It seems that it is not, because 1.5 exaflops divided by 36,000 GPUs is about 42 TFLOPs per GPU, which is around what Aldebaran can get from FP64 vector and half of what it gets from FP64 matrix. There could be something else going on there, though. I have to say these accelerators and supercomputers seem to be shrouded in much more mystery and intrigue than is normal. Between the large number of execution units, the various power consumption figures quoted, vector instructions versus matrix instructions, and strange transistor count and die size quotes, it isn’t easy to determine how best to compare the accelerators. We will have to wait for some real world experience.
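For anyone who wants to check my arithmetic, here is the same back-of-the-envelope math as a quick Python sketch. The 650 watt Ponte Vecchio number, the 54,000 and 36,000 GPU counts, and the 19 MW per exaflops figure for Frontier are all taken from the rumors and reports discussed above, not confirmed specs.

    # Back-of-the-envelope check of the figures above. All inputs are the
    # public or rumored numbers quoted in this comment, treated as assumptions.

    # Ponte Vecchio versus Aldebaran, FP64 vector peak per GPU
    pvc_tflops, pvc_watts = 45.0, 650.0           # 650 W is a rumor
    mi250x_tflops = 45.3
    mi250x_watts_high, mi250x_watts_low = 560.0, 500.0

    extra_hi = pvc_watts / mi250x_watts_high - 1  # ~16 percent more power
    extra_lo = pvc_watts / mi250x_watts_low - 1   # ~30 percent more power
    print(f"Ponte Vecchio ({pvc_tflops:.0f} TF) vs Aldebaran ({mi250x_tflops:.1f} TF): "
          f"{extra_hi:.0%} to {extra_lo:.0%} more power for about the same FP64 vector peak")

    # System-level peak power cost in MW per exaflops
    aurora_peak_ef = 54_000 * pvc_tflops / 1e6    # ~2.43 exaflops peak
    frontier_mw_per_ef = 19.0                     # from the estimate above
    aurora_low = frontier_mw_per_ef * (1 + extra_hi)   # ~22 MW per exaflops
    aurora_high = frontier_mw_per_ef * (1 + extra_lo)  # ~24.7 MW per exaflops
    print(f"Aurora peak: {aurora_peak_ef:.2f} EF, "
          f"estimated {aurora_low:.1f} to {aurora_high:.1f} MW per exaflops")

    # Frontier's 1.5 exaflops target spread across 36,000 GPUs
    per_gpu = 1.5e6 / 36_000                      # ~42 TFLOPs per GPU
    print(f"Implied per-GPU rate: {per_gpu:.0f} TFLOPs, close to Aldebaran's "
          f"FP64 vector peak and half its FP64 matrix peak")

Change the wattage or GPU count assumptions and the MW per exaflops ranges move accordingly, which is really the point: the comparison hinges entirely on which quoted numbers turn out to be real.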
Finally, I have to say this has seemingly become a very gung-ho AMD site, whereas it used to have a lot of faith in Intel.