RIKEN has a nasty habit of being right, but also of paying a premium price for a premium machine suited specifically to their tasks.
]]>Yes, well…..
]]>The best architecture for HPC is, and always has been, custom, CPU-only, homogeneous machines like Earth Simulator and K Computer. Fugaku will be no exception. Having CPUs that sit idle most of the time is the biggest weakness of heterogeneous machines. A64FX has the best traits of GPUs with none of the drawbacks of a heterogeneous machine.
Yes, it’s expensive, but “you get what you pay for,” to use another overused quote.
]]>Except that Fujitsu’s A64FX is better where it actually matters. Their entire PrimeHPC lineup has had superior computational efficiency and better performance in benchmarks that actually matter to real-world work, like HPCG and Graph500, for a decade now. K was #1 on both lists until Summit came along with 15x the FLOPS.
Cray is even offering an A64FX-based system because it’s the most advanced HPC CPU around. CPU-only systems have always been the best systems on Top500, even when they’re not the fastest.
]]>Actually, there are plenty of reasons why Fujitsu’s PrimeHPC line, which started with K, is objectively superior to any accelerated system, and it goes far beyond “being easy to program for”.
If you look at K, which dates back to 2011, and the massive amount of R&D that went into it, you’ll see that they created a system architecture specifically for HPC. The CPU itself was nothing special, but the Tofu interconnect and its HPC-specific features, which were well integrated with the purpose-built CPUs, gave K 93% computational efficiency. Most accelerated systems are around 65%.
Then there’s the fact that HPL doesn’t tell you anything these days. K was #1 on the Top500 when it came out, but it was surpassed by inferior systems, while it maintained its dominance where it mattered. Inferior how? Look at the HPCG and Graph500 lists to find out.
K has roughly 5% of its Top500 Rmax in HPCG FLOPS, which is extremely high. K stayed #1 on HPCG and Graph500 until Summit and Sierra, the two most powerful supers, came out many years later. Even then, they are brute force by comparison to the ancient but elegant K Computer, with only 1.5% of their HPL Rmax in HPCG FLOPS and 74% computational efficiency in HPL.
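For anyone who wants to check these two ratios against the published lists, here is a minimal Python sketch. It assumes "computational efficiency" means HPL Rmax divided by theoretical Rpeak, and that the HPCG figure is the HPCG result divided by HPL Rmax. The function names and the input numbers are illustrative placeholders, not measurements of any real system.

```python
def hpl_efficiency(rmax, rpeak):
    """Fraction of theoretical peak achieved in HPL (Rmax / Rpeak)."""
    if rpeak <= 0:
        raise ValueError("rpeak must be positive")
    return rmax / rpeak


def hpcg_share(hpcg, rmax):
    """HPCG result expressed as a fraction of HPL Rmax."""
    if rmax <= 0:
        raise ValueError("rmax must be positive")
    return hpcg / rmax


# Placeholder inputs, all in the same unit (e.g. PFLOPS). They are chosen
# only to mirror the percentages quoted above, not taken from any list.
example_eff = hpl_efficiency(rmax=9.3, rpeak=10.0)
example_share = hpcg_share(hpcg=0.5, rmax=10.0)

print(f"HPL efficiency: {example_eff:.0%}")
print(f"HPCG share of Rmax: {example_share:.1%}")
```

Plugging in the Rmax, Rpeak and HPCG values from the Top500 and HPCG lists for any two machines gives a like-for-like comparison.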
The PrimeHPC FX100 was released in 2015 with its custom 32+2 core CPU using HMC before anyone else was using 3D memory, and Fujitsu and NEC are the only companies with true 3D memory on CPUs to this day. While it’s true that GPUs are now using HBM, they require a host CPU, which doesn’t use it.
Discussing memory in HPC requires the discussion of Byte/FLOP ratios. Fujitsu has tried to maintain a 0.5 Byte/FLOP ratio since K. The only CPU that beats it is NEC’s SX-ACE, which has a ratio of 1:1. Their latest SX-Aurora Tsubasa is 0.5, like Fujitsu’s HPC machines. Most other systems have dismal overall B/F. Even the very advanced Summit has a system-level 0.125 B/F compared to 0.37 for Fugaku. Having 3x the B/F is a huge deal in HPC.
The B/F ratio partially determines if a machine is good at real work, or a Top500 trick machine like Sunway TaihuLight, whose HPCG result is only 0.4% of its HPL result and which has dismal efficiency and B/F ratios.
If you want to get sidetracked with who has the most advanced non-HBM RAM for CPUs, it’s IBM with their Centaur and OMI DIMMs, as some of them use 3DS TSV DDR4 and their memory-agnostic buffers and RAS can’t be beat. They currently lack the HPC-specific bandwidth of HBM, however. OMI DIMMs will change that.
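To make the B/F arithmetic concrete, here is a rough Python sketch. It assumes B/F is simply memory bandwidth in bytes per second divided by peak FLOP/s. The `bytes_per_flop` helper and the 512 GB/s and 1024 GFLOP/s node figures are hypothetical, picked only to land on a 0.5 ratio. The 0.37 and 0.125 values are the system-level figures quoted above.

```python
def bytes_per_flop(mem_bandwidth_bytes_per_s, peak_flops):
    """Byte/FLOP ratio: memory bandwidth divided by peak FLOP/s."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return mem_bandwidth_bytes_per_s / peak_flops


# Hypothetical node (not a real product spec): 512 GB/s of memory
# bandwidth feeding 1024 GFLOP/s of peak compute.
node_bf = bytes_per_flop(512e9, 1024e9)

# System-level B/F figures as quoted in the comment above.
fugaku_bf = 0.37
summit_bf = 0.125
advantage = fugaku_bf / summit_bf

print(f"Hypothetical node B/F: {node_bf:.2f}")
print(f"Fugaku vs Summit B/F advantage: {advantage:.2f}x")
```

The quoted figures work out to a little under 3x, which is where the "3x the B/F" above comes from.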
PrimeHPC FX100’s SPARC XIfx also integrated the HPC specific network controller directly on the CPU, while most of today’s systems still rely on the ancient practice of using discrete network interface cards.
With Fugaku, the Japanese smartly switched to ARM, which they also happened to buy. It’s basically a refinement of FX100, but using an ARM CPU, with all the advantages I mentioned about the other systems. Plus they added AI-specific features so it can run low-precision AI workloads, but on the scale of a massive system that scales without bottlenecking to sizes even larger than Fugaku, with an unbeatable B/F, unbeatable interconnect and computational efficiency.
As for its comparison to Ampere-based systems, you can’t say that Ampere is better just because it offers cheaper FLOPS. Look at the HPCG and Graph500 lists in the coming years for further proof.
]]>Yeah, I see lots of positive ‘this is great’ rah-rah, but isn’t it less computation for over a billion dollars, vs. under $200 million for NVDA Ampere systems? Programming is cited as "being easier", but a previous Next Platform article on an AMD-based supercomputer described some horrible admissions about trying to run AI-optimized molecular modeling. The best stuff was, and is, CUDA-based, hence they opted to run it, even though the tech lead of the system admitted that AMD's ability to run the software "hasn't been smooth sailing" after "a lot of work", and even then, after presumably months, was only "starting to get good results". I know everyone wants to act like there's an alternative, but it's OK to admit that the meme'd Jensen quote, "the more you buy, the more you save", happens to be true. There's no excuse not to buy an NVDA accelerated system unless you're buying to subsidize another company, which is *fine*; everyone wants competition and home-grown solutions, even if they're not as good.
]]>Was going to post almost the same thing. I’ve been a huge fan of Fujitsu’s PrimeHPC systems. They are VERY underappreciated globally.
Most people outside of the HPC community don’t even know they exist, or realize that K topped the HPCG and Graph500 benchmark lists even when systems like TaihuLight, with 10x the Linpack FLOPS, were on those lists as well. Those benchmarks are much more relevant than HPL these days.
The computational and interconnect efficiency of K, PrimeHPC FX10, FX100 and now FX1000 are unbeatable too. It took Summit and Sierra to dethrone the K computer. Its 93% computational efficiency and 5% HPCG-to-HPL ratio are among the best on the list. Only NEC SX-ACE had a higher HPCG-to-HPL ratio, at 11%, but with much lower computational efficiency. Summit and Sierra only have like 1.5%, but beat K because they have 10x the FLOPS. TaihuLight only has a dismal 0.4%.
Can’t wait for Fugaku to top HPCG for the next few years! It’s too bad they didn’t give it a more catchy-sounding name in English like K.
]]>