Comments on: With Fugaku Supercomputer Installed, RIKEN Takes On Coronavirus

By: Bernd P

Mon, 22 Jun 2020 18:59:22 +0000

Well, there is already another “big thing” dawning on the horizon, this time in the U.S. again: the “FRONTIER” HPC system. It is planned to perform at 1.5 exaflops (at the maximum of the achievable precision scale, because at lower precision we can always go faster) and to go online for the first time in 2021, maybe with some delay because of the impact of the coronavirus pandemic.


By: Timothy Prickett Morgan

Thu, 28 May 2020 13:05:49 +0000

By: Timothy Prickett Morgan

Thu, 28 May 2020 13:00:58 +0000

By: dinglehopper

Thu, 28 May 2020 12:51:07 +0000

By: Dr. Ivo Robotnik

Sat, 23 May 2020 12:29:23 +0000

By: B F

Fri, 22 May 2020 11:12:51 +0000

Did not know ‘512-but vector engines’ were a thing, ‘being enlisted int eh fight against COVID-19’.


By: BlackDove

Thu, 21 May 2020 12:37:38 +0000

In reply to Andrew.

Actually, there are plenty of reasons why Fujitsu’s PrimeHPC line, which started with K, is objectively superior to any accelerated system and it goes far beyond “being easy to program for”.

If you look at K, which dates back to 2011, and the massive amount of R&D that went into it, you’ll see that they created a system architecture specifically for HPC. The CPU itself was nothing special, but the Tofu interconnect and its HPC-specific features, which were well integrated with the purpose-built CPUs, gave K 93% computational efficiency. Most accelerated systems are around 65%.

Then there’s the fact that HPL doesn’t tell you anything these days. K was #1 on the Top500 when it came out, but it was surpassed by inferior systems, while it maintained its domination where it mattered. Inferior how? Look at the HPCG and Graph500 lists to find out.

K has roughly 5% of its Top500 Rmax in HPCG FLOPS, which is extremely high. K stayed #1 on HPCG and Graph500 until Summit and Sierra, the two most powerful supers, came out many years later. Even then, they are brute force by comparison to the ancient but elegant K Computer, with only 1.5% of their HPL Rmax in HPCG FLOPS and 74% computational efficiency in HPL.
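For anyone wanting to check these percentages themselves, both are simple ratios of published benchmark figures: computational efficiency is HPL Rmax over theoretical Rpeak, and the HPCG fraction is HPCG FLOPS over HPL Rmax. A minimal Python sketch follows. The function names and the input numbers are illustrative assumptions chosen to reproduce the 93% and 5% quoted above, not measured data:

```python
def hpl_efficiency(rmax, rpeak):
    """Computational efficiency: achieved HPL performance over theoretical peak."""
    return rmax / rpeak


def hpcg_fraction(hpcg, rmax):
    """Share of the HPL result that the same machine sustains on HPCG."""
    return hpcg / rmax


# Hypothetical inputs in arbitrary but consistent units (e.g. PFLOPS).
# They are picked to match the percentages in the comment, not taken from a list.
rpeak, rmax, hpcg = 100.0, 93.0, 4.65

print(f"HPL efficiency: {hpl_efficiency(rmax, rpeak):.0%}")
print(f"HPCG / HPL:     {hpcg_fraction(hpcg, rmax):.1%}")
```

Plugging in the Rmax, Rpeak and HPCG columns from the actual Top500 and HPCG lists should let you compare any two systems the same way.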

The PrimeHPC FX100 was released in 2015 with its custom 32+2 core CPU using HMC before anyone else was using 3D memory, and Fujitsu and NEC are the only companies with true 3D memory on CPUs to this day. While it’s true that GPUs are now using HBM, they require a host CPU, which doesn’t use it.

Discussing memory in HPC requires the discussion of Byte/FLOP ratios. Fujitsu has tried to maintain a 0.5 Byte/FLOP ratio since K. The only CPU that beats it is NEC’s SX-ACE, which has a ratio of 1:1. Their latest SX-Aurora Tsubasa is 0.5, like Fujitsu’s HPC machines. Most other systems have dismal overall B/F. Even the very advanced Summit has a system level 0.125 B/F compared to 0.37 for Fugaku. Having 3x the B/F is a huge deal in HPC.
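To make the B/F arithmetic concrete: the ratio is just memory bandwidth in bytes per second divided by peak floating-point rate in FLOPS. A small Python sketch follows. The function name and the bandwidth/FLOPS inputs are hypothetical, and only the 0.37 and 0.125 figures come from the numbers quoted above:

```python
def bytes_per_flop(bandwidth_bytes_per_s, peak_flops):
    """Byte/FLOP ratio: memory bandwidth divided by peak floating-point rate."""
    return bandwidth_bytes_per_s / peak_flops


# Hypothetical machine: 1 TB/s of memory bandwidth and 2 TFLOPS peak -> 0.5 B/F.
example_bf = bytes_per_flop(1.0e12, 2.0e12)
print(f"example B/F: {example_bf:.2f}")

# System-level ratios as quoted in the comment (not independently verified here).
fugaku_bf = 0.37
summit_bf = 0.125
advantage = fugaku_bf / summit_bf
print(f"Fugaku vs Summit B/F: {advantage:.2f}x")
```

The quoted figures give a factor of 2.96, which is where the "3x" comes from.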

The B/F ratio partially determines if a machine is good at real work, or a Top500 trick machine like Sunway Taihulight, which has 0.4% HPL to HPCG and dismal efficiency and B/F ratios.

If you want to get sidetracked with who has the most advanced non-HBM RAM for CPUs, it’s IBM with their Centaur and OMI DIMMs, as some of them use 3DS TSV DDR4 and their memory-agnostic buffers and RAS can’t be beat. They currently lack the HPC-specific bandwidth of HBM, however. OMI DIMMs will change that.

PrimeHPC FX100’s SPARC XIfx also integrated the HPC specific network controller directly on the CPU, while most of today’s systems still rely on the ancient practice of using discrete network interface cards.

With Fugaku, the Japanese smartly switched to ARM, which they also happened to buy. It’s basically a refinement of FX100 but using an ARM CPU, with all the advantages I mentioned about the other systems. Plus they added AI-specific features so it can run low-precision AI workloads, but on a massive system that scales without bottlenecking to sizes even larger than Fugaku, with an unbeatable B/F, unbeatable interconnect and computational efficiency.

As for its comparison to Ampere-based systems, you can’t say that Ampere is better because its FLOPS are cheaper. Look at the HPCG and Graph500 lists in the coming years for further proof.


By: Andrew

Wed, 20 May 2020 17:38:07 +0000

By: BlackDove

Tue, 19 May 2020 13:02:02 +0000

In reply to Eric Olson.

Was going to post almost the same thing. I’ve been a huge fan of Fujitsu’s PrimeHPC systems. They are VERY underappreciated globally.

Most people outside of the HPC community don’t even know they exist, or realize that K topped the HPCG and Graph500 benchmark lists when there were systems like Taihulight with 10x the Linpack FLOPS on it as well. Those benchmarks are much more relevant than HPL these days.

The computational and interconnect efficiency of K, PrimeHPC FX10, FX100 and now FX1000 is unbeatable too. It took Summit and Sierra to dethrone the K computer. 93% computational efficiency and 5% HPL to HPCG efficiency are among the best on the list. Only NEC SX-ACE had higher HPL to HPCG at 11%, but with much lower computational efficiency. Summit and Sierra only have about 1.5% HPL to HPCG, but beat K because they have 10x the FLOPS. Taihulight only has a dismal 0.4% HPL to HPCG.

Can’t wait for Fugaku to top HPCG for the next few years! It’s too bad they didn’t give it a more catchy-sounding name in English like K.


By: Jeff Zais

Tue, 19 May 2020 04:19:18 +0000

Fujitsu documents indicate this system is packaged at 384 nodes in a rack, so this works out to be exactly 414 racks. Yup, that is “over 400”, so this all makes sense.
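The rack count can be checked in a couple of lines of Python. Note that the total node count below is an assumption: 158,976 is the publicly reported figure for Fugaku and is not stated in this comment. Only the 384 nodes per rack comes from the text above:

```python
# Assumption: publicly reported Fugaku node count, not given in the comment.
TOTAL_NODES = 158_976
# From the comment: Fujitsu documents indicate 384 nodes per rack.
NODES_PER_RACK = 384

racks, leftover_nodes = divmod(TOTAL_NODES, NODES_PER_RACK)
print(f"{racks} racks, {leftover_nodes} nodes left over")
```

A remainder of zero is what makes the figure come out to "exactly" 414 rack-equivalents.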
