Comments on: Strong-Armed Into HPC, Like It Or Not

By: Hubert

Fri, 05 Aug 2022 07:46:48 +0000

Apologies for the late reply … I completely agree with you that this integrated approach has great advantages for code development and deployment. It took me a while to understand why the A64FX is not the performance leader I expected, although it comes very close when viewed through the better lens of HPCG. The dense matrices of HPL (the regular TOP500) are one thing, but the sparse matrices of HPCG (the second TOP500 list) are a better fit for the numerical solution of PDEs (fluids, heat, contaminant transport) by finite differences and finite elements. Dense matrices favor accelerators, while sparse ones demand more address-generation gymnastics (or stencils) from the CPU. TOP500 doesn't report power consumption for HPCG, so assuming the machines draw the same power in both tests gives 1.5 MJ/PetaFlop for EPYC, 1.8 MJ/PF for A64FX, 2.5 MJ/PF for Xeon, and 3.5 MJ/PF for Power9. In HPCG, then, the A64FX performs quite close to the highly tuned EPYC (and better than the other architectures). Also, if EPYC does indeed run at 1.5 MJ/PF in HPCG, then Frontier's HPCG score would be 14 PF/s, which would make it #2 behind Fugaku (maybe that is why its HPCG score has not been reported?)!
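The energy figures above follow from dividing power by sustained performance: MW / (PF/s) = (MJ/s) / (PF/s) = MJ/PF. Below is a minimal sketch of that arithmetic and of the inverted estimate used for Frontier. The function names are hypothetical, and the 21 MW Frontier power draw is an assumption (roughly its reported HPL power, not a measured HPCG value).

```python
def mj_per_pf(power_mw: float, perf_pflops: float) -> float:
    """Energy spent per petaflop of work, in MJ/PF.

    MW / (PF/s) = (MJ/s) / (PF/s) = MJ/PF.
    """
    return power_mw / perf_pflops


def projected_perf(power_mw: float, energy_mj_per_pf: float) -> float:
    """Invert the ratio: sustained PF/s reached at a given power and MJ/PF."""
    return power_mw / energy_mj_per_pf


# Assumption: Frontier draws about 21 MW during HPCG, the same as in HPL.
FRONTIER_POWER_MW = 21.0
# EPYC efficiency in HPCG as estimated in the comment above.
EPYC_MJ_PER_PF = 1.5

frontier_hpcg = projected_perf(FRONTIER_POWER_MW, EPYC_MJ_PER_PF)
print(f"Projected Frontier HPCG: {frontier_hpcg:.1f} PF/s")
```

Running it prints `Projected Frontier HPCG: 14.0 PF/s`, matching the estimate in the comment. The projection is only as good as the equal-power assumption between HPL and HPCG runs.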


By: Eric Olson

Sat, 30 Jul 2022 11:32:59 +0000

In reply to Hubert.

I agree that pushing the A64FX design forward is a good idea.

In my opinion, the heterogeneous compute environment that comes from mixing GPU accelerators with CPUs makes efficient use and programming such an engineering challenge that only the biggest projects benefit. At the same time, there may be more science in projects with large-scale computing requirements but smaller teams of software engineers.

Beyond security and administration, which I suspect are also more difficult on systems with GPU accelerators, some algorithms simply require a tighter coupling between what CPUs do well and what GPUs do well than even a unified-memory, cache-coherent architecture can provide. To solve such problems it really helps for everything to be combined further into a unified instruction set.


By: Hubert

Fri, 29 Jul 2022 02:05:00 +0000

The EPI should, I think, collaborate scientifically with RIKEN (like Fujitsu-Siemens in the SPARC days) to improve the A64FX for higher performance and lower power consumption; for example by porting the stencil-based STX (reportedly 3x faster and 5x more power efficient than a GPU, per your report) to the A64FX.
