Comments on: HBM Gives Xeon SPs A Big Boost On Bandwidth Bound Work

By: Paul Berry

Thu, 17 Nov 2022 14:21:32 +0000

By: JayN

Thu, 17 Nov 2022 03:14:23 +0000

the phoronix site had a couple of articles on SPR’s new user interrupt feature, bypassing kernel for interrupting other cores. They claim “9x faster for lower IPC communication latency”. Were there any comments at sc22 on what this feature contributes to hpc/ai performance?

]]>

By: Timothy Prickett Morgan

Wed, 16 Nov 2022 20:51:35 +0000

By: MH

Wed, 16 Nov 2022 20:47:31 +0000

> Then to get a 1.6X delta seen with the High Performance Linpack (HPL) test shown below, the clock speeds have to drop from 2.6 GHz on the Ice Lake Xeon SP to 2.5 GHz on the Sapphire Rapids Xeon SP.

Well, since the benchmark benefits from HBM, it’s not so much that the clock speed has dropped but that the system-level performance of these non-HBM examples as a ceiling due to DRAM bandwidth.

Regarding cost – that will be very interesting to see as HBM parts are not cheap, and 4 of them in the processor package will have a significant effect on the CPU cost – quite likely the 2x mentioned in the article?

But then analyzing this at the system level, the in-package HBM makes it unnecessary to have 8 memory channels per socket that are populated with server-grade memory. That’s a significant saving, and on top of that there is a board area and complexity saving, and likely also a power and cooling saving (no need to drive “long” on-board wires with their terminating resistors).

And as board area is saved, compute density goes up – so there could be some kind of virtuous spiral that comes into effect, as long as all of the active data can fit in HBM.

]]>

By: HuMo

Tue, 15 Nov 2022 14:52:15 +0000

Great feeds and speeds! The HPCG results in particular, seeing how Fugaku (A64FX with HBM2) has roughly 1/3 of the performance of Frontier on HPL (EPYC+MI250x), but beats it by a tad on HPCG (16 vs 14 PFlop/s) — and here, similarly, we see Intel Max-with-HBM run HPCG 3x faster than EPYC-without-HBM, while HPL runs at about the same speed on both. As you suggest: “AMD has no intention of adding HBM […] could change its mind” — I totally agree! HBM really helps the memory access kung-fu needed by HPCG, while the dense matrix karate of HPL is already taken care of by vector/matrix units.

]]>