Twenty years ago, when buying HPC, I used to specify the lowest-power processors instead of the fastest. The loss of real performance, on Fluent and the like, was negligible. There were fewer supply issues and we were in a good position to negotiate on price, particularly as we had really simplified acceptance criteria and a record of early payment. Useless for HPC headlines, of course…
My understanding is that the Nvidia GPU is still spending a lot of time scratching itself waiting for data to hit the cores, and it is memory capacity constrained even if it is not as bandwidth constrained. Still a memory problem, and Grace will help fix that by being a glorified memory controller with 512 GB of memory capacity.
You’re quite right (I think). In HPL (dense matrices, Karate), the Top500 machines run at 65% of theoretical peak (Frontier at least), but in HPCG (sparse matrices, Kung-Fu), they run at 2% of theoretical peak. I would expect that they run even farther from peak in Graph500 (breadth-first search and single-source shortest paths), where the impact of latency would be felt more strongly due to the much less predictable memory-access patterns.
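For what it's worth, the dense versus sparse gap falls out of a back-of-the-envelope roofline estimate. The sketch below is only that: the machine numbers (`PEAK_FLOPS`, `MEM_BW`) are made up for illustration and are not Frontier's, and the traffic counts assume ideal cache reuse for the dense case and ignore vector traffic for the sparse case.

```python
# Rough roofline sketch: attainable rate = min(peak, bandwidth * arithmetic intensity).
# All machine numbers here are hypothetical, chosen only to illustrate the effect.

PEAK_FLOPS = 50e12   # assumed peak FP64 rate of one device, flop/s
MEM_BW = 3e12        # assumed memory bandwidth of one device, bytes/s


def attainable_flops(peak_flops, mem_bw, flops_per_byte):
    """Roofline bound: limited either by compute or by memory traffic."""
    return min(peak_flops, mem_bw * flops_per_byte)


def dense_matmul_intensity(n, word_bytes=8):
    """n x n matrix multiply: 2*n^3 flops over 3*n^2 words moved (ideal reuse)."""
    return (2 * n**3) / (3 * n**2 * word_bytes)


def spmv_intensity(word_bytes=8, index_bytes=4):
    """Sparse matrix-vector product: 2 flops per nonzero, reading one value
    and one column index per nonzero (vector traffic ignored)."""
    return 2 / (word_bytes + index_bytes)


dense_frac = attainable_flops(PEAK_FLOPS, MEM_BW, dense_matmul_intensity(10_000)) / PEAK_FLOPS
sparse_frac = attainable_flops(PEAK_FLOPS, MEM_BW, spmv_intensity()) / PEAK_FLOPS

print(f"dense matmul bound:  {dense_frac:.1%} of peak")
print(f"sparse matvec bound: {sparse_frac:.1%} of peak")
```

With these assumed numbers the dense case is compute bound and the sparse case is capped at about 1% of peak, before latency and irregular access make things worse.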
“Some”, probably yes. PIM works really well if you have a lot of math to do on discrete chunks of memory. If you can partition the work so that most of the data access fits within the memory device, and you rarely have to go off to other devices, it’ll work well. That does not describe all bandwidth-starved codes, and it is a class that is already pretty well served by caching hardware. The even harder problems, like the sparse-matrix codes, will get no benefit.
Yes, that is the next one in the series …. HA!