Comments on: Compute Is Easy, Memory Is Harder And Harder https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Tue, 10 Jan 2023 04:44:43 +0000 hourly 1 https://wordpress.org/?v=6.7.1 By: Chris Cartledge https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202502 Thu, 22 Dec 2022 21:01:09 +0000 https://www.nextplatform.com/?p=141665#comment-202502 In reply to Timothy Prickett Morgan.

Twenty years ago, when buying HPC, I used to specify the lowest-power processors instead of the fastest. The loss of real performance, on Fluent and the like, was negligible. There were fewer supply issues and we were in a good position to negotiate on price, particularly as we had really simplified acceptance criteria and a record of early payment. Useless for HPC headlines, of course…

]]>
By: Art Scott https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202157 Thu, 15 Dec 2022 01:12:48 +0000 https://www.nextplatform.com/?p=141665#comment-202157 Tufts U HotGauge. Landauer limit: energy dissipated as heat per bit erased, E = kT ln 2. Icarus. Computationally intensive kernels. Helios.
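The Landauer figure mentioned above is easy to evaluate directly. A minimal sketch, assuming room temperature of 300 K (the temperature choice is mine, not the commenter's):

```python
# Landauer limit: minimum heat dissipated to erase one bit, E = kB * T * ln(2).
import math

kB = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)
T = 300.0          # assumed room temperature, kelvin

E_bit = kB * T * math.log(2)  # joules per bit erased
print(f"Landauer limit at {T:.0f} K: {E_bit:.3e} J/bit")  # ~2.87e-21 J
```

At data-center scale this bound is still many orders of magnitude below what real logic dissipates, which is presumably the point of citing it alongside thermal-analysis tools like HotGauge.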

]]>
By: Timothy Prickett Morgan https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202144 Wed, 14 Dec 2022 18:24:02 +0000 https://www.nextplatform.com/?p=141665#comment-202144 In reply to EC.

My understanding is that the Nvidia GPU is still spending a lot of time scratching itself waiting for data to hit the cores, and it is memory capacity constrained even if it is not as bandwidth constrained. Still a memory problem, and Grace will help fix that by being a glorified memory controller with 512 GB of memory capacity.

]]>
By: Hubert https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202143 Wed, 14 Dec 2022 17:20:45 +0000 https://www.nextplatform.com/?p=141665#comment-202143 In reply to Paul Berry.

You’re quite right (I think). In HPL (dense matrices, Karate), the top500 machines run at 65% of theoretical peak (Frontier at least), but in HPCG (sparse matrices, Kung-Fu), they run at 2% of theoretical peak. I would expect that they run even farther from peak in Graph500 (depth-first and breadth-first search), where the impact of latency would be felt more strongly due to the rather more unpredictable memory-access patterns.
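The efficiency arithmetic here is just measured throughput divided by theoretical peak. A quick sketch using the round numbers from the comment (the 100 PFlop/s peak is a hypothetical value for illustration, not any particular machine's spec):

```python
# Fraction of theoretical peak achieved: efficiency = measured / peak.
def efficiency(measured_pflops: float, peak_pflops: float) -> float:
    return measured_pflops / peak_pflops

peak = 100.0  # hypothetical peak, PFlop/s

print(f"dense, HPL-like run:  {efficiency(65.0, peak):.0%}")  # 65%
print(f"sparse, HPCG-like run: {efficiency(2.0, peak):.0%}")  # 2%
```

The gap between those two percentages is exactly the memory-bandwidth argument the article makes: the flops are there, but sparse codes can't feed them.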

]]>
By: EC https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202142 Wed, 14 Dec 2022 16:50:24 +0000 https://www.nextplatform.com/?p=141665#comment-202142 >>We have to get these HPC and AI architectures back in whack.<< Can't speak to the HPC side, but isn't prioritizing memory bandwidth and access precisely what Nvidia's Grace+Hopper architecture is attempting? A machine architecture built to a specific purpose? Nvidia gets the "specifics of the applications" driving them due to their unique position supplying both the accelerator and the software kernels. Nvidia cut its teeth counting cycles and optimizing game applications. Machine learning is more of that, but on a much, much larger scale.

]]>
By: Paul Berry https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202139 Wed, 14 Dec 2022 15:04:12 +0000 https://www.nextplatform.com/?p=141665#comment-202139 In reply to Odd.

“Some” – probably yes. PIM works really well if you have a lot of math to do on discrete chunks of memory. If you can partition the work so that most of the data access fits within the memory device, and you rarely have to go off to other devices, it’ll work well. That’s not all bandwidth-starved codes, though, and that case is one that’s already pretty well served by caching hardware. The even harder problems, like the sparse-matrix codes, will get no benefit.
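The PIM-friendly pattern described above can be sketched in a toy form: work partitioned so each "device" touches only its own chunk, with no cross-device accesses. The device model here is just a Python list slice, purely for illustration:

```python
# Toy sketch of the PIM-friendly case: each memory "device" holds a chunk
# of the data and does its math locally, never reaching across devices.
data = list(range(16))
n_devices = 4
chunk = len(data) // n_devices

# Partition the data so each device owns a contiguous slice.
partitions = [data[i * chunk:(i + 1) * chunk] for i in range(n_devices)]

# Each device squares its own slice locally (no off-device traffic).
results = [[x * x for x in part] for part in partitions]

print(results[0])  # device 0's local results: [0, 1, 4, 9]
```

A sparse-matrix code breaks this model because the indices it chases land on arbitrary devices, which is exactly the comment's point about where PIM stops helping.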

]]>
By: Christopher N Bush https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202130 Wed, 14 Dec 2022 12:02:20 +0000 https://www.nextplatform.com/?p=141665#comment-202130 …mob hit… funny ! Very informative and well written.

]]>
By: Timothy Prickett Morgan https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202128 Wed, 14 Dec 2022 11:57:31 +0000 https://www.nextplatform.com/?p=141665#comment-202128 In reply to David Sorensen.

Yes, that is the next one in the series …. HA!

]]>
By: Vishyat Saini https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202124 Wed, 14 Dec 2022 10:29:21 +0000 https://www.nextplatform.com/?p=141665#comment-202124 This article came up in my Google Chrome suggestions. Ever since childhood, I have loved topics related to CPUs and memory. I have bookmarked your website for future reading.

]]>
By: Paul Berry https://www.nextplatform.com/2022/12/13/compute-is-easy-memory-is-harder-and-harder/#comment-202103 Tue, 13 Dec 2022 20:52:55 +0000 https://www.nextplatform.com/?p=141665#comment-202103 The ideal ratio is not so simple to determine as Dongarra suggests, as it differs for every application. As tempting as it is to say that the Cray-1 got the ratio right, there are and were a lot of applications that need a lot of flops and don't make great use of memory bandwidth, or for which a couple of megabytes of SRAM cache are enough. There are also plenty of applications somewhere between the extremes. For some of these applications, it would be an economic crime to pay for a lot of memory bandwidth that goes unused.
This is where Cray Research had problems in the early 1990s, trying to sell high-bandwidth machines to solve all problems when some applications really needed that bandwidth and some really didn't. Then, as now, it is difficult to find a solution that makes everyone happy, and even harder to do so cheaply.
I think Dongarra is right in hoping microprocessor vendors can mix and match cores, cache, and memory controllers to offer different ratios from the same building blocks. You're probably not going to see Cray-1-style ratios, but a 10x to 100x spread of ratios might be plausible.
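The ratio under discussion is bytes of memory bandwidth per flop of peak compute. A rough sketch of the comparison, with figures that are illustrative ballpark numbers rather than vendor specifications (the Cray-1 line assumes roughly 640 MB/s of bandwidth against 160 MFlop/s peak; the GPU line is a generic modern accelerator):

```python
# Bytes-per-flop at peak: bandwidth (GB/s) divided by peak rate (GFlop/s).
# All numbers below are rough, illustrative figures, not specifications.
systems = {
    "Cray-1 (illustrative)":     {"bw_gbs": 0.64,   "flops_g": 0.16},
    "Modern GPU (illustrative)": {"bw_gbs": 3000.0, "flops_g": 60000.0},
}

for name, s in systems.items():
    ratio = s["bw_gbs"] / s["flops_g"]  # bytes moved per flop at peak
    print(f"{name}: {ratio:.3f} bytes/flop")
```

On these assumed numbers the spread between the two machines is nearly two orders of magnitude, which is roughly the 10x-to-100x range of ratios a mix-and-match product line would have to cover.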

]]>