Comments on: Stacking Up AMD Versus Nvidia For Llama 3.1 GPU Inference
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/
In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.

By: Robert (Mon, 19 Aug 2024 05:44:16 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-231783

FP8 is officially supported in ROCm 6.2.
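
For anyone who wants to check what their own stack exposes, here is a minimal, purely illustrative probe from PyTorch. It assumes a PyTorch build against ROCm (or CUDA) recent enough to carry the torch.float8_* dtypes, and it only demonstrates FP8 storage plus a dequantized matmul, not the hardware FP8 GEMM path that ROCm 6.2 enables; treat it as a sketch, not an official AMD recipe.

```python
# Minimal probe for FP8 availability in a PyTorch-on-ROCm build (illustrative only).
import torch

def fp8_probe() -> None:
    # torch.version.hip is a string on ROCm builds of PyTorch and None otherwise.
    backend = "ROCm " + torch.version.hip if getattr(torch.version, "hip", None) else "non-ROCm"
    print(f"PyTorch {torch.__version__} ({backend})")

    # Recent PyTorch exposes FP8 dtypes; MI300-class parts typically use the *fnuz
    # variants, H100-class parts use e4m3fn/e5m2. What works depends on the build.
    for name in ("float8_e4m3fn", "float8_e5m2", "float8_e4m3fnuz", "float8_e5m2fnuz"):
        print(f"  torch.{name}: {'yes' if hasattr(torch, name) else 'no'}")

    if torch.cuda.is_available() and hasattr(torch, "float8_e4m3fn"):
        # Round-trip a tensor through FP8 storage, then dequantize for the matmul.
        w = torch.randn(256, 256, device="cuda", dtype=torch.float16)
        w_fp8 = w.to(torch.float8_e4m3fn)   # quantize (storage only)
        x = torch.randn(8, 256, device="cuda", dtype=torch.float16)
        y = x @ w_fp8.to(torch.float16)     # dequantize before the GEMM
        print("FP8 round-trip matmul OK:", tuple(y.shape))

if __name__ == "__main__":
    fp8_probe()
```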

By: cc (Wed, 07 Aug 2024 05:47:55 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-230826

Is the **no sparsity** FP peak correct for Nvidia?

https://resources.nvidia.com/en-us-blackwell-architecture/blackwell-architecture-technical-brief?ncid=no-ncid
In Table 3, all of the petaFLOPS and petaOPS figures are **with sparsity**, except FP64, which is dense.
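
For what it is worth, Nvidia's 2:4 structured-sparsity peaks are exactly double the dense peaks, so a quoted "with sparsity" number can simply be halved for a dense comparison. A trivial sketch, with placeholder numbers rather than values from the brief:

```python
# Convert Nvidia "with sparsity" peak throughput to dense peak: dense = sparse / 2
# for 2:4 structured sparsity. The numbers below are placeholders, not spec values.
SPARSITY_SPEEDUP = 2.0

quoted_sparse_pflops = {"FP8": 9.0, "FP16": 4.5}   # hypothetical per-GPU peaks
dense_pflops = {fmt: v / SPARSITY_SPEEDUP for fmt, v in quoted_sparse_pflops.items()}

for fmt, sparse in quoted_sparse_pflops.items():
    print(f"{fmt}: {sparse} PF with sparsity -> {dense_pflops[fmt]} PF dense")
```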

By: Anon (Tue, 06 Aug 2024 10:33:10 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-230762

I don’t believe AMD supports FP8 though?

By: Mickey Pearson (Fri, 02 Aug 2024 22:36:10 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-230351

In reply to Timothy Prickett Morgan.

They don’t spend capex on just one thing, which is the AMD talking point you seem to be repeating. The folks who cut big checks for clusters want to make sure the stuff they are buying is not a one-trick pony. That is the selling point for Nvidia, despite all the claims by AMD. By the way, why don’t you ask AMD to submit these claims to MLPerf?

Tim, as much as I do not want to shoot the messenger, you are much more than a messenger (hopefully).

By: Timothy Prickett Morgan (Fri, 02 Aug 2024 18:53:53 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-230344

In reply to EC.

I think that’s a little unfair. I added pricing to it to show the relative cost, and if and when I have performance data for Llama 3.1, I will add that. It was food for thought, not a meal.

By: EC (Thu, 01 Aug 2024 20:28:01 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-230152

I’m sorry, but this article is just an elucidation of an AMD talking point: we have more memory per GPU. Great! We all get it; it’s been a non-stop message since the MI300 launch, hundreds of articles ago. It’s a single metric, and the message could have been published with just a single image of the top chart.

Where are the performance numbers, across a variety of applications and hardware configurations, as we expect from every other piece of high-tech hardware that makes a claim?

AMD’s refusal to lay its cards on the table (with MLPerf, for example) is a cynical attempt to stake out a high-ground narrative while no one in the press holds them to account.

Based on the link-backs to this article, it’s working.

By: Timothy Prickett Morgan (Wed, 31 Jul 2024 11:48:32 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-230058

In reply to Jlagreen.

The reason this is the case is that the weights for a large model, depending on its size, fit in 8, 16, or sometimes 32 GPUs when it comes to inferencing. Hence, that is the comparison. Training is tens of thousands of GPUs trying to do one thing at once. Inference is, conceptually, like really freaking huge Web serving: you get a big enough server to run the Web server and Java or whatever, and you round-robin across as many of them as possible to serve the number of streams you have. So, I don’t agree.
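
To make that arithmetic concrete, here is a back-of-the-envelope sketch of the minimum GPU count needed just to hold a model’s weights at a given precision. It ignores KV cache, activations, and runtime overhead, and uses the commonly cited per-GPU HBM capacities, so treat the output as illustrative rather than a sizing guide.

```python
# Back-of-the-envelope: GPUs needed just to hold a model's weights at a given
# precision. Ignores KV cache, activations, and runtime overhead, so real
# deployments need headroom beyond these minimums.
import math

BYTES_PER_PARAM = {"FP16/BF16": 2.0, "FP8": 1.0, "INT4": 0.5}
HBM_GB = {"H100 (80 GB)": 80, "H200 (141 GB)": 141, "MI300X (192 GB)": 192}

def gpus_for_weights(params_billions: float, bytes_per_param: float, hbm_gb: int) -> int:
    weight_gb = params_billions * bytes_per_param   # billions of params -> GB of weights
    return math.ceil(weight_gb / hbm_gb)

params_billions = 405  # Llama 3.1 405B
for prec, bpp in BYTES_PER_PARAM.items():
    for gpu, cap in HBM_GB.items():
        n = gpus_for_weights(params_billions, bpp, cap)
        print(f"{params_billions}B @ {prec:9s} on {gpu:15s}: >= {n} GPUs for weights alone")
```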

By: Jlagreen (Wed, 31 Jul 2024 07:00:16 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-230022

In reply to Timothy Prickett Morgan.

No, it’s not, for two reasons:

1. You and everyone else only look at inferencing and ignore that customers may want to do training as well. Public AI models are nice, but training on company-specific data is an untapped market, and it will primarily benefit Nvidia.

2. Every benchmark so far is on systems with 8 to 16 GPUs, which is a bit strange. What does benchmarking look like at scale? How do AMD and Nvidia compare in a cluster with hundreds or thousands of GPUs? Everyone talks about their clusters with thousands of GPUs, and yet we only benchmark inference on eight. It’s time for AMD to present itself at MLPerf. Until then, it’s all cherry-picked.

By: Timothy Prickett Morgan (Tue, 30 Jul 2024 16:24:59 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-229922

In reply to Who?.

It’s all about supply.

By: Calamity Jim (Tue, 30 Jul 2024 07:15:12 +0000)
https://www.nextplatform.com/2024/07/29/stacking-up-amd-versus-nvidia-for-llama-3-1-gpu-inference/#comment-229877

Cool analysis! If I read this right, today the MI300X trundles, tramples, and trounces the H100/H200, but tomorrow the B100/B200 will mount a commensurate competitive riposte … to be followed by substantial throws, pins, chokes, joint locks, and ippons from the MI325X/350/400X. This is competition at its best (eh-eh-eh)!
