https://resources.nvidia.com/en-us-blackwell-architecture/blackwell-architecture-technical-brief?ncid=no-ncid
Table 3: all of the petaFLOPS and petaOPS figures are **with Sparsity**, except FP64, which is dense.
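For anyone reading those spec sheets, here is a minimal sketch of how to back the quoted sparse throughput out to a dense-equivalent figure, assuming Nvidia's usual 2:4 structured-sparsity convention (sparse rate = 2x dense); the numbers in the example are placeholders, not values from the brief:

```python
# Minimal sketch: convert spec-sheet throughput quoted "with sparsity" back to a
# dense-equivalent figure, assuming the usual 2:4 structured-sparsity convention
# (sparse rate = 2x dense rate). The example values are hypothetical placeholders,
# not numbers taken from the Blackwell technical brief.

SPARSITY_FACTOR = 2.0  # 2:4 structured sparsity doubles the quoted rate

def dense_equivalent(quoted_pflops: float, quoted_with_sparsity: bool = True) -> float:
    """Return the dense petaFLOPS implied by a quoted spec-sheet figure."""
    return quoted_pflops / SPARSITY_FACTOR if quoted_with_sparsity else quoted_pflops

if __name__ == "__main__":
    # Hypothetical rows in the shape of a "Table 3"-style spec sheet.
    quoted_specs = {
        "FP8 (with sparsity)": (10.0, True),
        "FP64 (dense)": (0.04, False),  # FP64 is quoted dense, so no adjustment
    }
    for label, (pflops, sparse) in quoted_specs.items():
        print(f"{label}: quoted {pflops} PF -> dense-equivalent {dense_equivalent(pflops, sparse)} PF")
```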
Buyers don’t spend CapEx on one thing, despite the AMD talking point you seem to be repeating. The folks who cut big checks for clusters want to make sure the stuff they are buying is not a one-trick pony. That is the selling point for Nvidia, despite all the claims by AMD. BTW, why don’t you ask them to submit these claims to MLPerf?
Tim, as much as I do not want to shoot the messenger, you are much more than a messenger (hopefully).
I think that’s a little unfair. I added pricing to it to show the relative cost, and if and when I have performance data for Llama 3.1, I will add that. It was food for thought, not a meal.
Where are the performance numbers, across a variety of applications and hardware configurations, like every other piece of high-tech hardware that makes a claim has to provide?
AMD’s refusal to lay its cards on the table (with MLPerf, for example) is a cynical attempt to stake out a high-ground narrative while nobody in the press holds them to account.
Based on the links back to this article, it’s working.
The reason is that the weights for a large model, depending on its size, fit in 8, 16, or sometimes 32 GPUs when it comes to inference. Hence the comparison. Training is tens of thousands of GPUs trying to do one thing at once. Inference is, conceptually, like really freaking huge Web serving: you get a big enough server to run the Web server and Java or whatever, and you round-robin across as many of them as possible to serve the number of streams you have. So, I don’t agree.
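To make the weight-footprint point concrete, here is a rough back-of-envelope sketch of how many GPUs it takes just to hold a model's weights for inference; the parameter counts, precision, and per-GPU HBM capacities are illustrative assumptions, and real deployments also need headroom for the KV cache and activations:

```python
import math

# Back-of-envelope sketch: how many GPUs are needed just to hold a model's
# weights for inference? Parameter counts, precision, and per-GPU HBM capacity
# below are illustrative assumptions; real deployments also need room for the
# KV cache, activations, and framework overhead.

BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

def gpus_to_hold_weights(params_billions: float, precision: str, hbm_gb_per_gpu: float) -> int:
    weight_gb = params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9
    return math.ceil(weight_gb / hbm_gb_per_gpu)

if __name__ == "__main__":
    for params in (70, 405):            # Llama-class model sizes, in billions of parameters
        for hbm in (80, 141, 192):      # assumed per-GPU HBM capacities, in GB
            n = gpus_to_hold_weights(params, "fp16", hbm)
            print(f"{params}B params @ FP16 on {hbm} GB GPUs: >= {n} GPUs for weights alone")
```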
No, it’s not, for two reasons:
1. You and everyone else only check inference and ignore that customers may also want to do training. Public AI models are nice, but training on company-specific data is an untapped market that will primarily benefit Nvidia.
2. Every benchmark so far is on 8- to 16-GPU systems, which is a bit strange. What does benchmarking look like at scale? How do AMD and Nvidia compare in a cluster with hundreds or thousands of GPUs? Everyone talks about their clusters of thousands of GPUs, yet we only benchmark inference on 8 GPUs. It’s time for AMD to present itself at MLPerf. Until then, it’s all cherry-picked.
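For what it’s worth, the at-scale question is usually reported as a scaling-efficiency number; here is a minimal sketch, assuming you have measured aggregate training throughput at each cluster size (the figures below are made up for illustration):

```python
# Minimal sketch: weak-scaling efficiency for a multi-node training benchmark,
# i.e. how much of the ideal N-times speedup a cluster actually delivers.
# The throughput figures below are made-up illustrations, not measured results.

def scaling_efficiency(n_gpus: int, throughput: float,
                       base_gpus: int, base_throughput: float) -> float:
    """Measured throughput as a fraction of perfect linear scaling from the base point."""
    ideal = base_throughput * (n_gpus / base_gpus)
    return throughput / ideal

if __name__ == "__main__":
    base_gpus, base_tps = 8, 1.0                   # normalize to an 8-GPU node
    measured = {64: 7.3, 512: 52.0, 4096: 350.0}   # hypothetical aggregate throughputs
    for n, tps in measured.items():
        eff = scaling_efficiency(n, tps, base_gpus, base_tps)
        print(f"{n} GPUs: {eff:.0%} of linear scaling")
```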
It’s all about supply.