As far as I know, AI training still calls for a fair amount of FP64 every so often, even if it also uses a lot of FP32 and FP16. So in some ways it still matters. I was using FP64 as a proxy for heavy-duty performance, just as I was using INT8 as a proxy for high-throughput, small-data work like AI inference.
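For context, the FP32/FP16 mix in training typically looks something like the minimal sketch below, using PyTorch's autocast and GradScaler; the toy model and random data are placeholders, not anything specific to the systems discussed here.

# Minimal sketch of mixed-precision training: most matmuls run in FP16 under
# autocast, while master weights and the optimizer step stay in FP32.
import torch
from torch import nn

model = nn.Linear(1024, 1024).cuda()            # toy model with FP32 master weights
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()            # rescales FP16 gradients to avoid underflow

x = torch.randn(64, 1024, device="cuda")        # placeholder input batch
target = torch.randn(64, 1024, device="cuda")   # placeholder targets

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():             # forward pass in reduced precision where safe
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()               # backward pass on the scaled loss
    scaler.step(optimizer)                      # unscales gradients, updates FP32 weights
    scaler.update()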
Agree with what you said, obviously. Easy is better. But at some point, cheaper is better than easier.
I would think that for a research AI supercomputer, the rich development library available in NVIDIA’s ecosystem far outweighs a desire for OAM. From a business perspective, it’s easy to imagine that for high-volume operations a company will focus its large resources on a narrow range of development related to its operations in order to get a financially beneficial architecture, such as OAM or Open Compute Project, up and running. But for research it will want its scientists to be as productive and unrestricted as possible, or else it risks falling behind the industry while trying to serve two masters. I doubt Meta expects much parallelization via compiler directives for code running on its new supercomputer, and something like Tesla’s Dojo is more a steam hammer for churning out a product than a lab bench for doing research.