I agree that pushing the A64FX design forward is a good idea.
In my opinion, the heterogeneous compute environment that comes from mixing GPU accelerators with CPUs makes efficient use and programming such an engineering challenge that only the biggest projects benefit. At the same time, there may be more science in projects that have large-scale computing requirements but smaller teams of software engineers.
I suspect security and administration are also more difficult on systems that use GPU accelerators. Beyond that, some algorithms simply require a tighter coupling between the things CPUs are good at and the things GPUs are good at than a unified-memory, coherent-cache architecture can achieve. To solve such problems, it really helps for everything to be further combined into a unified instruction set.