Oooooooh. Interesting!
The paper for that SC23 presentation is available in open access ( https://arxiv.org/abs/2309.00381 ). Figures 4 and 6 compare the 64-core Sophon SG2042 (64x XuanTie C920 RISC-V cores with 128-bit vectors) to 4-core Sandy Bridge, 18-core Broadwell, 28-core Ice Lake, and 64-core Rome chips, in single-core and multi-threaded modes, respectively. It makes fine reading for a cloudy weekend!
Ah-ah-ah! Quite funny … I quite agree with your comment on TNP’s 05/18/23 piece about Meta’s Training and Inference Accelerator (MTIA) for DLRM, which puts 64x (1-scalar + 1-vector) RISC-V cores on a PCIe board, with a huge fan ( https://www.nextplatform.com/2023/05/18/meta-platforms-crafts-homegrown-ai-inference-chip-ai-training-next/ )!
I just found that there will be a RISC-V HPC benchmarking presentation at SC23 in Denver this Monday (Nov. 13), but its Author Abstract states “the x86 […] CPUs […] outperform the SG2042 by between four and eight times” ( https://sc23.conference-program.com/presentation/?id=ws_risc111&sess=sess455 ).
Fetch width is 16 bytes, so decode is 8-wide only for 16-bit (compressed) instructions, and 4-wide for 32-bit instructions. Rename is just 4-wide…
That cache is per core, so 1.6 MB of core-private cache plus 4 MB of L3, or 5.6 MB per core in total. That’s even more cache than Genoa (5 MB of total cache per core); Graviton 3 has just 1.6 MB per core.
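As a quick back-of-envelope sketch (the fetch width and cache figures are the ones quoted above; the 2-byte and 4-byte instruction sizes are just the RISC-V compressed and standard encodings; the Python below is only illustrative arithmetic):

```python
# Back-of-envelope arithmetic using the figures quoted above; illustrative only.

FETCH_BYTES = 16  # stated fetch width per cycle

# Decode ceiling implied by the fetch window, per encoding size
decode_16bit = FETCH_BYTES // 2   # RVC (compressed) instructions -> 8-wide
decode_32bit = FETCH_BYTES // 4   # standard 32-bit instructions  -> 4-wide
print(f"decode ceiling: {decode_16bit}-wide (16-bit), {decode_32bit}-wide (32-bit)")

# Per-core cache totals, in MB, as quoted above
veyron_v2 = 1.6 + 4.0   # core-private cache + L3 share per core -> 5.6 MB
genoa     = 5.0         # total cache per core
graviton3 = 1.6         # per core
print(f"per-core cache (MB): Veyron V2 {veyron_v2}, Genoa {genoa}, Graviton 3 {graviton3}")
```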
However, the real kicker is that in order to beat Genoa, Veyron V2 needs twice the number of cores and twice the cache (and thus die size). The large die area and relatively low per-core performance make it uncompetitive for cloud use.
Having already been ported to Arm and Power, the Linux stack is that much easier to port to RISC-V, or so everyone tells me.
If it is indeed fifteen 32-bit instructions per clock, and 512 KB of L1 I$ per core, then I wonder if they can hit the apparent sweet spot of 2.5 mm^2 per core (at 4 nm) that Neoverse V2 gets (at 7 nm) and Zen 4c has (at 5 nm)? Or maybe it is more of an “out-there” design, like Microsoft’s “chiplet cloud”, that is not realistically designed for tape-out ( https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/ )?
They could also focus (pivot) on interfacing with quantum computers, in pre- and post-processing roles, replacing FPGAs there. It’s not a very large market at present but it is the future (more so than dataflow, I think). The Quantum Approximate Optimization Algorithm (QAOA), for example, could really help solve graph-oriented problems more efficiently than the more conventional recursive bounded search tree algorithms that necessarily involve substantial backtracking (to tackle McCarthy’s non-determinism) (e.g. https://www.nextplatform.com/2023/09/21/beyond-the-traveling-salesman-escape-routes-get-a-quantum-overhaul/ ).
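For contrast, here is a minimal sketch of the kind of recursive bounded-search-tree algorithm being referred to, using minimum vertex cover as a stand-in graph problem (the graph and the problem choice are my own illustration, not from the linked article); the backtracking whenever the size budget runs out is exactly the cost that QAOA-style approaches hope to sidestep:

```python
# Minimal bounded-search-tree sketch for minimum vertex cover, the classical
# backtracking style QAOA is being contrasted with. The example graph is made
# up purely for illustration.

def vertex_cover(edges, bound):
    """Return a vertex cover of size <= bound, or None.
    Branches on an endpoint of an uncovered edge; backtracks when the bound runs out."""
    if not edges:
        return set()
    if bound == 0:
        return None                      # budget exhausted -> backtrack
    u, v = edges[0]
    for pick in (u, v):                  # either endpoint must be in the cover
        remaining = [e for e in edges if pick not in e]
        sub = vertex_cover(remaining, bound - 1)
        if sub is not None:
            return sub | {pick}
    return None                          # both branches failed -> backtrack further up

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
for k in range(len(edges) + 1):          # find the smallest k with a valid cover
    cover = vertex_cover(edges, k)
    if cover is not None:
        print(f"minimum vertex cover of size {k}: {sorted(cover)}")
        break
```

On this example graph it finds the size-2 cover {1, 3}, but only after exploring and abandoning several branches along the way.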