A Success on Arm for HPC: We Found a Fujitsu A64FX Wafer
When speaking about Arm in the enterprise space, the main angle for discussion is on the CPU side. Having a high-performance SoC at the heart of the server has been a key goal for many years, and players such as Amazon, Ampere, Marvell, Qualcomm, Huawei, and others have vied for the server market. The other angle of attack is co-processors and accelerators, and here we have one main participant: Fujitsu. We covered the A64FX, with its super high cache bandwidth, when the design was disclosed at Hot Chips last year, and it will be available on a simple PCIe card. The main end-point for many of these cards will be the Fugaku (Post-K) supercomputer in Japan, which we expect to place near the top of the TOP500 supercomputer list next year.
Following the design disclosure at Hot Chips last year, we saw an individual chip on display at Supercomputing 2018. This year, at Supercomputing 2019, we found a wafer.
I just wanted to post some photos. Enjoy.
The A64FX is the first chip to implement the Arm Scalable Vector Extension (SVE), introduced with Armv8.2-A. In this instance, SVE is implemented at 512 bits wide across 48 compute cores, fed by 32 GiB of HBM2. Inside the chip is a custom on-chip network, and externally the chip connects via the Tofu interconnect (a 6D mesh/torus). The chip provides 2.7 TFLOPS of DGEMM performance. It is built on TSMC's 7nm process and has 8.786 billion transistors, but only 594 pins. Peak memory bandwidth is 1 TB/s.
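To show roughly where a figure like 2.7 TFLOPS comes from, here is a back-of-envelope sketch. The 48 cores, 512-bit vectors, and 1 TB/s come from the paragraph above; the two FMA pipelines per core and the 1.8 GHz clock are assumptions for illustration, since clocks vary between A64FX parts. The same numbers also give a bytes-per-FLOP ratio, a common way to judge how memory-bound a chip is.

```python
# Back-of-envelope peak FP64 throughput for the A64FX, under stated assumptions.
CORES = 48          # compute cores (from the article)
VECTOR_BITS = 512   # SVE vector width (from the article)
FMA_PIPES = 2       # assumption: two 512-bit FMA pipelines per core
FLOPS_PER_FMA = 2   # a fused multiply-add counts as two floating-point operations
CLOCK_GHZ = 1.8     # assumption: illustrative clock, not a confirmed spec


def peak_fp64_tflops(clock_ghz: float = CLOCK_GHZ) -> float:
    """Theoretical peak FP64 rate in TFLOPS for the given clock."""
    lanes = VECTOR_BITS // 64  # FP64 elements per 512-bit vector
    flops_per_cycle = CORES * FMA_PIPES * lanes * FLOPS_PER_FMA
    return flops_per_cycle * clock_ghz / 1000.0  # GFLOP/s -> TFLOP/s


def bytes_per_flop(bandwidth_tb_s: float = 1.0, clock_ghz: float = CLOCK_GHZ) -> float:
    """Memory bytes available per peak FP64 operation (TB/s divided by TFLOPS)."""
    return bandwidth_tb_s / peak_fp64_tflops(clock_ghz)


print(f"peak FP64: {peak_fp64_tflops():.2f} TFLOPS")  # prints "peak FP64: 2.76 TFLOPS"
print(f"bytes per FLOP: {bytes_per_flop():.2f}")      # prints "bytes per FLOP: 0.36"
```

Under these assumptions the chip does 1,536 FP64 operations per cycle, which lands close to the 2.7 TFLOPS figure quoted above.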
The chip is built for high performance, high throughput, and high performance per watt, supporting data types from FP64 down to INT8. The L1 data cache is designed for sustained throughput, and power management is tightly controlled on chip. However you slice it, this chip is mightily impressive. We even saw HPE deploy two of these chips in a single half-width node.
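One reason narrower data types matter is that each 512-bit vector holds more of them. The short sketch below counts elements per vector for FP64 and INT8, the two ends of the range mentioned above. FP32 and FP16 are included as illustrative intermediate types, and the dictionary name is hypothetical.

```python
from typing import Dict

VECTOR_BITS = 512  # SVE vector width on the A64FX (from the article)

# Hypothetical table of element widths; FP32 and FP16 are illustrative in-between types.
ELEMENT_BITS = {"FP64": 64, "FP32": 32, "FP16": 16, "INT8": 8}


def lanes_per_vector(vector_bits: int = VECTOR_BITS) -> Dict[str, int]:
    """Number of elements of each type that fit in one vector register."""
    return {name: vector_bits // bits for name, bits in ELEMENT_BITS.items()}


for name, lanes in lanes_per_vector().items():
    print(f"{name}: {lanes} lanes")  # FP64: 8, FP32: 16, FP16: 32, INT8: 64
```

Going from FP64 to INT8 packs eight times as many elements into each vector operation.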