Hot Chips 31 Live Blogs: Habana's Approach to AI Scaling

Tuesday, August 20th, 2019 - Artificial Intelligence, Technology


09:21PM EDT – The final talk today at Hot Chips is from Habana, which is discussing its approach to scaling AI compute.

09:21PM EDT – Goya and Gaudi

09:22PM EDT – Recapping Training vs Inference requirements

09:24PM EDT – Goya processor architecture

09:24PM EDT – 3 engines: TPC, GEMM, and DMA. They work concurrently with shared SRAM

09:24PM EDT – TPC is VLIW SIMD core, C-programmable

09:24PM EDT – PCIe Gen 4.0 x16

09:24PM EDT – Two DDR4-2666 channels, built on TSMC 16nm

09:25PM EDT – Supports UINT8 to FP32

09:25PM EDT – Dedicated HW and TPC ISA for special function acceleration

09:25PM EDT – Have to adjust quantization to balance accuracy vs power

09:26PM EDT – PCIe card – Software stack is more important.

09:26PM EDT – Habana is a software company that just happens to do hardware

09:27PM EDT – Graph compiler with built-in quantization engine

09:27PM EDT – Multiple recipes can be loaded for the hardware

09:28PM EDT – Goya supports models trained on any processor: CPU, GPU, TPU, Gaudi etc

09:28PM EDT – Users can create custom layers and kernels

09:29PM EDT – Still market leader since benchmarks made 11 months ago vs common CPU/GPU

09:29PM EDT – New for today, natural language benchmark results

09:30PM EDT – Support BERT architecture on Goya

09:30PM EDT – GEMMs and TPCs are fully utilized

09:30PM EDT – Chip was designed long before BERT was invented

09:30PM EDT – High degree of accuracy when quantized

09:30PM EDT – Software managed SRAM

09:31PM EDT – Now Gaudi, the training processor

09:31PM EDT – Performance at Scale, high throughput at low batch size, high power efficiency

09:32PM EDT – Enable native ethernet scale out – on chip RDMA over Converged Ethernet

09:32PM EDT – Open Compute Project Accelerator Module: OAM = (OCP)AM

09:32PM EDT – Framework and ML compiler support, rich TPC Kernel Library

09:32PM EDT – Architecture looks similar to Goya

09:33PM EDT – Networking has changed, memory has changed

09:33PM EDT – PCIe 4.0 x16, 4x8GB HBM

09:33PM EDT – 10x 100 GbE, or 20x 50 GbE

09:33PM EDT – Supports UINT8 to FP32 and BF16
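For readers unfamiliar with BF16: it keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of an FP32 value, so it has the same range as FP32 with less precision. The sketch below illustrates the format itself using only the Python standard library. It is not a description of how Gaudi performs the conversion, which the talk did not cover. The function names are made up for this example, and NaN handling is deliberately left out.

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (round-to-nearest-even).

    Illustrative helper: NaN inputs are not handled specially, and values
    outside the FP32 range make struct.pack raise OverflowError.
    """
    # Reinterpret the FP32 value as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits about to be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    """Expand a BF16 bit pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

if __name__ == "__main__":
    for value in (1.0, 3.14159, -0.1):
        b = fp32_to_bf16_bits(value)
        print(f"{value!r:>10} -> 0x{b:04X} -> {bf16_bits_to_fp32(b)!r}")
```

With only 7 stored mantissa bits, 3.14159 comes back as 3.140625, which shows the precision given up in exchange for keeping the FP32 exponent range.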

09:34PM EDT – SW supports profiling tools

09:34PM EDT – Only AI Training chip with RoCE v2

09:35PM EDT – NVIDIA was first to showcase RoCE v2 for AI, but they haven’t implemented it yet

09:36PM EDT – NVIDIA GPU is much more complex with RoCE v2 support via Mellanox

09:36PM EDT – Gaudi integrates both

09:36PM EDT – Supports Lossless and Lossy fabrics

09:36PM EDT – Advanced congestion controls

09:37PM EDT – Customers can buy OAM cards or an 8 card Server

09:38PM EDT – Server box has no CPU; it is up to the customer to configure as needed. Uses mini-SAS HD

09:38PM EDT – Ethernet connectivity for point-to-point links with non-blocking full mesh

09:38PM EDT – 3 ports per card for scale up
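The port numbers quoted in the talk fit together as simple arithmetic. The sketch below assumes each point-to-point link in the full mesh uses exactly one 100 GbE port per card, which the talk did not state explicitly. The function name and dictionary keys are invented for this illustration.

```python
def full_mesh_port_budget(cards: int, ports_per_card: int) -> dict:
    """Count ports used by a full mesh, assuming one port per peer link."""
    mesh_ports = cards - 1  # every card links directly to every other card
    if mesh_ports > ports_per_card:
        raise ValueError("not enough ports per card for a full mesh")
    return {
        "mesh_ports_per_card": mesh_ports,
        "spare_ports_per_card": ports_per_card - mesh_ports,
        # Each link is shared by two cards, so divide by two.
        "total_mesh_links": cards * mesh_ports // 2,
    }

if __name__ == "__main__":
    # 8-card server, 10x 100 GbE per Gaudi (figures from the talk).
    print(full_mesh_port_budget(cards=8, ports_per_card=10))
```

Under that assumption, an 8-card server uses 7 of each card's 10 ports for the mesh (28 links in total) and leaves 3 ports per card, matching the figure quoted above.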

09:39PM EDT – Can choose ratio of CPUs to Gaudi cards

09:39PM EDT – Gaudi vs DGX

09:40PM EDT – Unlike DGX, Gaudi does not force the user to split PCIe between management and scale-out. Gaudi offers separate PCIe ports

09:41PM EDT – PCIe card dual slot also available

09:41PM EDT – HL-0

09:41PM EDT – Data parallel possible, model parallel possible

09:44PM EDT – Can leapfrog performance over DGX-2 due to better connectivity. Can connect 64 Gaudi chips with non-blocking throughput

09:45PM EDT – Q&A time

09:46PM EDT – Q: What type of quantization requires a processor? A: There is no quantization processor. There’s a software engine that takes an FP32 model, quantizes it to more efficient data types, and gives feedback on the accuracy
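As a rough illustration of what "quantize and report accuracy" can mean, here is a minimal sketch of symmetric per-tensor INT8 quantization with an error report. This is a generic textbook scheme chosen for the example and is not Habana's quantization engine, whose algorithm was not described. All function names are hypothetical, and the weights are made-up values.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]

def max_abs_error(values):
    """Accuracy feedback: worst-case error after a quantize/dequantize round trip."""
    quantized, scale = quantize_int8(values)
    restored = dequantize(quantized, scale)
    return max(abs(a - b) for a, b in zip(values, restored))

if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0, 0.333]  # made-up FP32 weights
    q, scale = quantize_int8(weights)
    print("quantized:", q)
    print("scale:", scale)
    print("max abs error:", max_abs_error(weights))
```

A real toolchain would measure accuracy on the model's outputs over a validation set rather than on raw weights, but the feedback loop has the same shape: quantize, compare against the FP32 reference, and report the gap.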

09:47PM EDT – Q: Can you comment on interconnectivity of GEMM? A: It’s one functional unit.

09:48PM EDT – Q: What is the minimum viable configuration for an IoT gateway? A: You can use a single card. You can put a Gaudi in a single PCIe slot.

09:48PM EDT – That’s a wrap for today. More talks tomorrow!
