Hot Chips 31 Live Blogs: Xilinx Versal AI Engine

Thursday, August 22nd, 2019 - FPGAs, Technology

02:34PM EDT – Xilinx, the manufacturer of FPGAs, announced its new Versal AI engine last year as a way of moving FPGAs into the AI domain. This talk is set to expand on those announcements.

02:36PM EDT – Xilinx device categories: FPGA, SoC, ACAP. Versal is ACAP

02:37PM EDT – ACAP = Adaptive Compute Acceleration Platform

02:37PM EDT – Scalar processors, programmable logic, AI engines/DSP engines, onboard networking

02:38PM EDT – Currently shipping samples to early customers

02:38PM EDT – TSMC 7nm, 37B transistors, 855 Mb on-die memory, 400 engine cores, 785 IOs, 44 SerDes

02:39PM EDT – Versal NOC – vertical NOC and horizontal NOC

02:39PM EDT – Packetized NOC, Aggressive clock gating

02:40PM EDT – >1 Tbps bidirectional bandwidth per row, >0.5 Tbps bidirectional bandwidth per column

02:40PM EDT – Always have virtual channels in the NOC

02:40PM EDT – Compute acceleration is all about data movement

02:41PM EDT – Unified Memory Subsystem

02:43PM EDT – 2nd gen CCIX

02:44PM EDT – Coherent Home Node, and L2 cache

02:45PM EDT – CCIX ESM – supports PCIe Gen 5 x16

02:45PM EDT – Versal Processor System

02:45PM EDT – in all Versal devices

02:46PM EDT – Dual-core Arm Cortex-A72

02:46PM EDT – Dual-core Cortex-R5 RPU with lockstep

02:46PM EDT – Platform Management Control

02:47PM EDT – Crypto accelerators for security

02:47PM EDT – 10 Gbps Debug and trace interface

02:48PM EDT – 50 Gbps config interface

02:48PM EDT – Hardware RoT

02:48PM EDT – SHA3 engine, AES, RSA, AES-key

02:48PM EDT – Programmable logic – 900K LUTs, 2M LC, and 1.8M flip-flops

02:48PM EDT – 4x larger CLB -> 64 flip flops
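
A quick sanity check on the programmable logic numbers, as a rough sketch only. It assumes the 1.8M figure quoted at 02:48PM counts flip-flops, that the headline numbers are rounded, and that every CLB is identical; the variable names are made up for illustration.

```python
# Rough consistency check of the quoted programmable logic figures.
LUTS = 900_000           # quoted: 900K LUTs
FLIP_FLOPS = 1_800_000   # quoted: 1.8M (assumed to be flip-flops)
FF_PER_CLB = 64          # quoted: 64 flip-flops per CLB

ff_per_lut = FLIP_FLOPS / LUTS          # flip-flops available per LUT
luts_per_clb = FF_PER_CLB / ff_per_lut  # LUTs per CLB implied by that ratio
approx_clbs = LUTS / luts_per_clb       # approximate CLB count on the device

print(f"{ff_per_lut:.0f} flip-flops per LUT")
print(f"{luts_per_clb:.0f} LUTs per CLB")
print(f"about {approx_clbs:,.0f} CLBs")
```

Under those assumptions the figures hang together: two flip-flops per LUT, and a CLB that is 4x larger in LUT terms as well.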

02:48PM EDT – 158 Mb of URAM and BRAM – 50% lower power than prev gen

02:48PM EDT – Versal Programmable Logic DSP, DSP58

02:48PM EDT – 1958x DSP58 with FP support

02:49PM EDT – 400 AI Engine Tiles, 133 TOPs in INT8

02:49PM EDT – Non-blocking interconnect mesh

02:49PM EDT – 12.5MB L1 distributed memory
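
For scale, the 12.5MB of distributed L1 can be divided across the 400 AI Engine tiles quoted a moment earlier. This is a back-of-envelope sketch that assumes binary units (1 MB = 1024 KB) and an even split between tiles; neither was stated in the talk.

```python
# Per-tile share of the distributed L1 memory.
L1_TOTAL_MB = 12.5     # quoted total L1 distributed memory
AI_ENGINE_TILES = 400  # quoted tile count
KB_PER_MB = 1024       # assumption: binary units

l1_kb_per_tile = L1_TOTAL_MB * KB_PER_MB / AI_ENGINE_TILES
print(f"{l1_kb_per_tile:.0f} KB of L1 per AI Engine tile")
```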

02:51PM EDT – In AI engine: 32b Scalar RISC Processor, 2 scalar ops/stream access, 512-bit SIMD vector processor, vec128int8 or vec8fp32, 7+ OPs per cycle VLIW

02:51PM EDT – 128 INT8 MACs per cycle per core
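
The per-core MAC rate and the 133 TOPs headline number can be cross-checked against each other. The sketch below assumes one MAC counts as two operations (a multiply and an add) and that all 400 tiles run at the same clock; the clock frequency itself was not quoted here, so the result is an implied value, not a specification.

```python
# Implied AI Engine clock from the quoted peak INT8 throughput.
AI_ENGINE_TILES = 400      # quoted tile count
INT8_MACS_PER_CYCLE = 128  # quoted, per core
OPS_PER_MAC = 2            # assumption: multiply + add counted separately
QUOTED_PEAK_TOPS = 133     # quoted INT8 peak

ops_per_cycle = AI_ENGINE_TILES * INT8_MACS_PER_CYCLE * OPS_PER_MAC
implied_clock_ghz = QUOTED_PEAK_TOPS * 1e12 / ops_per_cycle / 1e9

print(f"{ops_per_cycle:,} INT8 ops per cycle across the array")
print(f"implied clock: {implied_clock_ghz:.2f} GHz")
```

Under those assumptions the array would need to run at roughly 1.3 GHz to reach the quoted peak.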

02:51PM EDT – Suited for signal processing workloads

02:52PM EDT – Data movement is done through orchestrated DMA

02:53PM EDT – Memory caches support multicast/broadcast

02:53PM EDT – Very efficient transmission of streamed data

02:56PM EDT – Versal offers deterministic performance and low latency

02:56PM EDT – Map the compute onto different parts of the ACAP

02:57PM EDT – Software programmable framework

02:58PM EDT – Trying to abstract the programming from hardware into regular C++

02:58PM EDT – Frameworks like MXNet, TensorFlow, Caffe

02:59PM EDT – First Xilinx 7nm device, 133 TOPs, PCIe Gen4 and CCIX

02:59PM EDT – Adaptable heterogeneous system architecture

03:00PM EDT – Q&A

03:00PM EDT – Q: HBM? A: Coming

03:01PM EDT – Q: Support CCIX? Support for Gen-Z and CXL? You’re members of those consortiums. A: No official plan on Gen-Z. CXL is still new.

03:01PM EDT – That’s a wrap. Next talk is Intel’s Spring Crest.
