TECHNOLOGY: RDMA

AVOID MEMORY BOTTLENECKS TO MASTER AI WORKLOADS

MICHAEL MCNERNEY, VP MARKETING AND NETWORK SECURITY AT SUPERMICRO, EXAMINES HOW ENTERPRISES ARE OPTIMISING THEIR GPU SERVERS TO BETTER MANAGE THE MASSIVE DATA STORAGE REQUIREMENTS OF AI APPLICATIONS

Modern enterprises are gaining considerable competitive advantages by embracing AI and machine learning. Large language models such as ChatGPT, machine learning analyses based on enormous sets of training and real data, and complex 3D and finite element models and simulations have at least this much in common: they benefit significantly from expedited access to storage across whatever tiered model is in use.

That's one major reason why so many enterprises and service providers have turned to GPU servers to handle large, complicated datasets and the workloads that consume them. GPU servers are much more capable of handling those workloads and can complete such tasks more quickly than conventional servers with more typical storage configurations (e.g. local RAM and NVMe SSDs, with additional storage tiers on the LAN or in the cloud).

The secret to boosting throughput is reduced latency and better storage bandwidth. These translate directly into improved productivity and capability, primarily through clever IO and networking techniques that rely on direct and remote memory access. Faster model training and job completion mean AI-powered applications can be deployed more quickly, and get things done faster, speeding time to value.

GIVING GPUS DIRECT MEMORY ACCESS

Direct memory access (DMA) has been used to speed IO since the early days of computing. Basically, DMA involves memory-to-memory transfers across a bus (or another interface of some kind) from one device to another. It works by copying a range of memory addresses directly from the sender's memory to the receiver's memory (or in both directions for two-way transfers).
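The difference between a CPU-mediated transfer and a direct one can be illustrated with a toy model. This is only a sketch: real DMA is carried out by hardware, and the function names and the staging buffer below are hypothetical, chosen purely to count how many bytes get copied in each case.

```python
# Toy model only: real DMA is performed by hardware, not by Python code.
# All names here are hypothetical and exist purely for illustration.


def cpu_mediated_transfer(sender: bytearray, receiver: bytearray) -> int:
    """Copy sender -> CPU staging buffer -> receiver; return total bytes copied."""
    staging = bytearray(len(sender))        # assumed CPU-side bounce buffer
    staging[:] = sender                     # copy 1: sender -> CPU memory
    receiver[:len(staging)] = staging       # copy 2: CPU memory -> receiver
    return 2 * len(sender)


def direct_transfer(sender: bytearray, receiver: bytearray) -> int:
    """Copy sender -> receiver in a single step; return total bytes copied."""
    receiver[:len(sender)] = memoryview(sender)  # one copy, no staging buffer
    return len(sender)


if __name__ == "__main__":
    payload = bytearray(range(256)) * 4     # 1,024 bytes of sample data
    dst_a = bytearray(len(payload))
    dst_b = bytearray(len(payload))
    print(cpu_mediated_transfer(payload, dst_a))  # 2048
    print(direct_transfer(payload, dst_b))        # 1024
```

Both paths deliver identical data; the direct path simply moves half as many bytes, which is the saving that the DMA approach aims for.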
DMA takes the CPU out of the process and speeds transfers by reducing the number of copy operations involved: the CPU need not copy the sender's data into its own memory and then copy that data on to the receiver's memory. Indeed, DMA performance on a single system is limited only by the speed of the bus (or other interface) that links the sending and receiving devices. For PCIe 4.0, that's 16 giga-transfers per second (GT/s) per lane, doubling to 32 GT/s for PCIe 5.0. Usable data rates are slightly lower because of encoding and packet overheads, but the rated bandwidth of a 16-lane (x16) link, counting both directions, is roughly 64 GB/s for PCIe 4.0 and 128 GB/s for PCIe 5.0. That's fast!

Remote DMA (RDMA) extends the capability of DMA within a single computer to work between a pair of devices across a network connection. RDMA is typically based on a dedicated application programming interface (API) that works with specialised networking hardware and software to provide as many of the benefits of local DMA as the underlying network technology allows. There are three prevalent types of RDMA technology:

NVIDIA NVLink uses the highest-speed proprietary interfaces and switch technologies to speed data transfers between GPUs on a high-speed network. It currently clocks the highest performance of any technology on the standard MLPerf Training v3.0 benchmarks. A single NVIDIA H100 Tensor Core GPU supports up to 18 NVLink connections for up to 900 GB/s aggregate (about 7 times the bandwidth of PCIe 5.0). A server based on either NVIDIA NVLink or an OAM baseboard is well suited to applications that require significant GPU-to-GPU communication. In these setups each GPU can communicate with other GPUs without having to use the slower PCIe lanes back to the CPU. Generative AI is the primary use of this type of interconnect between GPUs.

InfiniBand (IB) is a high-speed networking standard overseen by the InfiniBand Trade Association (IBTA) and widely implemented on high-performance networks. Its highest specified data rates run up to 1,200 Gb/s (with 12 links) for the NDR specification as of 2022. InfiniBand is a primary network interconnect for applications where server-to-server communication is essential. Many HPC codes, which are distributed across many machines, need the high bandwidth and low latency of IB. In the AI realm, IB would be used more for the training phase.

Ethernet is a standard networking technology with many variants, including the seldom-used TbE (~125 GB/s) and the more common 400 GbE (50 GB/s). It has the advantages of being more affordable, widely deployed, and familiar in many data centres. Ethernet is an option for data centres where a variety of servers are used. For example, GPU-specific servers may need large amounts of data for AI while the storage servers are networked over Ethernet. Ethernet is more universally available from multiple vendors than InfiniBand, and its performance is growing. Ethernet can be used in many environments, including standard office environments where AI inferencing is being used and connection to laptops and client devices is important.

RDMA technologies can support GPU data access across all three preceding networking technologies. Each offers a different price-performance trade-off, where more cost translates into greater speed and lower latency. Organisations can choose the underlying connection type that best fits their budgets and needs, understanding that each option represents a specific combination of price and performance upon which they can rely. As various AI- or ML-based applications (and other data- and compute-intensive applications) run on such a server, they can exploit the tiered architecture of GPU storage.

Because AI and ML applications need both low latency and high bandwidth, RDMA helps extend the local advantages of DMA to network resources (subject to the underlying connections involved). This feature enables speedy access to external data via memory-to-memory transfers across devices (GPU on one end, storage device on the other). Working with NVLink, InfiniBand, or some high-speed Ethernet variant, the remote adapter transfers data from memory in a remote system to memory on some local GPU.

The real advantage of using GPU servers for AI, ML, and other high-demand workloads (e.g. 3D or finite element analysis, simulations, and so forth) is that they enable the separation of infrastructure components from application loads. This saves 20% to 30% of the CPU cycles currently devoted to infrastructure access and management, frees up resources, and speeds access by pushing IO functions into hardware.

More info: www.supermicro.com

STORAGE MAGAZINE, Jan/Feb 2024, pages 16-17. www.storagemagazine.co.uk @STMagAndAwards
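The bandwidth figures quoted in the article can be cross-checked with a few lines of arithmetic. The sketch below assumes a 16-lane (x16) PCIe link, one bit per lane per transfer, both directions counted, and the 128b/130b line encoding used by PCIe 3.0 and later; none of these assumptions are spelled out in the article, and the function names are illustrative only. Under those assumptions, the article's PCIe figures of 64 and 128 correspond to gigabytes per second before encoding overhead.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int = 16,
                      encoding: float = 128 / 130,
                      bidirectional: bool = True) -> float:
    """Approximate PCIe bandwidth in GB/s.

    Assumptions (not from the article): one bit per lane per transfer,
    a x16 link, and 128b/130b encoding. Pass encoding=1.0 for the raw rate.
    """
    per_direction = gt_per_s * lanes * encoding / 8   # bits -> bytes
    return per_direction * (2 if bidirectional else 1)


def gbits_to_gbytes(gbit_per_s: float) -> float:
    """Convert a network line rate in Gb/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8


if __name__ == "__main__":
    # Raw x16 rates, matching the rounded figures quoted in the article.
    print(pcie_gbytes_per_s(16, encoding=1.0))   # 64.0  (PCIe 4.0)
    print(pcie_gbytes_per_s(32, encoding=1.0))   # 128.0 (PCIe 5.0)

    # The same links after 128b/130b encoding overhead.
    print(round(pcie_gbytes_per_s(16), 1))       # 63.0
    print(round(pcie_gbytes_per_s(32), 1))       # 126.0

    # NVLink aggregate on an H100 relative to raw PCIe 5.0 x16.
    print(900 / pcie_gbytes_per_s(32, encoding=1.0))  # 7.03125

    # Ethernet line rates expressed in GB/s.
    print(gbits_to_gbytes(400))    # 50.0  (400 GbE)
    print(gbits_to_gbytes(1000))   # 125.0 (TbE)
```

Protocol and packet overheads reduce real-world throughput further, so these numbers are best read as upper bounds for comparing the interconnects.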