ZeRO-Infinity extends ZeRO, a sharded data-parallel system for training large models across multiple GPUs. What sets ZeRO-Infinity apart is its support for heterogeneous memory access, provided by two components: the infinity offload engine and memory-centric tiling.
Infinity Offload Engine
One of the biggest challenges of training large m
What is ZeRO-Offload?
ZeRO-Offload is a sharded data-parallel method for distributed training that uses both CPU memory and CPU compute for offloading. By working with ZeRO-powered data parallelism, it offers a clear path to scaling efficiently across multiple GPUs.
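As a rough illustration of what enabling CPU offload can look like in practice, here is a minimal sketch that assumes training is run through the DeepSpeed library and its JSON configuration file. The batch size is a placeholder, and the key names should be checked against the DeepSpeed version in use.

```python
import json

# Hypothetical minimal DeepSpeed-style config (an assumption, not taken from
# the text above): ZeRO stage 2 with optimizer states offloaded to the CPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # placeholder value
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",     # keep optimizer states in CPU memory
            "pin_memory": True,  # page-locked memory for faster transfers
        },
    },
}

# Write the config as JSON text, the format DeepSpeed reads from disk.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

The printed JSON would typically be saved to a file and passed to the training launcher.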
How ZeRO-Offload Works
ZeRO-Offload maintains a single copy of the optimizer states on the CPU memory regardless of the data paral
ZeRO: A Sharded Data Parallel Method for Distributed Training
What is ZeRO?
ZeRO (Zero Redundancy Optimizer) is a method for distributed deep learning training. It is designed to reduce memory consumption, which matters most when training large deep neural networks. With ZeRO, researchers and practitioners partition the model states across data-parallel processes instead of replicating them, thus removing memory redundancy.
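To make the effect of partitioning concrete, the following sketch estimates per-GPU memory for the model states. It assumes mixed-precision training with Adam, where each parameter costs about 2 bytes for the fp16 weight, 2 bytes for the fp16 gradient, and 12 bytes for optimizer states, and it assumes three cumulative partitioning stages (optimizer states, then gradients, then parameters). The function name and the stage numbering are illustrative, and activations and buffers are ignored.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimate model-state bytes per GPU (assumes mixed-precision Adam)."""
    params = 2 * num_params   # fp16 weights
    grads = 2 * num_params    # fp16 gradients
    optim = 12 * num_params   # fp32 weights, momentum, variance

    # Each stage partitions one more kind of state across the GPUs.
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + optim


# Example: a 7.5B-parameter model on 64 GPUs.
for stage in range(4):
    gb = zero_bytes_per_gpu(7_500_000_000, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.2f} GB per GPU")
```

With no partitioning (stage 0), every GPU holds the full 16 bytes per parameter; with all three kinds of state partitioned, the per-GPU cost falls in proportion to the number of GPUs.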