sharded-data-parallel-methods

ZeRO-Infinity

ZeRO-Infinity is designed to make larger models trainable than GPU memory alone would allow. It extends ZeRO, a sharded data parallel system for training large models in parallel across multiple GPUs. What sets ZeRO-Infinity apart is its heterogeneous memory access, which comprises the infinity offload engine and memory-centric tiling. Infinity Offload Engine: One of the biggest challenges of training large m

ZeRO-Offload

What is ZeRO-Offload? ZeRO-Offload is a sharded data parallel method for distributed training that exploits both CPU memory and CPU compute for offloading. It offers a clear path towards scaling across multiple GPUs by working with ZeRO-powered data parallelism. How ZeRO-Offload Works: ZeRO-Offload maintains a single copy of the optimizer states in CPU memory regardless of the data paral

ZeRO

ZeRO: A Sharded Data Parallel Method for Distributed Training. What is ZeRO? ZeRO (Zero Redundancy Optimizer) is a method for distributed deep learning training. It is designed to reduce memory consumption during distributed training, which matters most for large deep neural networks. With ZeRO, researchers and practitioners can partition the model states instead of replicating them, thus reducing memory redundancy across data-parallel processes
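
As a rough illustration of partitioning versus replicating, the sketch below compares how much model state each data-parallel process would hold in the two cases, and which slice of a flat state buffer a given rank would own. The function names and the even, contiguous split are assumptions made for illustration; they are not taken from ZeRO's actual implementation.

```python
def per_process_state_bytes(total_state_bytes: int,
                            num_processes: int,
                            partitioned: bool) -> int:
    """Bytes of model state held by one data-parallel process.

    Hypothetical helper: with replication every process keeps a full copy;
    with partitioning each process keeps roughly 1/num_processes of it.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if not partitioned:
        return total_state_bytes
    # Ceiling division so the shards together cover the whole state.
    return -(-total_state_bytes // num_processes)


def shard_bounds(num_elements: int, num_processes: int, rank: int) -> tuple[int, int]:
    """Half-open [start, end) slice of a flat state buffer owned by `rank`.

    Assumes an even, contiguous split; trailing ranks may get a shorter
    (or empty) shard when the sizes do not divide evenly.
    """
    if not 0 <= rank < num_processes:
        raise ValueError("rank must be in [0, num_processes)")
    shard = -(-num_elements // num_processes)
    start = min(rank * shard, num_elements)
    end = min(start + shard, num_elements)
    return start, end


if __name__ == "__main__":
    total = 16_000  # illustrative size only
    print(per_process_state_bytes(total, 8, partitioned=False))
    print(per_process_state_bytes(total, 8, partitioned=True))
    print([shard_bounds(10, 4, r) for r in range(4)])
```

Under these assumptions, the replicated footprint stays constant as processes are added, while the partitioned footprint shrinks in proportion to the number of data-parallel processes.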
