distributed-methods — Page 3

ZeRO

ZeRO: A Sharded Data Parallel Method for Distributed Training What is ZeRO? ZeRO (Zero Redundancy Optimizer) is a novel method for distributed deep learning training. It is designed to reduce memory consumption in distributed deep learning operations, which are crucial, especially for large-scale processing of deep neural networks. With ZeRO, researchers and practitioners can partition the model states instead of replicating them, thus reducing memory redundancy across data-parallel processes