SlowMo

SlowMo: Distributed Optimization for Faster Learning

SlowMo, short for Slow Momentum, is a distributed optimization method designed to speed up training across many machines. After several iterations of a base optimization algorithm, it periodically synchronizes the workers with an ALLREDUCE and applies a momentum update to the averaged parameters. This coordination among machines during training leads to faster convergence and more accurate models.

How SlowMo Works

SlowMo is built upon existing base optimizers, such as SGD or stochastic gradient push (SGP): each worker runs the base optimizer locally for a fixed number of steps, the workers' parameters are then averaged via ALLREDUCE, and a slow momentum step is applied to the averaged result before the next round of local updates begins.
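The outer loop described above can be sketched in a few lines. The toy below simulates several workers minimizing simple quadratic objectives, with plain SGD as the base optimizer and an exact mean standing in for ALLREDUCE; the function names, hyperparameter values, and the one common form of the slow-momentum update shown here are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def slowmo_train(targets, base_lr=0.1, tau=5, alpha=1.0, beta=0.7, outer_steps=40):
    """Sketch of one SlowMo run over simulated workers.

    Each worker i minimizes f_i(x) = 0.5 * ||x - targets[i]||^2 with plain SGD
    as the base optimizer, so the shared minimum is the mean of the targets.
    The toy objective and all hyperparameters are illustrative assumptions.
    """
    m, d = targets.shape
    x = np.zeros(d)   # globally synchronized parameters
    u = np.zeros(d)   # slow momentum buffer
    for _ in range(outer_steps):
        # Inner loop: each worker runs tau base-optimizer steps from x.
        local = np.tile(x, (m, 1))
        for _ in range(tau):
            local -= base_lr * (local - targets)   # gradient of each f_i
        # Exact averaging across workers (stands in for ALLREDUCE).
        x_avg = local.mean(axis=0)
        # Slow momentum update on the averaged drift, then the outer step.
        u = beta * u + (x - x_avg) / base_lr
        x = x - alpha * base_lr * u
    return x

targets = np.array([[1.0, 2.0], [3.0, 4.0], [-2.0, 0.0]])
x_final = slowmo_train(targets)
# x_final ends up close to targets.mean(axis=0)
```

With beta set to 0 the momentum buffer just holds the latest averaged drift and the method reduces to periodic parameter averaging (local SGD), which is one way to see what the slow momentum term adds.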

Wavelet Distributed Training

What is Wavelet Distributed Training?

Wavelet distributed training is an approach to neural network training that uses an asynchronous data-parallel technique to divide training tasks into two waves. The tick-wave and tock-wave run on the same group of GPUs and are interleaved so that each wave can leverage the on-device memory of the other wave during its memory valley period.

How does Wavelet work?

Wavelet divides data-parallel training tasks into two waves, a tick-wave and a tock-wave. The two waves are launched with a time offset, so the memory peak of one wave (at the end of its forward pass) falls into the memory valley of the other (after its backward pass has freed activations), letting both waves share the same GPUs without doubling peak memory usage.
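The peak-into-valley idea can be illustrated with a toy memory model: assume each wave's activation memory follows a triangular profile over one iteration (ramping up during the forward pass, ramping down during the backward pass). This is a simplified illustration of the scheduling intuition, not the actual Wavelet scheduler; the profile shape and all numbers are assumptions.

```python
def wave_memory(t, period=10, peak=8.0, offset=0):
    """Toy triangular memory profile of one wave at integer time step t.

    Memory rises linearly to `peak` over the first half of the iteration
    (forward pass) and falls back to zero over the second half (backward
    pass). The triangular shape is an illustrative assumption.
    """
    phase = (t - offset) % period
    half = period / 2
    return peak * (phase / half if phase <= half else (period - phase) / half)

period, peak = 10, 8.0
timeline = range(3 * period)
# Tock-wave is launched half an iteration after the tick-wave.
tick = [wave_memory(t, period, peak, offset=0) for t in timeline]
tock = [wave_memory(t, period, peak, offset=period // 2) for t in timeline]
combined_peak = max(a + b for a, b in zip(tick, tock))
# With the half-period offset, the summed usage stays at the single-wave
# peak instead of doubling it: each wave's peak sits in the other's valley.
```

In this idealized model the combined memory never exceeds one wave's peak; in practice the profiles are not perfectly triangular, but the offset still trades a 2x memory peak for a much smaller overhead.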
