BytePS

What is BytePS? BytePS is a distributed framework for training deep neural networks in heterogeneous GPU/CPU clusters, and it stays effective with varying numbers of additional CPU machines. It subsumes traditional all-reduce and parameter server (PS) communication as two special cases within a single framework.

How does BytePS work? BytePS introduces a Summation Service and splits the DNN optimizer into two parts: gradient summation and parameter update. For faster DNN training, the CPU-friendly part, gradient summation, is kept on CPUs, while the more computation-intensive parameter update runs on GPUs.
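A minimal sketch of this split, using NumPy arrays to stand in for real GPU tensors; the `SummationService` class and the worker loop below are illustrative assumptions, not the actual BytePS API:

```python
import numpy as np

# Hypothetical stand-in for BytePS's Summation Service: a CPU-side
# component that only sums gradients (cheap on CPUs) and never runs
# the optimizer itself.
class SummationService:
    def __init__(self):
        self.partial_sum = None

    def push(self, grad: np.ndarray) -> None:
        # Accumulate one worker's gradient (the CPU-friendly part).
        if self.partial_sum is None:
            self.partial_sum = grad.copy()
        else:
            self.partial_sum += grad

    def pull(self) -> np.ndarray:
        # Return the summed gradient; each worker applies the
        # (GPU-friendly) parameter update locally.
        return self.partial_sum


def sgd_update(params: np.ndarray, summed_grad: np.ndarray,
               num_workers: int, lr: float = 0.1) -> np.ndarray:
    # In BytePS the parameter update runs on the GPU; here it is a
    # plain SGD step on the averaged gradient.
    return params - lr * (summed_grad / num_workers)


if __name__ == "__main__":
    params = np.zeros(4)
    service = SummationService()
    worker_grads = [np.ones(4) * (i + 1) for i in range(3)]  # 3 workers
    for g in worker_grads:
        service.push(g)                       # gradient summation on CPU
    params = sgd_update(params, service.pull(), num_workers=3)
    print(params)  # -> [-0.2 -0.2 -0.2 -0.2], i.e. -lr * mean(1, 2, 3)
```

Keeping the summation on CPUs is what lets BytePS absorb spare CPU machines: summation is bandwidth-bound and cheap, while the update step benefits from GPU compute.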

Herring

What is Herring? Herring is a distributed training method built around a parameter server. It combines Amazon Web Services' Elastic Fabric Adapter (EFA) with a parameter sharding technique that makes better use of the available network bandwidth.

How does Herring work? Herring uses a balanced fusion buffer together with EFA to exploit the total bandwidth available across all nodes in the cluster, and it reduces gradients hierarchically: first inside each node, then across nodes.
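A toy NumPy sketch of the hierarchical reduction and balanced sharding idea; the node/GPU layout and the function name are assumptions for illustration, not Herring's actual implementation:

```python
import numpy as np

def hierarchical_reduce(grads_per_node, num_shards):
    """grads_per_node: list of nodes, each a list of per-GPU gradient
    vectors, already flattened into one fusion buffer per GPU."""
    # Step 1: reduce inside each node first, over fast intra-node links.
    node_sums = [np.sum(np.stack(gpu_grads), axis=0)
                 for gpu_grads in grads_per_node]

    # Step 2: split each node's fusion buffer into equally sized shards,
    # so every shard owner holds a balanced slice of the buffer.
    shards = [np.array_split(s, num_shards) for s in node_sums]

    # Step 3: reduce across nodes one shard per owner, spreading the
    # cross-node traffic evenly over all nodes' bandwidth.
    reduced = [np.sum([shards[n][k] for n in range(len(shards))], axis=0)
               for k in range(num_shards)]
    return np.concatenate(reduced)

if __name__ == "__main__":
    # 2 nodes x 2 GPUs, each GPU holding a 6-element fusion buffer.
    grads = [[np.ones(6), np.ones(6)],
             [2 * np.ones(6), 2 * np.ones(6)]]
    print(hierarchical_reduce(grads, num_shards=2))  # six 6.0 values
```

The balanced sharding in step 2 is the key point: no single parameter server becomes a hotspot, because every node owns an equally sized slice of the fused gradient buffer.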

HetPipe

Introduction to HetPipe HetPipe is a parallel training method that integrates two approaches, pipelined model parallelism and data parallelism, for improved performance. It allows multiple virtual workers, each composed of multiple GPUs, to process minibatches in a pipelined manner, while data parallelism runs across the virtual workers. This article dives deeper into the concept of HetPipe, its underlying principles, and how it could change the way large DNN models are trained on heterogeneous GPU clusters.
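To make the pipelining concrete, here is a small schedule simulation; it is a sketch under assumed uniform stage times and a forward-only pipeline, not HetPipe's actual scheduler. Each virtual worker splits a model across `num_stages` GPUs and pushes several minibatches through them so that stages overlap instead of idling:

```python
def pipeline_schedule(num_stages: int, num_minibatches: int):
    """Return, per time step, the (stage, minibatch) pairs active in a
    simple pipeline with uniform stage times (an assumption)."""
    schedule = []
    for t in range(num_stages + num_minibatches - 1):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_minibatches]
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    # One virtual worker = 3 GPUs (pipeline stages); 4 minibatches
    # flow through it in a pipelined manner.
    for t, active in enumerate(pipeline_schedule(3, 4)):
        print(f"t={t}: " + ", ".join(f"GPU{s} runs mb{m}"
                                     for s, m in active))

    # Data parallelism is the second dimension: several such virtual
    # workers run this schedule concurrently on different minibatches
    # and synchronize their weight updates.
```

At t=2 the printout shows all three GPUs busy on different minibatches at once, which is the benefit pipelined model parallelism provides within each virtual worker.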

Parallax

What is Parallax? Parallax is a framework for training large neural networks. It is a hybrid parallel framework that optimizes data-parallel training by exploiting the sparsity of model variables. By combining the Parameter Server and AllReduce architectures, Parallax reduces the amount of data transferred and maximizes parallelism while keeping computation and communication overhead low.

How does Parallax work? Parallax combines the Parameter Server and AllReduce architectures to handle sparse and dense variables differently: dense variables are synchronized with AllReduce, while sparse variables are exchanged through the Parameter Server.
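A minimal NumPy sketch of that hybrid dispatch; the density threshold and the `allreduce`/`ps_aggregate` stand-ins are assumptions for illustration, not Parallax's API:

```python
import numpy as np

def is_sparse(grad: np.ndarray, density_threshold: float = 0.1) -> bool:
    # Treat a gradient as "sparse" when few entries are non-zero
    # (e.g. embedding gradients); the threshold here is illustrative.
    return np.count_nonzero(grad) / grad.size < density_threshold

def allreduce(grads):
    # Stand-in for AllReduce over dense gradients: every worker ends
    # up with the elementwise average.
    return np.mean(np.stack(grads), axis=0)

def ps_aggregate(grads):
    # Stand-in for a parameter server handling sparse gradients: only
    # the non-zero entries need to be shipped and summed.
    out = np.zeros_like(grads[0])
    for g in grads:
        idx = np.nonzero(g)
        out[idx] += g[idx]
    return out

if __name__ == "__main__":
    workers = 4
    dense = [np.random.randn(8) for _ in range(workers)]      # dense var
    sparse = []
    for _ in range(workers):                                  # sparse var
        g = np.zeros(100)
        g[np.random.choice(100, size=3, replace=False)] = 1.0
        sparse.append(g)

    for grads in (dense, sparse):
        route = ps_aggregate if is_sparse(grads[0]) else allreduce
        print(route.__name__, "->", route(grads).shape)
```

Routing by sparsity is the core idea: shipping a mostly-zero gradient through AllReduce wastes bandwidth on zeros, while a parameter server can exchange only the touched rows.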
