BytePS

What is BytePS?

BytePS is a distributed training framework for deep neural networks. It can run with varying numbers of CPU machines, and it subsumes traditional all-reduce and the parameter server (PS) as two special cases of its framework.

How does BytePS work?

BytePS introduces a Summation Service and splits the DNN optimizer into two parts: gradient summation and parameter update. For faster DNN training, the CPU-friendly part, gradient summation, is kept on CPUs, while the computation-heavier parameter update runs on GPUs.
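
To make the split concrete, here is a minimal PyTorch sketch of summing worker gradients on the CPU and applying the update on the accelerator. This illustrates the idea only, not BytePS's API; all names are ours.

```python
import torch

def summation_service(worker_grads):
    """CPU-friendly half: sum gradients from all workers on the CPU.
    (Illustrative stand-in for BytePS's Summation Service.)"""
    total = torch.zeros_like(worker_grads[0])
    for g in worker_grads:
        total += g.cpu()                # summation never touches the GPU
    return total

def parameter_update(param, summed_grad, num_workers, lr=0.01):
    """Computation-heavy half: apply the optimizer step on the device."""
    avg = summed_grad.to(param.device) / num_workers
    with torch.no_grad():
        param -= lr * avg

# Toy usage: one parameter tensor, gradients from 4 simulated workers.
device = "cuda" if torch.cuda.is_available() else "cpu"
param = torch.randn(10, device=device)
grads = [torch.randn(10) for _ in range(4)]
parameter_update(param, summation_service(grads), num_workers=4)
```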

FastMoE

FastMoE is a distributed training system built on PyTorch that accelerates the training of massive models on commonly used accelerators. It is designed around a hierarchical interface that keeps model designs flexible and adapts to different applications, such as Transformer-XL and Megatron-LM.

What is FastMoE?

FastMoE stands for Fast Mixture of Experts: a training system that distributes Mixture-of-Experts (MoE) models across multiple GPUs and nodes. Its primary purpose is to make MoE training, in which a gating network routes each input token to a small subset of expert sub-networks, fast and easy to adopt in existing PyTorch models.
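
The mechanism FastMoE scales, gating plus expert dispatch, can be illustrated with a minimal single-process MoE layer in plain PyTorch. The sketch below is not FastMoE's API; the class and its top-1 gating are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Minimal top-1 Mixture-of-Experts layer (illustrative only)."""
    def __init__(self, d_model=16, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # score every expert per token
        expert_idx = scores.argmax(dim=-1)     # top-1 gating decision
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i             # tokens routed to expert i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

x = torch.randn(8, 16)
print(ToyMoE()(x).shape)                       # torch.Size([8, 16])
```

In a distributed setting, the dispatch step becomes an all-to-all exchange that sends each token to the node hosting its expert, which is the part FastMoE optimizes.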

Herring

What is Herring?

Herring is a distributed training method built around a parameter server. It combines Amazon Web Services' Elastic Fabric Adapter (EFA) with a parameter sharding technique that makes better use of the available network bandwidth.

How Does Herring Work?

Herring copies gradients into a balanced fusion buffer and reduces them hierarchically: first inside each node, then across nodes. Sharding the fused buffer evenly and sending the shards over EFA lets Herring use the total bandwidth available across all nodes in the cluster.
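
As a rough sketch of those two ideas, balanced fusion and hierarchical reduction, the toy Python below simulates a cluster in a single process. The helper names are hypothetical; nothing here is Herring's real implementation.

```python
import numpy as np

def fuse_and_shard(grads, num_shards):
    """Fuse per-layer gradients into one flat buffer, then split it into
    equally sized shards so every server handles the same traffic."""
    fused = np.concatenate([g.ravel() for g in grads])
    return np.array_split(fused, num_shards)    # balanced fusion buffer

def hierarchical_reduce(cluster, num_shards):
    """Reduce gradients inside each node first, then across nodes."""
    # Stage 1: intra-node reduction (e.g., over NVLink/PCIe).
    per_node = [sum(gpu_grads) for gpu_grads in cluster]
    # Stage 2: shard each node's fused buffer and sum shard-by-shard
    # across nodes -- this is the traffic that would ride on EFA.
    shards = [fuse_and_shard([g], num_shards) for g in per_node]
    return [sum(node[s] for node in shards) for s in range(num_shards)]

# 2 nodes x 2 GPUs, one 8-element gradient per GPU, 4 shards.
cluster = [[np.ones(8), np.ones(8)] for _ in range(2)]
print([s.tolist() for s in hierarchical_reduce(cluster, num_shards=4)])
```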

HetPipe

Introduction to HetPipe

HetPipe is a parallel training method that combines two approaches, pipelined model parallelism and data parallelism, for improved performance. It organizes a cluster into multiple virtual workers, each with multiple GPUs: every virtual worker processes its minibatches in a pipelined manner across its GPUs, while the virtual workers together train in a data-parallel fashion. This article will dive deeper into the concept of HetPipe, its underlying principles, and how it could change the way large DNNs are trained on heterogeneous GPU clusters.
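
The sketch below is a toy, single-process illustration of that two-level structure: each virtual worker pipelines microbatches through its stages, and several virtual workers handle different minibatches side by side. The classes and the tick-based schedule are our own simplification, not HetPipe's implementation.

```python
class VirtualWorker:
    """A group of GPUs that runs the model as a pipeline of stages."""
    def __init__(self, name, num_stages):
        self.name = name
        self.num_stages = num_stages

    def run_minibatch(self, minibatch_id, num_microbatches=4):
        # A real pipeline overlaps these stages; here we only list
        # when each microbatch would enter each stage.
        events = []
        for micro in range(num_microbatches):
            for stage in range(self.num_stages):
                tick = micro + stage          # stage s starts one tick later
                events.append((tick, self.name, minibatch_id, micro, stage))
        return events

# Two virtual workers (data parallelism) over different minibatches,
# each pipelining 4 microbatches through 3 stages (model parallelism).
workers = [VirtualWorker("vw0", 3), VirtualWorker("vw1", 3)]
schedule = []
for mb, worker in enumerate(workers):
    schedule += worker.run_minibatch(minibatch_id=mb)

for tick, name, mb, micro, stage in sorted(schedule):
    print(f"t={tick}: {name} minibatch {mb} microbatch {micro} -> stage {stage}")
```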

Parallax

What is Parallax?

Parallax is a framework for training large neural networks. It is a hybrid parallel framework that optimizes data-parallel training by exploiting the sparsity of model variables. By combining the Parameter Server and AllReduce architectures, Parallax reduces the amount of data transferred and maximizes parallelism while minimizing computation and communication overhead.

How does Parallax work?

Parallax combines the Parameter Server and AllReduce architectures by handling sparse and dense variables differently: dense variables are synchronized with AllReduce, while sparse variables are exchanged through Parameter Servers, so that mostly-zero updates are never shipped in full across the network.
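
A minimal sketch of that per-variable routing, assuming a simple density-threshold heuristic; the function and threshold are hypothetical, not Parallax's actual code.

```python
import numpy as np

def synchronize(name, grad, density_threshold=0.5):
    """Route each variable's gradient to the architecture that suits it:
    dense gradients -> AllReduce, sparse gradients -> Parameter Server."""
    density = np.count_nonzero(grad) / grad.size
    if density >= density_threshold:
        return ("allreduce", name)    # e.g., ring all-reduce among workers
    # Sparse path: only ship the nonzero (index, value) pairs to the PS.
    idx = np.flatnonzero(grad)
    return ("parameter_server", name, idx, grad.flat[idx])

embedding_grad = np.zeros(1000); embedding_grad[[3, 17]] = 1.0  # sparse
conv_grad = np.random.randn(1000)                               # dense
print(synchronize("embedding", embedding_grad)[0])  # parameter_server
print(synchronize("conv1", conv_grad)[0])           # allreduce
```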

PipeTransformer

What is PipeTransformer?

PipeTransformer is a method for training Transformer models in a distributed and efficient manner. Its goal is to reduce the time it takes to train these models, which are used for a variety of tasks such as natural language processing and image recognition.

How Does PipeTransformer Work?

One of the key features of PipeTransformer is its adaptive on-the-fly freeze algorithm, which identifies layers whose parameters have stopped changing meaningfully and freezes them during training. The pipeline is then reconfigured to exclude the frozen layers, and the freed resources widen data parallelism over the layers that are still active.
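
Here is a hedged PyTorch sketch of the freezing step, using a gradient-norm heuristic as a stand-in for PipeTransformer's actual freeze criterion; the threshold, names, and rebalancing comment are assumptions.

```python
import torch
import torch.nn as nn

def freeze_converged_layers(layers, grad_norm_threshold=1e-3):
    """Freeze layers whose gradients have become negligible.
    (Simplified stand-in for PipeTransformer's freeze algorithm.)"""
    frozen = []
    for i, layer in enumerate(layers):
        norms = [p.grad.norm() for p in layer.parameters()
                 if p.grad is not None]
        if norms and max(norms) < grad_norm_threshold:
            for p in layer.parameters():
                p.requires_grad_(False)   # drop from the backward pass
            frozen.append(i)
    return frozen  # the pipeline would then be rebalanced over active layers

# Toy usage: 4 "layers"; pretend layer 0 has converged (tiny gradients).
layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(4))
loss = sum(layer(torch.randn(2, 8)).sum() for layer in layers)
loss.backward()
for p in layers[0].parameters():
    p.grad.mul_(0)  # simulate convergence of layer 0
print(freeze_converged_layers(layers))   # [0]
```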
