SlowMo: Distributed Optimization for Faster Learning · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Asynchronous Data Parallel, Data Parallel Methods, Distributed Methods, Optimization
"SlowMo, short for Slow Momentum, is a distributed optimization method designed to help machines…"

ByteScheduler · 1 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Replicated Data Parallel
"Distributed deep neural network training can be a complex process, especially when it comes to communication between nodes. This is…"

BAGUA · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Replicated Data Parallel
"Understanding BAGUA: BAGUA is a communication framework used in machine learning that has been designed to support state-of-the-art system relaxation…"

ZeRO-Infinity · 1 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Sharded Data Parallel Methods
"ZeRO-Infinity is a cutting-edge technology designed to help data scientists tackle larger and more complex machine learning projects. It is…"

Gradient Quantization with Adaptive Levels/Multiplier · 4 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods
"Overview of ALQ and AMQ Quantization Schemes: Many machine learning models operate on large amounts of data and require a…"

ZeRO: A Sharded Data Parallel Method for Distributed Training · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Sharded Data Parallel Methods
"What is ZeRO? ZeRO (Zero Redundancy Optimizer) is a novel method…"

Distributed Any-Batch Mirror Descent · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Optimization, Replicated Data Parallel
"DABMD: An Overview of Distributed Any-Batch Mirror Descent. If you've ever waited for slow internet to load a webpage, you…"

PyTorch DDP · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Replicated Data Parallel
"PyTorch DDP (Distributed Data Parallel) is a method for distributing the training of deep learning models across multiple machines. It…"

Nonuniform Quantization for Stochastic Gradient Descent · 3 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods
"Overview of NUQSGD: In today's age where the size and complexity of models and datasets are constantly increasing, efficient…"

Accordion: A Simple and Effective Communication Scheduling Algorithm · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods
"If you are interested in machine learning, you might have heard about…"

ZeRO-Offload · 1 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Sharded Data Parallel Methods
"What is ZeRO-Offload? ZeRO-Offload is a method for distributed training where data is split between multiple GPUs and CPUs. It…"

Wavelet Distributed Training · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Asynchronous Data Parallel, Data Parallel Methods, Distributed Methods
"What is Wavelet Distributed Training? Wavelet distributed training is an approach to neural network training that uses an asynchronous data…"

PowerSGD · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Optimization, Stochastic Optimization
"Overview of PowerSGD: A Distributed Optimization Technique. If you're someone who is interested in the field of machine learning, you…"

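The teaser stops before the mechanism: PowerSGD compresses each 2-D gradient into a low-rank product P·Qᵀ via power iteration, so workers all-reduce the small factors instead of the full matrix. A minimal numpy sketch of that general idea (rank 1, a single power-iteration step; the function names and toy matrix are illustrative, not from the article or the paper's exact algorithm):

```python
import numpy as np

def powersgd_compress(grad, rank=1, rng=None):
    """Low-rank gradient compression, PowerSGD-style (sketch).

    Approximates the 2-D gradient `grad` (n x m) as P @ Q.T with
    P: n x rank and Q: m x rank, using one power iteration.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, m = grad.shape
    q = rng.standard_normal((m, rank))
    p = grad @ q            # n x rank projection
    p, _ = np.linalg.qr(p)  # orthonormalize the columns of P
    q = grad.T @ p          # m x rank; P and Q are what workers would all-reduce
    return p, q

def powersgd_decompress(p, q):
    return p @ q.T

grad = np.outer(np.arange(4.0), np.arange(3.0))  # an exactly rank-1 matrix
p, q = powersgd_compress(grad, rank=1)
approx = powersgd_decompress(p, q)
print(np.allclose(approx, grad))  # True: rank-1 input is recovered exactly
```

For a 4x3 gradient this sends 4+3 numbers per rank instead of 12; the savings grow with layer size, which is the point of the method.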
Local SGD · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Optimization, Stochastic Optimization
"Local SGD is an advanced technique used in machine learning that helps to speed up the training process by running…"

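The excerpt is cut off, but the general Local SGD recipe is well known: each worker runs plain SGD on its own data shard and the parameter copies are averaged only every few steps, replacing per-step gradient communication with occasional synchronization. A toy numpy sketch of that idea on a 1-D least-squares problem (all names and data are illustrative):

```python
import numpy as np

def local_sgd(workers_data, steps=100, sync_every=8, lr=0.1):
    """Minimal Local SGD sketch: each worker runs SGD on its own shard
    of a 1-D least-squares problem; parameter copies are averaged every
    `sync_every` steps instead of after every step."""
    w = [0.0 for _ in workers_data]  # one parameter copy per worker
    for t in range(steps):
        for i, (x, y) in enumerate(workers_data):
            grad = np.mean(2 * x * (w[i] * x - y))  # d/dw of mean (w*x - y)^2
            w[i] -= lr * grad
        if (t + 1) % sync_every == 0:
            avg = sum(w) / len(w)    # the only communication round
            w = [avg for _ in w]
    return sum(w) / len(w)

# two workers, each holding a shard of data generated from y = 3x
rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal(50), rng.standard_normal(50)
data = [(x1, 3 * x1), (x2, 3 * x2)]
w = local_sgd(data)
print(round(w, 3))  # prints 3.0: converges to the true slope
```

With `sync_every=8` the workers communicate 12 times over 100 steps rather than 100 times, which is where the claimed speed-up comes from when communication is the bottleneck.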
Gradient Sparsification · 3 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Optimization, Stochastic Optimization
"Overview of Gradient Sparsification: Gradient Sparsification is a technique used in distributed machine learning to reduce the communication cost between…"
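To make the communication-cost claim concrete: a common sparsification scheme transmits only the k largest-magnitude gradient entries as (index, value) pairs, often combined with error feedback that carries the dropped coordinates into later rounds. A minimal top-k sketch in numpy (function names and the toy gradient are illustrative, not the article's specific scheme):

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of the gradient.
    Returns (indices, values): the only data a worker would transmit."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def densify(idx, vals, n):
    """Rebuild a dense gradient from the transmitted (index, value) pairs."""
    out = np.zeros(n)
    out[idx] = vals
    return out

g = np.array([0.01, -3.0, 0.2, 4.0, -0.05])
idx, vals = topk_sparsify(g, k=2)
sparse_g = densify(idx, vals, len(g))
print(sparse_g)  # only the large entries -3.0 and 4.0 survive
```

Sending 2 of 5 entries here is a 60% reduction; for models with millions of parameters and k a small fraction of that, the bandwidth savings are the whole point.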