SlowMo: Distributed Optimization for Faster Learning · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Asynchronous Data Parallel, Data Parallel Methods, Distributed Methods, Optimization
"SlowMo, short for Slow Momentum, is a distributed optimization method designed to help machines…"

ByteScheduler · 1 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Replicated Data Parallel
"Distributed deep neural network training can be a complex process, especially when it comes to communication between nodes. This is…"

BAGUA · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Replicated Data Parallel
"Understanding BAGUA: BAGUA is a communication framework used in machine learning that has been designed to support state-of-the-art system relaxation…"

ZeRO-Infinity · 1 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Sharded Data Parallel Methods
"ZeRO-Infinity is a cutting-edge technology designed to help data scientists tackle larger and more complex machine learning projects. It is…"

Gradient Quantization with Adaptive Levels/Multiplier · 4 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods
"Overview of ALQ and AMQ Quantization Schemes: Many machine learning models operate on large amounts of data and require a…"

ZeRO: A Sharded Data Parallel Method for Distributed Training · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Sharded Data Parallel Methods
"What is ZeRO? ZeRO (Zero Redundancy Optimizer) is a novel method…"

Distributed Any-Batch Mirror Descent · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Optimization, Replicated Data Parallel
"DABMD: An Overview of Distributed Any-Batch Mirror Descent. If you've ever waited for slow internet to load a webpage, you…"

PyTorch DDP · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Replicated Data Parallel
"PyTorch DDP (Distributed Data Parallel) is a method for distributing the training of deep learning models across multiple machines. It…"

Nonuniform Quantization for Stochastic Gradient Descent · 3 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods
"Overview of NUQSGD: In today's age where the size and complexity of models and datasets are constantly increasing, efficient…"

Accordion: A Simple and Effective Communication Scheduling Algorithm · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods
"If you are interested in machine learning, you might have heard about…"

ZeRO-Offload · 1 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Sharded Data Parallel Methods
"What is ZeRO-Offload? ZeRO-Offload is a method for distributed training where data is split between multiple GPUs and CPUs. It…"

Wavelet Distributed Training · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Asynchronous Data Parallel, Data Parallel Methods, Distributed Methods
"What is Wavelet Distributed Training? Wavelet distributed training is an approach to neural network training that uses an asynchronous data…"

PowerSGD · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Optimization, Stochastic Optimization
"Overview of PowerSGD: A Distributed Optimization Technique. If you're someone who is interested in the field of machine learning, you…"

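The teaser stops before the mechanism: PowerSGD compresses each 2-D gradient into a low-rank product P·Qᵀ via power iteration, so workers all-reduce the small factors instead of the full matrix. A minimal numpy sketch of that general idea (rank 1, a single power-iteration step; the function names and toy matrix are illustrative, not from the article or the paper's exact algorithm):

```python
import numpy as np

def powersgd_compress(grad, rank=1, rng=None):
    """Low-rank gradient compression, PowerSGD-style (sketch).

    Approximates the 2-D gradient `grad` (n x m) as P @ Q.T with
    P: n x rank and Q: m x rank, using one power iteration.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, m = grad.shape
    q = rng.standard_normal((m, rank))
    p = grad @ q            # n x rank projection
    p, _ = np.linalg.qr(p)  # orthonormalize the columns of P
    q = grad.T @ p          # m x rank; P and Q are what workers would all-reduce
    return p, q

def powersgd_decompress(p, q):
    return p @ q.T

grad = np.outer(np.arange(4.0), np.arange(3.0))  # an exactly rank-1 matrix
p, q = powersgd_compress(grad, rank=1)
approx = powersgd_decompress(p, q)
print(np.allclose(approx, grad))  # True: rank-1 input is recovered exactly
```

For a 4x3 gradient this sends 4+3 numbers per rank instead of 12; the savings grow with layer size, which is the point of the method.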
Local SGD · 2 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Optimization, Stochastic Optimization
"Local SGD is an advanced technique used in machine learning that helps to speed up the training process by running…"

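The excerpt is cut off, but the general Local SGD recipe is well known: each worker runs plain SGD on its own data shard and the parameter copies are averaged only every few steps, replacing per-step gradient communication with occasional synchronization. A toy numpy sketch of that idea on a 1-D least-squares problem (all names and data are illustrative):

```python
import numpy as np

def local_sgd(workers_data, steps=100, sync_every=8, lr=0.1):
    """Minimal Local SGD sketch: each worker runs SGD on its own shard
    of a 1-D least-squares problem; parameter copies are averaged every
    `sync_every` steps instead of after every step."""
    w = [0.0 for _ in workers_data]  # one parameter copy per worker
    for t in range(steps):
        for i, (x, y) in enumerate(workers_data):
            grad = np.mean(2 * x * (w[i] * x - y))  # d/dw of mean (w*x - y)^2
            w[i] -= lr * grad
        if (t + 1) % sync_every == 0:
            avg = sum(w) / len(w)    # the only communication round
            w = [avg for _ in w]
    return sum(w) / len(w)

# two workers, each holding a shard of data generated from y = 3x
rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal(50), rng.standard_normal(50)
data = [(x1, 3 * x1), (x2, 3 * x2)]
w = local_sgd(data)
print(round(w, 3))  # prints 3.0: converges to the true slope
```

With `sync_every=8` the workers communicate 12 times over 100 steps rather than 100 times, which is where the claimed speed-up comes from when communication is the bottleneck.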
Gradient Sparsification · 3 min read · Apr 23, 2023 · Devin Schumacher
Tags: Data Parallel Methods, Distributed Methods, Optimization, Stochastic Optimization
"Overview of Gradient Sparsification: Gradient Sparsification is a technique used in distributed machine learning to reduce the communication cost between…"
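To make the communication-cost claim concrete: a common sparsification scheme transmits only the k largest-magnitude gradient entries as (index, value) pairs, often combined with error feedback that carries the dropped coordinates into later rounds. A minimal top-k sketch in numpy (function names and the toy gradient are illustrative, not the article's specific scheme):

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of the gradient.
    Returns (indices, values): the only data a worker would transmit."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def densify(idx, vals, n):
    """Rebuild a dense gradient from the transmitted (index, value) pairs."""
    out = np.zeros(n)
    out[idx] = vals
    return out

g = np.array([0.01, -3.0, 0.2, 4.0, -0.05])
idx, vals = topk_sparsify(g, k=2)
sparse_g = densify(idx, vals, len(g))
print(sparse_g)  # only the large entries -3.0 and 4.0 survive
```

Sending 2 of 5 entries here is a 60% reduction; for models with millions of parameters and k a small fraction of that, the bandwidth savings are the whole point.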