What is Herring?

Herring is a parameter server-based distributed training method. It combines Amazon Web Services' Elastic Fabric Adapter (EFA) with a novel parameter sharding technique to make better use of the available network bandwidth. Using a balanced fusion buffer together with EFA, Herring exploits the total bandwidth available across all nodes in the cluster while reducing gradients hierarchically: first within each node, then across nodes.

How Does Herring Work?

Herring uses a novel parameter sharding technique that divides the model parameters across multiple parameter servers, with each server responsible for a different portion of the parameters. The Elastic Fabric Adapter (EFA) provides high-bandwidth, low-latency communication with the parameter servers, which keeps communication overhead low.
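One way to picture parameter sharding is as a load-balancing problem: each parameter tensor is assigned to the server with the least total load so far, so every server ends up responsible for roughly the same number of gradient bytes. The sketch below is a hypothetical greedy balancer, not Herring's actual code; the function name and the use of raw element counts as "load" are assumptions for illustration.

```python
import heapq

def shard_parameters(param_sizes, num_servers):
    """Greedily assign parameters (by size) to servers, balancing total load.

    Hypothetical sketch: largest parameters are placed first, each onto the
    currently least-loaded server (classic longest-processing-time heuristic).
    """
    # Min-heap of (current_load, server_id) so the lightest server pops first.
    heap = [(0, s) for s in range(num_servers)]
    heapq.heapify(heap)
    assignment = {}
    for idx, size in sorted(enumerate(param_sizes), key=lambda p: -p[1]):
        load, server = heapq.heappop(heap)
        assignment[idx] = server
        heapq.heappush(heap, (load + size, server))
    return assignment

# Four parameter tensors of 400, 100, 300, and 200 elements across 2 servers:
shards = shard_parameters([400, 100, 300, 200], num_servers=2)
# Each server ends up holding 500 elements in total.
```

With a balanced assignment like this, no single parameter server becomes a bandwidth hotspot during the gradient push/pull phases.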

Each node in the cluster has one or more GPUs that perform the computations needed to train the model. Herring reduces gradients hierarchically: they are first reduced among the GPUs inside a node, and the per-node results are then reduced across nodes. This hierarchical approach makes more efficient use of PCIe bandwidth within the node and keeps the gradient-averaging burden on each GPU low.
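The two-stage reduction described above can be sketched in plain Python. This is an assumed simplification (real systems use collective communication over NVLink/PCIe and EFA, not lists), but it shows the structure: sum gradients within each node, combine the per-node partial sums across nodes, then divide by the global worker count.

```python
def hierarchical_allreduce(grads_per_node):
    """Average gradients in two stages: intra-node, then inter-node.

    grads_per_node: list of nodes, where each node is a list of per-GPU
    gradient vectors (plain Python lists of floats in this sketch).
    """
    # Stage 1: intra-node reduction -> one partial sum per node.
    # In practice this traffic stays on fast local links (NVLink/PCIe).
    node_partials = [
        [sum(vals) for vals in zip(*node_grads)]
        for node_grads in grads_per_node
    ]
    # Stage 2: inter-node reduction of the partial sums.
    # Only one message per node crosses the network (e.g. over EFA).
    total = [sum(vals) for vals in zip(*node_partials)]
    # Average over every GPU in the cluster.
    world_size = sum(len(node) for node in grads_per_node)
    return [v / world_size for v in total]

# 2 nodes x 2 GPUs, each GPU holding a 2-element gradient:
avg = hierarchical_allreduce([[[1.0, 2.0], [3.0, 4.0]],
                              [[5.0, 6.0], [7.0, 8.0]]])
# avg == [4.0, 5.0]
```

Note that only the per-node partials cross the network, so inter-node traffic scales with the number of nodes rather than the number of GPUs.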

Why Use Herring?

Herring is designed to improve the efficiency of distributed training, the standard way of training machine learning models too large for a single machine or GPU. By combining a parameter server with its parameter sharding technique, Herring enables faster communication and more efficient use of the available network bandwidth. The result is shorter training times, which can be crucial in both research and production settings.

Another advantage of Herring is that it reduces the communication overhead, which can be a significant bottleneck in distributed training. By using EFA and a balanced fusion buffer, Herring enables optimal use of the total bandwidth available across all nodes in the cluster while minimizing communication overhead.
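The fusion buffer mentioned above addresses a specific overhead: sending many small gradient tensors as separate messages wastes bandwidth on per-message costs. The sketch below is an assumed illustration (the function name, buffer capacity, and flat-list representation are not from Herring); it packs small gradients into fixed-capacity buffers so they can be sent in fewer, larger messages.

```python
def fuse_gradients(grads, buffer_size):
    """Pack gradient vectors into fixed-capacity flat buffers for bulk sends.

    Illustrative sketch: each returned buffer holds the concatenated values
    of consecutive gradients, flushed whenever the next gradient would
    overflow the capacity.
    """
    buffers, current = [], []
    for g in grads:
        # Flush the current buffer if this gradient would overflow it.
        if current and len(current) + len(g) > buffer_size:
            buffers.append(current)
            current = []
        current.extend(g)
    if current:
        buffers.append(current)
    return buffers

# Four small gradients fused into two network messages (capacity 5 values):
bufs = fuse_gradients([[1, 2], [3, 4, 5], [6], [7, 8, 9]], buffer_size=5)
# bufs == [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
```

Fusing four sends into two halves the per-message overhead here; at scale, amortizing that overhead is what lets the cluster approach its full aggregate bandwidth.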

In short, Herring is a parameter server-based distributed training method that pairs Amazon Web Services' Elastic Fabric Adapter (EFA) with a novel parameter sharding technique. Its balanced fusion buffer and hierarchical gradient reduction make optimal use of the available network bandwidth while keeping the gradient-averaging burden on each GPU low, making it a valuable tool for researchers and developers training large machine learning models.
