T-Fixup

T-Fixup is an initialization method for Transformers that aims to remove the need for layer normalization and learning rate warmup. This method…
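
To make the idea of an initialization-based fix concrete, here is a minimal sketch of the depth-dependent rescaling usually associated with T-Fixup. The constants are assumptions recalled from the original T-Fixup paper rather than taken from this page: a factor of (9N)^(-1/4) for decoder attention value/output projections, decoder MLP weights, and input embeddings, and 0.67 · N^(-1/4) for the corresponding encoder weights, where N is the number of Transformer blocks. The function names are hypothetical, and the sketch assumes the weights have already received a standard (for example Xavier) initialization.

```python
# Hypothetical helpers; constants are assumptions to verify against the paper.

def tfixup_decoder_scale(num_layers: int) -> float:
    """Assumed decoder-side factor (9N)^(-1/4) for N decoder blocks."""
    return (9 * num_layers) ** -0.25


def tfixup_encoder_scale(num_layers: int) -> float:
    """Assumed encoder-side factor 0.67 * N^(-1/4) for N encoder blocks."""
    return 0.67 * num_layers ** -0.25


def scale_matrix(weights, factor):
    """Return a copy of a 2-D weight matrix (list of rows) scaled by factor."""
    return [[w * factor for w in row] for row in weights]


if __name__ == "__main__":
    n = 6  # example depth, chosen only for illustration
    print("decoder scale:", tfixup_decoder_scale(n))
    print("encoder scale:", tfixup_encoder_scale(n))
    # Shrink an already-initialized toy weight matrix by the decoder factor.
    print(scale_matrix([[0.1, -0.2], [0.3, 0.4]], tfixup_decoder_scale(n)))
```

Both factors shrink as depth grows, so under these assumptions deeper models start with smaller weights in each block, which is how this family of methods tries to keep early updates bounded without normalization layers or warmup.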