Rectified Adam, also known as RAdam, is a modification of the Adam stochastic optimizer that aims to fix the poor convergence Adam can exhibit. It does so by rectifying the variance of the adaptive learning rate.

The Problem with Adam

The authors of RAdam contend that the primary issue with Adam is the undesirably high variance of its adaptive learning rate in the early stages of training, when only a small number of gradient samples has been observed. This high variance often leads to poor convergence, and it is the motivation for creating RAdam.

The Solution with RAdam

RAdam addresses the problem by effectively using smaller update steps in the first few epochs of training, while too few gradients have been seen for the variance estimate to be reliable. The same analysis explains why the warmup heuristic, which lowers the learning rate by hand over the same early period, works in practice.

The following equations show the computation steps for adaptive learning rate, variance rectification, and parameter update:

$$g_t = \nabla_\theta f_t(\theta_{t-1})$$

$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}$$

$$\rho_\infty = \frac{2}{1-\beta_2} - 1$$

$$\rho_t = \rho_\infty - \frac{2 t \beta_2^t}{1-\beta_2^t}$$

If the variance is tractable, i.e. $\rho_t > 4$, the adaptive learning rate is rectified and the parameters are updated with:

$$l_t = \sqrt{\frac{1-\beta_2^t}{v_t}}$$

$$r_t = \sqrt{\frac{(\rho_t - 4)(\rho_t - 2)\,\rho_\infty}{(\rho_\infty - 4)(\rho_\infty - 2)\,\rho_t}}$$

$$\theta_t = \theta_{t-1} - \alpha_t\, r_t\, \hat{m}_t\, l_t$$

If the variance is not tractable, we update instead with:

$$\theta_t = \theta_{t-1} - \alpha_t\, \hat{m}_t$$
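
To make these steps concrete, here is a minimal sketch of a single RAdam update for one parameter tensor, assuming NumPy. The function and variable names (radam_step, param, grad, m, v) are illustrative rather than taken from the paper, and the small eps added to the denominator is a common numerical-stability detail that is not part of the equations above.

```python
import numpy as np

def radam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update for a single parameter array; t counts steps from 1."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0

    # Exponential moving averages of the gradient and squared gradient (as in Adam).
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # Bias-corrected first moment.
    m_hat = m / (1.0 - beta1 ** t)

    # Length of the approximated SMA, which decides whether the variance is tractable.
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)

    if rho_t > 4.0:
        # Variance is tractable: rectify the adaptive learning rate.
        l_t = np.sqrt((1.0 - beta2 ** t) / (v + eps))  # eps avoids division by zero
        r_t = np.sqrt(((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
                      / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
        param = param - lr * r_t * m_hat * l_t
    else:
        # Variance not tractable: fall back to an un-adapted momentum step.
        param = param - lr * m_hat
    return param, m, v

# Toy usage: a few steps on the quadratic loss 0.5 * ||theta||^2 (gradient = theta).
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 11):
    grad = theta                     # gradient of the toy loss
    theta, m, v = radam_step(theta, grad, m, v, t)
```

With the default $\beta_2 = 0.999$, $\rho_t$ first exceeds 4 at step 5, so the sketch above begins with plain momentum updates and the rectification term $r_t$ then grows from small values toward 1, which is the warmup-like behaviour described earlier.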

Rectified Adam is a variation of the Adam stochastic optimizer. It is intended to solve the convergence problems caused by the high variance of Adam's adaptive learning rate in the early stages of training. It does so by rectifying that variance, which effectively applies smaller update steps until enough gradients have been seen for the estimate to become reliable. The result is more stable convergence and more accurate models.
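
As a hedged usage sketch: recent PyTorch releases ship an implementation of this optimizer as torch.optim.RAdam, and it is used like any other torch.optim optimizer. The toy model and random batch below are placeholders, not part of the original description.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                   # placeholder model
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)             # dummy batch
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```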
