Rational Activation Function

Rational Activation Function: An Introduction

Activation functions are an integral part of a deep neural network. They define how the input signal at a node is transformed into an output signal. The most commonly used activation functions are Sigmoid, ReLU, and Tanh. Rational activation functions are a more recent addition to the family: they are ratios of polynomials whose coefficients are learned during training. Let's dive deeper into rational activation functions and understand their benefits in the field of deep learning.

Understanding Rational Functions

A rational function is a function defined as the ratio of two polynomials. Consider the following example:

f(x) = (2x^2 + 3x + 1) / (x^2 + 1)

Here, f(x) is a rational function, where the numerator and denominator are both polynomials. Rational functions have some interesting properties:

  • They take a finite value at every x where the denominator is nonzero; if the denominator never vanishes (as with x^2 + 1 above), the function is finite everywhere.
  • They can have vertical asymptotes where the denominator vanishes, and horizontal (or oblique) asymptotes that describe the behavior of the function at extreme values of x.
  • They are continuous and differentiable wherever they are defined.

These properties make rational functions a good candidate for activation functions for deep neural networks.
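
As a quick sanity check on the example above, the following short Python snippet (a throwaway illustration, not part of any particular library) evaluates f(x) = (2x^2 + 3x + 1) / (x^2 + 1) on a small grid and shows that it stays finite and approaches its horizontal asymptote:

import numpy as np

# Evaluate the example f(x) = (2x^2 + 3x + 1) / (x^2 + 1) on a small grid.
# The denominator x^2 + 1 is never zero for real x, so this particular
# rational function is finite everywhere.
x = np.linspace(-10.0, 10.0, 9)
f = (2 * x**2 + 3 * x + 1) / (x**2 + 1)

for xi, fi in zip(x, f):
    print(f"f({xi:6.2f}) = {fi:7.4f}")

# For large |x| the value approaches the horizontal asymptote y = 2,
# the ratio of the leading coefficients (2x^2 / x^2).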

Rational Activation Functions

Rational activation functions are defined as:

f(x) = P(x) / Q(x)

where P(x) and Q(x) are polynomials whose coefficients are learnable parameters of the neural network. Rational activation functions have some advantages over traditional activation functions:

  • With the denominator parameterized so that it never vanishes, they take a finite value for every value of x, which helps keep activations and gradients from exploding.
  • They are smooth, which means they have a continuous derivative at all points, making them easier to optimize.
  • They can model complex functions, which is useful in deep learning.
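
To make the definition above concrete, here is a minimal sketch of a learnable rational activation layer in PyTorch. The polynomial degrees, the initialization, and the use of absolute values to keep the denominator strictly positive are assumptions made for this example, not a reference implementation:

import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Learnable rational activation f(x) = P(x) / Q(x).

    P has degree m, Q has degree n; all coefficients are trained by
    backpropagation along with the rest of the network.
    """
    def __init__(self, m=3, n=2):
        super().__init__()
        # Numerator coefficients a_0 ... a_m and denominator coefficients b_1 ... b_n.
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)
        self.b = nn.Parameter(torch.randn(n) * 0.1)

    def forward(self, x):
        # P(x) = a_0 + a_1 x + ... + a_m x^m
        num = sum(a_k * x**k for k, a_k in enumerate(self.a))
        # Q(x) = 1 + |b_1| |x| + ... + |b_n| |x|^n, kept strictly positive
        # (an assumed safeguard) so the ratio stays finite for every input.
        den = 1 + sum(self.b[k - 1].abs() * x.abs()**k
                      for k in range(1, len(self.b) + 1))
        return num / den

# Usage: drop it into a network in place of a fixed activation.
act = RationalActivation()
x = torch.linspace(-5, 5, 11)
print(act(x))

Keeping Q(x) at least 1 is one common way to guarantee the ratio stays finite for every input, which is the property highlighted in the list above.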

A familiar function with a closely related form is the sigmoid:

f(x) = 1 / (1 + exp(-x))

which can be rewritten as a single ratio:

f(x) = 1 / (1 + exp(-x)) = exp(x) / (exp(x) + 1)

Here, the numerator exp(x) and the denominator exp(x) + 1 are exponentials rather than polynomials, so the sigmoid is not strictly a rational function of x. However, it can be approximated very closely by a low-degree rational function, which is one reason rational activations are often initialized to mimic well-known activations such as the sigmoid or arctan. Two useful variants of the basic rational form are leaky rational activation functions and soft rational activation functions, discussed below.
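
To back up the approximation claim, the following sketch fits a degree-3 / degree-2 rational function to the sigmoid using scipy.optimize.curve_fit. The degrees, the fitting interval, and the absolute-value safeguard in the denominator are arbitrary choices for this illustration:

import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rational(x, a0, a1, a2, a3, b1, b2):
    # P(x) / Q(x) with a degree-3 numerator and degree-2 denominator;
    # the |b| terms keep the denominator positive on the fitting interval.
    num = a0 + a1 * x + a2 * x**2 + a3 * x**3
    den = 1 + abs(b1) * np.abs(x) + abs(b2) * x**2
    return num / den

x = np.linspace(-5, 5, 500)
params, _ = curve_fit(rational, x, sigmoid(x), p0=np.ones(6) * 0.1, maxfev=20000)
err = np.max(np.abs(rational(x, *params) - sigmoid(x)))
print(f"max |rational - sigmoid| on [-5, 5]: {err:.4f}")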

Leaky Rational Activation Functions

Leaky rational activation functions are an extension of the traditional rational activation functions. They are defined as:

f(x) = x + α * P(x) / Q(x)

where P(x) and Q(x) are polynomials with learnable coefficients and α is a learnable parameter. The linear term x lets the function pass information even where the rational term contributes little. This matters because traditional activation functions like Sigmoid and Tanh saturate: when the input is far from 0, their gradient becomes very small, which makes optimization difficult. The identity term in a leaky rational function keeps the overall derivative from collapsing to zero even where the rational part is flat, so gradients keep flowing and the network can learn more effectively.
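
Continuing the PyTorch sketch from earlier, a leaky variant only needs the extra identity term and a learnable α; again, the parameterization below is an illustrative assumption rather than an established implementation:

import torch
import torch.nn as nn

class LeakyRationalActivation(nn.Module):
    """f(x) = x + alpha * P(x) / Q(x), with alpha learned jointly."""
    def __init__(self, m=3, n=2):
        super().__init__()
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)   # numerator coefficients
        self.b = nn.Parameter(torch.randn(n) * 0.1)       # denominator coefficients
        self.alpha = nn.Parameter(torch.tensor(1.0))       # strength of the rational term

    def forward(self, x):
        num = sum(a_k * x**k for k, a_k in enumerate(self.a))
        den = 1 + sum(self.b[k - 1].abs() * x.abs()**k
                      for k in range(1, len(self.b) + 1))
        # The identity term x keeps the gradient from vanishing even where
        # the rational term is flat.
        return x + self.alpha * num / den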

Soft Rational Activation Functions

Soft rational activation functions are a subclass of rational activation functions. They are designed to provide a soft transition between linear and nonlinear regimes. Soft rational functions are useful for network architectures that require a gentle transition between the regimes.

An example of a soft rational activation function is:

f(x) = x / (1 + λ * P(x) / Q(x))

Here, λ is a learnable parameter that controls how strongly the rational term shapes the response. As λ approaches 0, the function reduces to the identity f(x) = x, i.e. it is purely linear; as λ grows, the rational term dominates the denominator and the output behaves like x · Q(x) / (λ · P(x)), so the response is increasingly governed by the rational part.
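
In the same illustrative style, a soft rational activation can be sketched as follows; parameterizing λ through a softplus so it stays non-negative is an assumption made for this example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRationalActivation(nn.Module):
    """f(x) = x / (1 + lambda * P(x) / Q(x)), with lambda learned."""
    def __init__(self, m=2, n=2):
        super().__init__()
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)   # numerator coefficients
        self.b = nn.Parameter(torch.randn(n) * 0.1)       # denominator coefficients
        self.raw_lam = nn.Parameter(torch.tensor(0.0))     # unconstrained lambda

    def forward(self, x):
        num = sum(a_k * x**k for k, a_k in enumerate(self.a))
        den = 1 + sum(self.b[k - 1].abs() * x.abs()**k
                      for k in range(1, len(self.b) + 1))
        lam = F.softplus(self.raw_lam)  # keep lambda non-negative
        # lam -> 0 recovers the identity f(x) = x; larger lam lets the
        # rational term dominate the response.
        return x / (1 + lam * num / den)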

Rational activation functions are a recent addition to the family of activation functions in deep learning. They offer several advantages over traditional activation functions, including finite outputs for every input (when the denominator is kept away from zero), smoothness, and the ability to model complex functions. Leaky rational activation functions and soft rational activation functions are variants that provide additional benefits to deep neural networks. With the ongoing research in deep learning, it would not be surprising to see rational activation functions used more often in the future.
