Phish: A Novel Hyper-Optimizable Activation Function

Phish: A Novel Activation Function That Could Revolutionize Deep-Learning Models

Deep-learning models have become an essential part of modern technology. They power everything from image recognition software to natural language processing algorithms. However, the success of these models depends on the right combination of various factors, one of which is the activation function used within hidden layers.

The Importance of Activation Functions

Activation functions play a critical role in the performance of deep neural networks. They are mathematical functions that determine the output of a neuron, which is then passed on to other neurons in the network, enabling it to make decisions based on the input data.

The primary goal of an activation function is to introduce non-linearity into the neural network. Without one, the network reduces to a linear model: no matter how many layers are stacked, the output is just a linear combination of the input values, with none of the complexity deep models are meant to capture.
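To make that concrete, here is a minimal NumPy sketch (an illustration added here, not from the original study) showing that two stacked layers without an activation function collapse into a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # example input vector
W1 = rng.normal(size=(5, 4))  # weights of a first "layer" with no activation
W2 = rng.normal(size=(3, 5))  # weights of a second "layer" with no activation

two_layers = W2 @ (W1 @ x)   # output of the stacked linear layers
one_layer = (W2 @ W1) @ x    # a single linear layer with merged weights

print(np.allclose(two_layers, one_layer))  # True: stacking adds no expressive power
```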

Rectified Linear Unit (ReLU) Activation Function

For the past decade, the dominant activation function has been the Rectified Linear Unit (ReLU). It is a simple function whose output is the maximum of the input value and 0, f(x) = max(x, 0). ReLU has proven effective for most neural network applications, especially image recognition tasks.

While ReLU has been successful in many scenarios, it has limitations. In particular, it can produce dead neurons: when a neuron's input stays negative, ReLU outputs 0 and its gradient is 0, so the neuron's weights are not updated during training. This can degrade the overall performance of the network.
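As an illustration (not part of the original article), a minimal NumPy sketch of ReLU and its derivative shows why negative inputs lead to dead neurons: both the output and the gradient are zero, so no learning signal reaches the weights.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: element-wise max(x, 0)."""
    return np.maximum(x, 0.0)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -> zero gradient on the negative side,
                     # so a neuron stuck there receives no weight updates
```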

Swish and Mish Activation Functions

To overcome the limitations of ReLU, researchers have developed newer activation functions. Two of the most promising ones are Swish and Mish.

Swish is a smooth function that combines properties of the ReLU and sigmoid functions. It is defined as f(x) = x * sigmoid(x) = x / (1 + e^(-x)). Studies have shown that Swish outperforms ReLU in many scenarios, producing better classification accuracy and converging faster during training.

Mish is another activation function that outperforms ReLU. It is defined as f(x) = x * tanh(softplus(x)). Mish has shown improvements in deep networks, especially in cases where ReLU underperforms: it handles negative inputs better and introduces more non-linearity than ReLU.
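For reference, both definitions can be written down directly; the following NumPy sketch (helper names are illustrative) implements them exactly as stated above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # softplus(x) = log(1 + e^x), computed in a numerically stable way
    return np.logaddexp(0.0, x)

def swish(x):
    """Swish: f(x) = x * sigmoid(x) = x / (1 + e^(-x))."""
    return x * sigmoid(x)

def mish(x):
    """Mish: f(x) = x * tanh(softplus(x))."""
    return x * np.tanh(softplus(x))

x = np.linspace(-3.0, 3.0, 7)
print(swish(x))
print(mish(x))
```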

Phish: A Novel Activation Function

Phish is a recently proposed activation function. It combines the effectiveness of ReLU, Swish, and Mish into a single function, defined as f(x) = x * tanh(GELU(x)), where tanh is the hyperbolic tangent and GELU is the Gaussian Error Linear Unit.
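Following that definition, Phish can be sketched in a few lines; the exact, erf-based form of GELU is assumed here (a tanh-based approximation of GELU would also work).

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    """Gaussian Error Linear Unit: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def phish(x):
    """Phish: f(x) = x * tanh(GELU(x))."""
    return x * np.tanh(gelu(x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(phish(x))
```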

The main advantage of Phish is that it eliminates dead neurons and produces better classification accuracy than the other activation functions tested. It has shown the ability to handle large datasets and to reduce overfitting in deep neural networks. Moreover, Phish is smooth and differentiable, which allows for better optimization during back-propagation.

Testing Phish in Neural Networks

To test the effectiveness of Phish, generalized neural networks were constructed with different activation functions, including ReLU, Swish, Mish, and Phish. SoftMax was used as the output function, and the networks were trained to minimize sparse categorical cross-entropy.
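The article does not spell out the exact architectures, so the following Keras sketch is only an assumption of what such a network could look like: a small fully connected classifier with a swappable hidden activation, a SoftMax output layer, and sparse categorical cross-entropy as the loss. The layer sizes and optimizer are illustrative choices, not the study's reported setup.

```python
import tensorflow as tf

def phish(x):
    # Phish activation: x * tanh(GELU(x))
    return x * tf.math.tanh(tf.nn.gelu(x))

def build_classifier(activation, input_shape, num_classes):
    """A small fully connected classifier; layer sizes are illustrative only."""
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=input_shape),
        tf.keras.layers.Dense(128, activation=activation),
        tf.keras.layers.Dense(64, activation=activation),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_classifier(phish, input_shape=(28, 28), num_classes=10)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```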

Images from the MNIST and CIFAR-10 databases were used to train the networks, and a large-scale cross-validation was simulated using stochastic Markov chains to account for the law of large numbers in the probability estimates.
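As a much simplified stand-in for that evaluation (not the study's actual protocol), one could compare a few activations on MNIST with a loop like the following; tf.nn.relu and tf.nn.swish are TensorFlow's built-in versions, and the architecture, epoch count, and optimizer are assumptions for illustration.

```python
import tensorflow as tf

def phish(x):
    # Phish activation: x * tanh(GELU(x))
    return x * tf.math.tanh(tf.nn.gelu(x))

# Load MNIST and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

activations = {"relu": tf.nn.relu, "swish": tf.nn.swish, "phish": phish}

for name, act in activations.items():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation=act),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5, verbose=0)
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"{name}: test accuracy = {test_acc:.4f}")
```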

Statistical tests support the research hypothesis that Phish can outperform other activation functions in classification. Phish produced better classification accuracy than the other activation functions, including Swish and Mish, on both the MNIST and CIFAR-10 datasets.

Phish has shown great potential as a new activation function for deep neural networks. It combines the effectiveness of ReLU, Swish, and Mish, eliminating dead neurons and producing better classification accuracies than other activation functions. Future experiments could involve testing Phish in unsupervised learning algorithms and comparing it to other activation functions.
