Activation Functions

Sai Chandra Nerella
Jul 6, 2021

In any neural network, the activation function plays a crucial role: it turns a raw, complex value into something the network can make decisions with. This is what makes tasks such as image classification, object detection, and language translation tractable; without activation functions, these computations are hard to handle. A good activation function also improves a network's ability to make predictions by speeding up convergence.

An activation function defines the output of a node for a given input or set of inputs. It essentially decides whether to activate or deactivate a neuron in order to produce the desired output, and it applies a nonlinear transformation to the input so that a complex neural network can produce better results.
A neuron first computes a weighted sum of its inputs plus a bias, and this sum is then passed through an activation function to get an output.

Y = ∑ (weights * inputs) + bias

Here Y can take any value between -infinity and +infinity, so we have to bound the output to get the desired prediction or generalized results.

Y = Activation function(∑ (weights * inputs) + bias)

So we pass the weighted sum through an activation function to bound the output values within a certain range.
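As a rough sketch of this idea (the function names here are illustrative, not from the article), a single neuron might look like this in NumPy:

```python
import numpy as np

def neuron(inputs, weights, bias, activation):
    # Weighted sum of inputs plus bias: Y = sum(weights * inputs) + bias
    z = np.dot(weights, inputs) + bias
    # The activation bounds the raw value into a usable range
    return activation(z)

# Example with a sigmoid activation (discussed later in the article)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(neuron(np.array([0.5, -1.2]), np.array([0.8, 0.3]), 0.1, sigmoid))
```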

Why do we need Activation Functions?

Without an activation function, the weights and biases can only apply a linear transformation, so the neural network is just a linear regression model. A linear equation is a polynomial of degree one, which is simple to solve but limited in its ability to model complex problems or higher-degree relationships.

In contrast, adding an activation function to the neural network applies a non-linear transformation to the input and makes the network capable of solving complex problems such as language translation and image classification.
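To see why this matters, here is a small illustrative check (not from the article) that two stacked layers with no activation in between collapse into a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

# Two layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2
# ...are equivalent to one linear layer with combined weights and bias
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layers, one_layer))  # True
```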

In addition, activation functions are differentiable, which makes backpropagation possible: the gradients of the loss function can be computed and propagated backwards through the network to update the weights. There are many activation functions available for these tasks, so let's look at the main types.

Activation Function

Types of Activation Functions

1. Binary Step Activation Function:
This is a basic activation function used to bound the output to one of 2 classes. It compares the input against a threshold value and makes the decision based on that; the threshold may vary according to our requirement.
f(x) = 1 if x ≥ 0, else 0
Here the threshold is 0, so we are able to separate the inputs into 2 classes.
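A minimal sketch of the binary step function; the configurable threshold parameter is an assumption beyond the article's fixed threshold of 0:

```python
import numpy as np

def binary_step(x, threshold=0.0):
    # Outputs 1 when the input reaches the threshold, 0 otherwise
    return np.where(x >= threshold, 1, 0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # [0 1 1]
```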

2. Linear Activation Function:
It is a simple straight-line activation function where the output is directly proportional to the weighted sum of the inputs. There is nothing more to it, since the output is just a linear combination of the weights and inputs.
Its equation is Y = m*Z, where Z is the weighted sum and m is a constant.

3. ReLU (Rectified Linear Unit):
It is the most widely used activation function. It outputs values ranging from 0 to infinity; all negative values are converted to zero. It is also fast to compute, which helps the network converge quickly. Its only problem is that it completely discards negative values.
y = max(0, x), where x is the input.
The Leaky ReLU function is used instead of ReLU to avoid this issue; in Leaky ReLU the output range is extended to small negative values, which improves performance.

ReLU
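A minimal NumPy sketch of ReLU:

```python
import numpy as np

def relu(x):
    # Negative inputs are clamped to zero; positive inputs pass through unchanged
    return np.maximum(0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]
```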

4. Leaky ReLU:
The Leaky ReLU activation function was introduced to solve the 'dying ReLU' problem. As discussed above, ReLU turns all negative input values into zero, so the corresponding neurons can stop learning. Leaky ReLU does not force all negative inputs to zero but instead maps them to a value near zero, which solves the major issue of the ReLU activation function.

Leaky ReLU
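A sketch of Leaky ReLU, assuming the commonly used slope of 0.01 for negative inputs (the article does not specify a value):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs are scaled by a small slope instead of being zeroed out
    return np.where(x >= 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, 0.0, 2.5])))  # [-0.03  0.    2.5 ]
```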

5. Sigmoid Activation Function:
It is one of the most widely used activation functions and performs its task with great efficiency. It takes a probabilistic approach, squashing the input to a value between 0 and 1, and the output is then decided based on the highest probability: if one class has a higher probability than all the others, it becomes the predicted output.
Along with its advantages there is also a drawback, the vanishing gradient problem: because large values are squashed into the range 0 to 1, the gradients become very small, the network learns slowly, and there can be a mismatch between the predicted and actual outputs. ReLU is often used instead in these cases.
Its equation is f(x) = 1 / (1 + e^(-x))

Sigmoid Activation Function
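A sketch of the sigmoid and its derivative; the derivative identity f'(x) = f(x)(1 - f(x)) is standard but not stated in the article:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Standard identity f'(x) = f(x) * (1 - f(x)); used during backpropagation
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))       # values near 0, 0.5, and 1
print(sigmoid_grad(np.array([-5.0, 0.0, 5.0])))  # tiny gradients at the extremes
```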

6. Hyperbolic Tangent (Tanh) Activation Function:
This activation function performs slightly better than the sigmoid function. Like the sigmoid, it is used to differentiate between two classes, but it maps negative inputs to negative outputs and its range is -1 to 1.

Tanh Activation Function
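A sketch of tanh written out from its standard definition (the article does not give the formula; np.tanh is the usual shortcut):

```python
import numpy as np

def tanh(x):
    # tanh(x) = (e^x - e^-x) / (e^x + e^-x); equivalent to np.tanh(x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

print(tanh(np.array([-2.0, 0.0, 2.0])))  # outputs lie in (-1, 1) and keep the sign
```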

7. Softmax Activation Function:
It is used in the last layer of the neural network to make the prediction. It works similarly to the sigmoid function, but it converts the scores computed from the weights, inputs, and bias into a probability distribution over all the classes.

Softmax Activation Function
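A sketch of softmax, including the usual max-subtraction trick for numerical stability (an implementation detail not mentioned in the article):

```python
import numpy as np

def softmax(scores):
    # Subtracting the max keeps the exponentials from overflowing; result is unchanged
    exps = np.exp(scores - np.max(scores))
    # Normalize so the outputs form a probability distribution that sums to 1
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g. [0.659 0.242 0.099]
```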

Conclusion:
Activation functions are the significant functions that apply a non-linear transformation to the input, making the network proficient at understanding and executing more complex tasks. All the activation functions discussed above serve the same purpose but are used under different conditions.
