BUILDING YOUR FIRST NEURAL NETWORK
INTRODUCTION
The modern era is data driven: data is being collected and stored faster than we can process it. Dealing with such a copious amount of data demands speed, accuracy, and efficiency, and conventional methods of handling it are slow and error-prone. This has driven the adoption of approaches that are robust, accurate, and highly reliable. Deep learning plays a crucial role in today's fast-paced technological era. It is versatile, is utilized in almost every field, and its applications have proliferated in the past decade. From image recognition to financial management, it has proven to be an essential tool for modern technology and has improved everyday life.

In this article, we dive into some key concepts of how a neural network functions and learn how to develop a simple neural network in Python. A neural network is essentially a mathematical function that produces an output for a given input, loosely mimicking the functionality of the human brain. The network learns to make accurate predictions by analyzing previously acquired training data. Before we dive into the code, let us get acquainted with some terminology and key concepts.
NETWORK ARCHITECTURE

A neural network typically consists of:
1) AN INPUT LAYER: The initial dataset that is fed to the model.
2) HIDDEN LAYERS: An arbitrary number of layers between the input and output layers.
3) AN OUTPUT LAYER: Computes the final value, which is the network's prediction for the given input.
A SINGLE NEURON
A neuron, often referred to as a node, is the basic unit of a neural network. It is connected to nodes in the previous layer, and each connection carries a weight (w); the node itself has a bias (b). A typical node applies a mathematical operation to the weighted sum of its inputs and produces an output that is passed on to the next layer.
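To make this concrete, here is a minimal sketch of a single neuron in NumPy (the input, weight, and bias values are illustrative and not part of the network built below):

import numpy as np

def neuron(x, w, b):
    # weighted sum of the inputs plus the bias, passed through an activation (ReLU here)
    z = np.dot(w, x) + b
    return max(0.0, z)

x = np.array([0.5, -1.0, 2.0])  # outputs of the previous layer
w = np.array([0.2, 0.8, -0.5])  # one weight per incoming connection
b = 0.1                         # bias
print(neuron(x, w, b))          # a single value passed on to the next layer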

ACTIVATION FUNCTION

An activation function determines a node's output given its weighted input and introduces the non-linearity that lets the network model complex patterns. The aforementioned bias (b) introduces a constant to every node, which plays a role in shifting the activation function to the right or left.
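For illustration, here is a quick sketch of two common activation functions and of how the bias shifts where an activation "turns on" (ReLU is the one used later in this article):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # squashes any input into (0, 1)

def relu(z):
    return np.maximum(0, z)      # zeroes out negative inputs

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(relu(z))
print(relu(z + 1.5))  # a bias of 1.5 shifts the point where ReLU activates to the left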
Let us dive into the code and understand how the model predicts a value given an input. Consider a simple network that consists of one hidden layer (3 nodes) and one output layer (one node).
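Since debugging a network is largely a matter of keeping shapes straight, here is a quick sketch of how dimensions flow through this architecture, assuming 4 input features and 4 training examples as in the data used below (the zero arrays are placeholders for shape-checking only):

import numpy as np

x = np.zeros((4, 4))                         # 4 features x 4 examples
W1, b1 = np.zeros((3, 4)), np.zeros((3, 1))  # hidden layer: 3 nodes
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))  # output layer: 1 node

layer_1 = np.dot(W1, x) + b1        # (3, 4): one value per hidden node per example
layer_2 = np.dot(W2, layer_1) + b2  # (1, 4): one prediction per example
print(layer_1.shape, layer_2.shape)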

TRAINING
The training process involves the following steps:

1. Initialize weights: All weights and biases in the network are assigned random values with the help of NumPy. Note that it is good practice to keep track of the dimensions of the input, weights, biases, and output, as this helps in debugging math errors that arise from erroneous matrix multiplication. (The training data itself is not listed here; the x values in the snippet below are placeholders chosen to match the shapes printed in the output, and y matches the final output.)
import numpy as np

np.random.seed(3)
# placeholder training data (original values not shown); shapes match the output below
x = np.array([[0, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 1]])  # 4 x 4
y = np.array([0, 1, 1, 1])

def initialize_weights():
    parameters = {}
    parameters['W1'] = 2 * np.random.randn(3, 4) - 1  # W1 -> 3 x 4
    parameters['b1'] = np.zeros((3, 1))               # b1 -> 3 x 1
    parameters['W2'] = 2 * np.random.randn(1, 3) - 1  # W2 -> 1 x 3
    parameters['b2'] = np.zeros((1, 1))               # b2 -> 1 x 1
    return parameters

parameters = initialize_weights()
print("Shape of W1:", parameters['W1'].shape)
print("Shape of W2:", parameters['W2'].shape)
print("Shape of b1:", parameters['b1'].shape)
print("Shape of b2:", parameters['b2'].shape)
print("Shape of x:", x.shape)
print("Shape of y:", y.shape)
print("WEIGHTS AND BIASES BEFORE TRAINING:")
print(parameters)
OUTPUT
Shape of W1: (3, 4)
Shape of W2: (1, 3)
Shape of b1: (3, 1)
Shape of b2: (1, 1)
Shape of x: (4, 4)
Shape of y: (4,)
WEIGHTS AND BIASES BEFORE TRAINING:
{'W1': array([[ 2.57725695, -0.1269803 , -0.80700506, -4.72698541],
       [-1.55477641, -1.70951796, -1.16548296, -2.25400135],
       [-1.08763634, -1.95443606, -3.62772951,  0.76924476]]),
 'b1': array([[0.],
       [0.],
       [0.]]),
 'W2': array([[ 0.76263608,  2.41914613, -0.89993272]]),
 'b2': array([[0.]])}
2. Forward propagation: Every node calculates the weighted sum of its inputs and produces an output that is passed through an activation function. For a single node with three inputs:

z1 = w1*x1 + w2*x2 + w3*x3 + b1

a1 = g(w1*x1 + w2*x2 + w3*x3 + b1) = g(z1)

Here g(z) is an activation function and w1, w2, w3 are the weights on the node's incoming connections. This formulation generalizes to the vectorized form used in the code below, Z1 = W1·x + b1 and A1 = g(Z1), where W1 is now the whole layer's weight matrix and x holds all the training examples.
def relu(x, diff=False):
    # activation function; if diff=True, return the derivative of ReLU instead
    if diff:
        return (x > 0) * 1.0
    return (x > 0) * x

def forward_propagation(x, parameters):
    # hidden layer: ReLU applied to the weighted input
    layer_1 = relu(np.dot(parameters['W1'], x) + parameters['b1'])
    # output layer: a plain weighted sum (no activation)
    layer_2 = np.dot(parameters['W2'], layer_1) + parameters['b2']
    return layer_1, layer_2
3. Back propagation:

COST FUNCTION: It measures how far the predicted value ('h', or 'layer_2') is from the actual value ('y'); it quantifies how wrong the model is when its predictions are juxtaposed with the actual 'y' values. The learning rate is a tuning parameter in the optimization algorithm that determines the step size at each iteration while moving toward a minimum of the loss function.
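For example, a squared-error cost could be written as follows (mse_cost is a hypothetical helper shown only for illustration; the training code below never computes the cost itself, only its derivative, which appears, up to the averaging factor, as layer_2 - y):

import numpy as np

def mse_cost(h, y):
    # mean squared error over the m training examples; the 1/2 factor makes
    # the derivative with respect to h come out as simply (h - y) / m
    m = h.shape[1]
    return np.sum((h - y) ** 2) / (2 * m)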

Backpropagation is about understanding how changing the weights and biases in a network changes the cost function. Ultimately, this means computing the partial derivatives dw and db. To compute those, we first introduce intermediate quantities, delta_1 and delta_2, which we call the error in each layer. Backpropagation gives us a procedure to compute these errors and then relates them to dw and db; for a detailed treatment, see the references. We repeat this process with all the training examples in our dataset, after which the network is said to have learnt those examples.
def back_propagation(parameters, learning_rate, x, y, layer_1, layer_2):
    # error at the output layer: derivative of the squared-error cost
    delta_layer_2 = layer_2 - y  # 1 x 4
    # error at the hidden layer: propagate back through W2 and the ReLU derivative
    delta_layer_1 = np.dot(parameters['W2'].T, delta_layer_2) * relu(
        np.dot(parameters['W1'], x) + parameters['b1'], diff=True)  # 3 x 4
    # bias gradients, averaged over the 4 training examples
    db_2 = (1 / 4) * np.sum(delta_layer_2, axis=1, keepdims=True)
    db_1 = (1 / 4) * np.sum(delta_layer_1, axis=1, keepdims=True)
    # weight gradients
    dw_2 = np.dot(delta_layer_2, layer_1.T)
    dw_1 = np.dot(delta_layer_1, x.T)
    # gradient descent update
    parameters['W1'] -= learning_rate * dw_1
    parameters['W2'] -= learning_rate * dw_2
    parameters['b1'] -= learning_rate * db_1
    parameters['b2'] -= learning_rate * db_2
    return parameters
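The loop that ties these functions together is not shown above; below is a minimal sketch assuming plain full-batch gradient descent (the learning rate and epoch count are illustrative choices, and with the placeholder x defined earlier the trained values will not exactly reproduce the printed output):

def train(x, y, learning_rate=0.01, epochs=1000):
    # repeatedly run a forward pass and a backward pass over the full batch
    parameters = initialize_weights()
    for _ in range(epochs):
        layer_1, layer_2 = forward_propagation(x, parameters)
        parameters = back_propagation(parameters, learning_rate, x, y, layer_1, layer_2)
    return parameters

parameters = train(x, y)
_, prediction = forward_propagation(x, parameters)
print("prediction :", prediction)
print("Actual y:", y)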
OUTPUT
Shape of W1: (3, 4)
Shape of W2: (1, 3)
Shape of b1: (3, 1)
Shape of b2: (1, 1)
Shape of x: (4, 4)
Shape of y: (4,)
WEIGHTS AND BIASES BEFORE TRAINING:
{'W1': array([[ 2.57725695, -0.1269803 , -0.80700506, -4.72698541],
       [-1.55477641, -1.70951796, -1.16548296, -2.25400135],
       [-1.08763634, -1.95443606, -3.62772951,  0.76924476]]),
 'b1': array([[0.],
       [0.],
       [0.]]),
 'W2': array([[ 0.76263608,  2.41914613, -0.89993272]]),
 'b2': array([[0.]])}
WEIGHTS AND BIASES AFTER TRAINING:
{'W1': array([[ 2.51457492, -0.1269803 , -0.86968709, -4.72698541],
       [-1.55477641, -1.70951796, -1.16548296, -2.25400135],
       [-1.08763634, -1.95443606, -3.62772951,  0.76924476]]),
 'b1': array([[-0.01567051],
       [ 0.        ],
       [ 0.        ]]),
 'W2': array([[-0.61379166,  2.41914613, -0.89993272]]),
 'b2': array([[1.]])}
prediction : [[1.11022302e-15 1.00000000e+00 1.00000000e+00 1.00000000e+00]]
Actual y: [0 1 1 1]
As you can see, the predicted values closely match the actual labels. Using algorithms like this, neural networks can recognize hidden patterns and correlations in raw data, cluster and classify it, and, over time, continuously learn and improve. This article aims to build intuition for how a network works, so complex mathematical formulations and derivations are avoided. For additional information, I highly recommend going through the references.