
BUILDING YOUR FIRST NEURAL NETWORK

INTRODUCTION

The modern era is data driven: data is being collected and stored faster than we can process it. Dealing with such a copious amount of information demands speed, accuracy, and efficiency, and conventional methods of handling it are slow and error-prone. This gap gave rise to an approach that is robust, accurate, and highly reliable. Deep learning plays a crucial role in today's fast-paced technological era. It is versatile and is used in almost every field, and its applications have proliferated in the past decade. From image recognition to financial management, it has proven to be an essential tool of modern technology and has improved everyday life.

In this article, we dive into some key concepts of how a neural network functions and learn how to develop and code a simple neural network using Python. A neural network is essentially a mathematical function that produces an output for a given input, loosely mimicking the functionality of the human brain. Our network makes predictions by analysing previously acquired training data. Before we get into the code, let us get acquainted with some terminology and key concepts.

NETWORK ARCHITECTURE

[Figure: typical architecture of a neural network (AI_architecture.png)]

A neural network typically consists of:

1) AN INPUT LAYER: the initial dataset that is fed to our model.

2) HIDDEN LAYERS: an arbitrary number of layers between the input and output layers.

3) OUTPUT LAYER: computes a value which is the model's prediction for a given input.

A SINGLE NEURON

It is the basic unit of a neural network, often referred to as a node. A node is connected to nodes from the previous layer and is associated with a weight (w), a bias (b), and an input. A typical node computes the weighted sum of its inputs, applies a mathematical operation to it, and produces an output that is passed on to the next layer.

[Figure: the computation performed by a single node (node.png)]
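To make this concrete, here is a minimal sketch of what a single node computes. The function name and numbers are illustrative and are not part of the article's later code:

import numpy as np

def node(inputs, w, b, g):
    # weighted sum of the inputs plus the bias, passed through the activation g
    return g(np.dot(w, inputs) + b)

# a node with three inputs and a ReLU-style activation
print(node(np.array([1.0, 2.0, 3.0]), np.array([0.5, -0.2, 0.1]), 0.4, lambda z: max(z, 0.0)))  # 0.8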

ACTIVATION FUNCTION

[Figure: activation function (activation.png)]

An activation function determines a node's output from its weighted input. The aforementioned bias (b) adds a constant to every node, which shifts the activation function to the left or right.
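As a rough illustration (this snippet is not part of the article's code, and the sigmoid shown here is just one common choice of activation), here is an activation function and the shift produced by a bias:

import numpy as np

def sigmoid(z):
    # a common activation function that squashes z into (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.linspace(-5, 5, 5)
print(sigmoid(z))      # S-curve centred at z = 0
print(sigmoid(z + 2))  # a bias of +2 shifts the curve 2 units to the left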


Let us dive into the code and understand how the model predicts a value for a given input. Consider a simple network that consists of one hidden layer (3 nodes) and one output layer (one node).

[Figure: the example network with one hidden layer of 3 nodes and an output layer of one node (nnet2.png)]

TRAINING

The training process involves the following steps:


1. Initialize weights: all weights and biases in the network are assigned random values with the help of NumPy. Note that it is good practice to keep track of the dimensions of the input, weights, biases, and output, as this helps in debugging errors that arise from mismatched matrix multiplications.
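The snippets below print and use a small dataset x and y that are not defined in the code shown. Judging by the shapes in the output (x is 4 x 4 and y holds the four labels [0 1 1 1]), a minimal setup might look like this, where the exact values in x are illustrative and only the shapes matter:

import numpy as np

# x: 4 features x 4 training examples - the values are illustrative,
# only the (4, 4) shape matters for the snippets below
x = np.array([[0., 0., 1., 1.],
              [0., 1., 0., 1.],
              [1., 1., 1., 0.],
              [1., 0., 1., 1.]])

# y: one label per training example, taken from the article's output
y = np.array([0, 1, 1, 1])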

np.random.seed(3)

def initialize_weights():
    parameters = {}
    parameters['W1'] = 2*np.random.randn(3, 4) - 1  # W1 -> 3 x 4
    parameters['b1'] = np.zeros((3, 1))             # b1 -> 3 x 1
    parameters['W2'] = 2*np.random.randn(1, 3) - 1  # W2 -> 1 x 3
    parameters['b2'] = np.zeros((1, 1))             # b2 -> 1 x 1
    return parameters

parameters = initialize_weights()

print("Shape of W1:", parameters['W1'].shape)
print("Shape of W2:", parameters['W2'].shape)
print("Shape of b1:", parameters['b1'].shape)
print("Shape of b2:", parameters['b2'].shape)
print("Shape of x:", x.shape)
print("Shape of y:", y.shape)
print("WEIGHTS AND BIASES BEFORE TRAINING:")
print(parameters)

OUTPUT

Shape of W1: (3, 4)

Shape of W2: (1, 3)

Shape of b1: (3, 1)

Shape of b2: (1, 1)

Shape of x: (4, 4)

Shape of y: (4,)

 

WEIGHTS AND BIASES BEFORE TRAINING: {'W1': array([[ 2.57725695, -0.1269803 , -0.80700506, -4.72698541], [-1.55477641, -1.70951796, -1.16548296, -2.25400135], [-1.08763634, -1.95443606, -3.62772951, 0.76924476]]), 'b1': array([[0.], [0.], [0.]]), 'W2': array([[ 0.76263608, 2.41914613, -0.89993272]]), 'b2': array([[0.]])}

2. Forward propagation: every node computes the weighted sum of its inputs and passes the result through an activation function to produce its output.

z1 = w1*x1 + w2*x2 + w3*x3 + b1

a1 = g(z1)

Here g(z) is the activation function and w1, w2, w3 are the weights on a node's inputs. This formulation can be generalised to a vectorized form, Z1 = W1·x + b1 and A1 = g(Z1), which is what the code below computes with np.dot; see the linked article to learn more about vectorised implementations.

def relu(x, diff=False):
    # activation function; if diff=True, returns the derivative of ReLU instead
    if diff == True:
        return (x > 0)
    return (x > 0) * x

def forward_propagation(parameters):
    # hidden layer: ReLU(W1.x + b1) -> 3 x 4
    layer_1 = relu(np.dot(parameters['W1'], x) + parameters['b1'], diff=False)
    # output layer (no activation): W2.layer_1 + b2 -> 1 x 4
    layer_2 = np.dot(parameters['W2'], layer_1) + parameters['b2']
    return layer_1, layer_2

3. Back propagation:

COST FUNCTION - it measures the difference between the predicted value ('h', or 'layer_2') and the actual value ('y'); in other words, how wrong our model is when compared with the actual 'y'. The learning rate is a tuning parameter of the optimization algorithm that determines the step size taken at each iteration while moving toward a minimum of the loss function.

[Figure: vectorized logistic cost function (logistic_cost_function_vectorized.png)]
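The figure above refers to the vectorized logistic (cross-entropy) cost. As a rough sketch, assuming NumPy is imported as np as in the snippets above and that every prediction h lies strictly between 0 and 1, it could be computed like this; this helper is only for intuition and is not used by the article's training code, which works directly with the error layer_2 - y:

def compute_cost(h, y):
    # vectorized logistic (cross-entropy) cost averaged over m training examples;
    # assumes every entry of h is strictly between 0 and 1
    m = y.shape[0]
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))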

Backpropagation is about understanding how changing the weights and biases in a network changes the cost function. Ultimately, this means computing the partial derivatives dw and db. To compute those, we first introduce intermediate quantities, delta_1 and delta_2, which we call the error in each layer. Backpropagation gives us a procedure to compute these errors and then relates them to dw and db. For detailed information about backpropagation, I highly recommend going through the linked article. We repeat this process over the training examples in our dataset, and the network is then said to have learnt from them.

def back_propagation(learning_rate, x, y, layer_1, layer_2):
    # error at the output layer -> 1 x 4
    delta_layer_2 = (layer_2 - y)
    # error at the hidden layer -> 3 x 4
    delta_layer_1 = np.dot(parameters['W2'].T, delta_layer_2) * relu(np.dot(parameters['W1'], x) + parameters['b1'], diff=True)
    # bias gradients, averaged over the 4 training examples
    db_2 = (1 / 4) * np.sum(delta_layer_2, axis=1, keepdims=True)
    db_1 = (1 / 4) * np.sum(delta_layer_1, axis=1, keepdims=True)
    # weight gradients
    dw_2 = np.dot(delta_layer_2, layer_1.T)
    dw_1 = np.dot(delta_layer_1, x.T)
    # gradient-descent update of every parameter
    parameters['W1'] -= learning_rate * dw_1
    parameters['W2'] -= learning_rate * dw_2
    parameters['b1'] -= learning_rate * db_1
    parameters['b2'] -= learning_rate * db_2

    return parameters
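The article does not show the loop that actually trains the network and produces the output below. A minimal driver that ties forward_propagation and back_propagation together might look like this, where the iteration count and learning rate are illustrative choices rather than values given in the article:

# hypothetical driver loop: the number of iterations and the learning
# rate are illustrative, not taken from the article
for i in range(1000):
    layer_1, layer_2 = forward_propagation(parameters)
    parameters = back_propagation(0.1, x, y, layer_1, layer_2)

print("WEIGHTS AND BIASES AFTER TRAINING:")
print(parameters)
print("prediction :", forward_propagation(parameters)[1])
print("Actual y:", y)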

OUTPUT

WEIGHTS AND BIASES AFTER TRAINING:

{'W1': array([[ 2.51457492, -0.1269803 , -0.86968709, -4.72698541], [-1.55477641, -1.70951796, -1.16548296, -2.25400135], [-1.08763634, -1.95443606, -3.62772951, 0.76924476]]), 'b1': array([[-0.01567051], [ 0. ], [ 0. ]]), 'W2': array([[-0.61379166, 2.41914613, -0.89993272]]), 'b2': array([[1.]])}

 

prediction : [[1.11022302e-15 1.00000000e+00 1.00000000e+00 1.00000000e+00]]

 

Actual y: [0 1 1 1]

As you can see, the predicted values are very close to the actual labels. Using algorithms like these, neural networks can recognise hidden patterns and correlations in raw data, cluster and classify it, and, over time, continuously learn and improve. This article aims to build an intuition for how such a network works, avoiding complex mathematical formulations and derivations. For additional information, I highly recommend going through the references.
