Intro to Deep Learning: Neural Network

This article covers the basics of deep learning and gives you an idea of what a neural network is.

The field of deep learning has seen impressive advances in recent times. Complex problems such as image classification, speech recognition, and natural language processing that seemed unachievable a few years ago can be solved today using deep learning techniques. The origins of deep learning can be traced back to the 1950s, when Frank Rosenblatt invented the perceptron, building on the earlier work of McCulloch and Pitts. However, interest in the field largely died off by the end of the 1960s. The recent resurgence of deep learning is due to the right combination of cheaper data storage, GPUs (computing power), and improved techniques and algorithms. In this article, we briefly explain deep learning and neural networks.

Deep learning is a subset of machine learning, and thus of artificial intelligence, developed in an effort to create systems that mimic the human brain.

Intuition

Let’s first understand how the human brain works. The human brain is complicated. Scientists are dedicating massive amounts of time and energy to studying what our brains do, yet they have only scratched the surface of how the human brain works. We humans learn through a process called synaptic plasticity, first proposed by Canadian psychologist Donald Hebb. Synaptic plasticity is the ability of synapses (the junctions between neurons) to change and adapt to new information based on how active or inactive they are. New neural connections are formed, strengthened, or weakened over time as new information is gained.

Humans learn by acquiring new information, comparing it with our current understanding, and making sense of it. Artificial neural networks are modeled after this concept: they attempt to simulate the network of neurons that makes up the brain.

Note: Be very careful with brain analogies. We are only loosely simulating neurons. Biological neurons come in many different types, and their dendrites perform complex computations. The contribution of a single neuron to the brain's computation should not be underestimated.

The McCulloch-Pitts Neuron (1943)

In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” on how neurons might work. They built the first computational model of a neuron by mimicking the functionality of a biological neuron.

The McCulloch-Pitts neuron takes only binary inputs, where 1 represents true (the neuron fires) and 0 false (it doesn’t fire). The combined inputs are passed through an aggregation function.

Here, g aggregates the inputs and the function f makes a decision based on this aggregation: $y = f(g(x_1, \ldots, x_n))$.
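To make this concrete, here is a minimal Python sketch of a McCulloch-Pitts neuron. The function name and threshold value are illustrative choices, not from the original paper:

```python
# A minimal sketch of a McCulloch-Pitts neuron: g sums the binary inputs,
# and f fires (outputs 1) if the sum reaches a fixed threshold.

def mcculloch_pitts(inputs, threshold):
    g = sum(inputs)                    # aggregation: g sums the binary inputs
    return 1 if g >= threshold else 0  # decision: f compares g to the threshold

# With threshold = 2, the neuron behaves like a two-input AND gate.
print(mcculloch_pitts([1, 1], threshold=2))  # 1 (fires)
print(mcculloch_pitts([1, 0], threshold=2))  # 0 (does not fire)
```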

Perceptrons

In 1957, Frank Rosenblatt, inspired by the work of Warren McCulloch and Walter Pitts, created the classical perceptron for classifying linearly separable patterns. It was a more general computational model than the McCulloch-Pitts neuron.

The perceptron is a simple type of artificial neuron: a single-layer neural network that classifies data into one of two classes, which is why it is also called a linear binary classifier.

While the McCulloch-Pitts neuron accepts only a 1 or 0 for each input, the perceptron model is no longer limited to Boolean values. Rosenblatt’s perceptron also introduced weights w1, w2, w3, …, wm expressing the importance of the respective inputs to the output. The neuron’s output, 0 or 1, is determined by whether the weighted sum $\sum_{i=1}^{m} w_i x_i$ is less than or greater than some threshold value. Like the weights, the threshold is a real number and a parameter of the neuron.

$$\text{output} = \begin{cases} 0 & \text{if } \sum_{i=1}^{m} w_i x_i \leq \text{threshold} \\ 1 & \text{if } \sum_{i=1}^{m} w_i x_i > \text{threshold} \end{cases}$$

The perceptron takes all the input values and multiplies them by their weights. These products are then added together to form a weighted sum, which is passed through a step activation function to produce the output.

Figure: A perceptron.

The figure above shows a perceptron. There are m inputs x1, x2, …, xm and m weights w1, w2, …, wm; w0 represents the bias term. The weights represent the importance of the respective inputs to the output prediction. We can represent the perceptron as an equation:

$$ \hat{y} = f\left(w_0 + \sum_{i=1}^{m} x_i w_i\right) $$

The inputs xi are multiplied by their corresponding weights wi and added together to get the weighted sum, to which the bias term w0 is added. The resulting number is passed through an activation function, and the output of that function is our prediction ŷ. The activation function ensures that the output is mapped to a range such as (0, 1) or (-1, 1). The bias term allows us to shift the activation function to the left or right.
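Here is a minimal sketch of this forward pass in Python. The input values, weights, and bias are illustrative numbers chosen by hand, not learned parameters:

```python
# A minimal sketch of the perceptron forward pass described above.

def step(z):
    """Step activation: output 1 if z is positive, else 0."""
    return 1 if z > 0 else 0

def perceptron(x, w, w0):
    # Weighted sum of the inputs plus the bias term w0.
    z = w0 + sum(x_i * w_i for x_i, w_i in zip(x, w))
    return step(z)  # the prediction y_hat

x = [1.0, 0.5, -1.0]   # m = 3 inputs
w = [0.4, 0.6, 0.2]    # corresponding weights
w0 = -0.1              # bias term
print(perceptron(x, w, w0))  # 1, since z = -0.1 + 0.4 + 0.3 - 0.2 = 0.4 > 0
```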

Using linear algebra, we can rewrite the equation as:

$$ \hat{y} = f\left(w_0 + \boldsymbol{X}^T \boldsymbol{W}\right) $$ where: $$\boldsymbol{X}=\left[\begin{array}{c}x_1 \\ \vdots \\ x_m\end{array}\right]$$ and $$\boldsymbol{W}=\left[\begin{array}{c}w_1 \\ \vdots \\ w_m\end{array}\right]$$
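The same forward pass can be written in vectorized form. This sketch assumes NumPy and reuses the illustrative values from the previous example:

```python
# The same perceptron forward pass, with the weighted sum computed as the
# dot product X^T W instead of an explicit loop.
import numpy as np

X = np.array([1.0, 0.5, -1.0])
W = np.array([0.4, 0.6, 0.2])
w0 = -0.1

y_hat = 1 if (w0 + X @ W) > 0 else 0
print(y_hat)  # 1, same result as the loop-based version
```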

Perceptrons can be considered a form of feedforward neural network, in which the connections between the nodes do not form a loop.

In 1969, twelve years after the invention of the perceptron, Marvin Minsky and Seymour Papert published a book called Perceptrons, which pointed out key problems with perceptrons. They showed that a single-layer feedforward neural network cannot solve problems in which the data is not linearly separable, such as the XOR problem. Adding hidden layers, however, makes it possible to handle data that is not linearly separable.
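To see why a hidden layer helps, here is a hand-wired sketch of a two-layer network that computes XOR. The weights and thresholds are chosen by hand for illustration, not learned, using the identity XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2):

```python
# One hidden layer solves XOR: the hidden units compute OR and AND, and the
# output unit fires when OR is true but AND is not.

def step(z):
    return 1 if z > 0 else 0

def xor(x1, x2):
    h_or  = step(x1 + x2 - 0.5)      # hidden unit 1: fires if x1 OR x2
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: fires if x1 AND x2
    return step(h_or - h_and - 0.5)  # output: OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # prints 0, 1, 1, 0 — the XOR truth table
```

No single-layer perceptron can reproduce this truth table, because no single straight line separates the inputs {(0,1), (1,0)} from {(0,0), (1,1)}.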

Feed Forward Neural Network

A feedforward neural network consists of interconnected nodes, known as neurons, organized into layers. The network includes an input layer, one or more hidden layers, and an output layer. Hidden layers were absent in the original perceptron.

They are called feedforward because information only travels forward in the network (no loops). The inputs first pass through the input nodes, then through the hidden nodes, and finally through the output nodes.

Neural network architecture

A neural network comprises connected neurons that receive inputs and produce an output. The basic function of a neural network is to take input and transform it into a meaningful output.

Neuron

The basic building block of a neural network is the “neuron,” which can be viewed as a processing unit. In a neural network, neurons are connected by weights; that is, each connection between two neurons has a weight associated with it.

Weight is a numerical value that represents the strength of the connection between two neurons in a neural network.

Figure: A neural network.

Usually, a neural network consists of three types of layers:

  • Input layer: The input layer receives input from outside the network and passes it to the next layer.
  • Hidden layers: As the name suggests, the nodes of these layers are hidden. A hidden layer takes input from the input layer or from other hidden layers, processes it, and passes it on to the next layer. Some artificial neural networks have a large number of hidden layers.
  • Output layer: The output layer gives the output prediction. It can have one or more nodes: a binary classification problem needs one output node, while a multiclass classification problem needs more than one (see the sketch after this list).
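Here is a minimal sketch of a forward pass through these three layers. The layer sizes, random weights, and sigmoid activation are illustrative assumptions, not prescribed by the article:

```python
# A forward pass through input, hidden, and output layers for a binary
# classification setup: 3 input nodes -> 4 hidden nodes -> 1 output node.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input -> hidden weights, biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden -> output weights, biases

x = np.array([0.2, -0.5, 1.0])  # input layer: receives the raw input
h = sigmoid(W1 @ x + b1)        # hidden layer: transforms the input
y_hat = sigmoid(W2 @ h + b2)    # output layer: one node for a binary prediction
print(y_hat)                    # a value in (0, 1), interpretable as a probability
```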

The use of modern neural nets is called deep learning because modern networks are often deep (have many layers).
