
[PyTorch] Building a Feed-Forward Neural Network with the MNIST Dataset

We will be implementing a multilayer feed-forward neural network that can do handwritten digit classification based on the famous MNIST dataset.

Introduction

Artificial neural networks (ANNs), as the name implies, are inspired by the biological brain and nervous system. ANNs are computational systems that learn to perform tasks from examples rather than from explicitly programmed rules. The input data is processed through several layers of artificial neurons. Each layer can contain an arbitrary number of neurons, and each connection between neurons is represented by a weight.

Out of the many different kinds of neural network architecture, the simplest one is the feed-forward neural network.

What is a feed-forward neural network? 

In a feed-forward neural network (FNN), data always moves forward: from the input layer, through the hidden layers, and finally to the output layer. The network contains no cycles or loops, so no layer feeds its output back to an earlier layer. Hence the name feed-forward.

Each neuron only receives input from the previous layer and sends output to the next layer.

Feed-Forward neural network

Each layer consists of one or more neurons/units. A unit in layer n receives input from all units in layer n-1 and sends output to all units in layer n+1. Keep in mind that a unit in layer n does not communicate with any other units in layer n.
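To make the layer-to-layer flow concrete, here is a minimal sketch of one forward pass written with plain tensors. The layer sizes (4, 3, 2), the random weights, and the tanh activation are assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical layer sizes, chosen only for illustration
n_in, n_hidden, n_out = 4, 3, 2

# One weight matrix and one bias vector per pair of adjacent layers
W1, b1 = torch.randn(n_hidden, n_in), torch.zeros(n_hidden)
W2, b2 = torch.randn(n_out, n_hidden), torch.zeros(n_out)

x = torch.randn(n_in)          # activations of the input layer
h = torch.tanh(W1 @ x + b1)    # every hidden unit receives every input unit
y = W2 @ h + b2                # every output unit receives every hidden unit

print(h.shape, y.shape)
```

Note that no weight connects two units of the same layer, and nothing flows backward from y to h or x.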

Feed-Forward Neural Network using Pytorch on MNIST Dataset

 The goal is to create a Feed-Forward classification model on the MNIST dataset.

About Dataset

The MNIST dataset, also known as the Modified National Institute of Standards and Technology dataset, is a large collection of 28×28 grayscale images of handwritten digits. It contains 70,000 images in total: 60,000 in the training set and 10,000 in the test set. The handwritten digit images have been size-normalized and centered in a fixed size of 28×28 pixels. The digits 0 through 9 form 10 classes, one per digit.

The task is to classify a given image of a handwritten digit into one of 10 classes of digits. Full code in GitHub.

Importing required libraries and modules

import numpy as np

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

The torch.nn module contains classes that help you build and train neural network models. Every layer and model in PyTorch subclasses nn.Module. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.

The torchvision package contains utility functions for working with the image data. It also contains helper classes to download and import popular datasets like MNIST.

Device Configuration

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

The torch.cuda.is_available() function returns True if a CUDA-capable GPU is available and False otherwise. torch.device takes a string naming the device type. The line above therefore sets device to "cuda" when a GPU is available and falls back to "cpu" otherwise.

Later, when we build the model and iterate over the data, we will call to(device) to move both the model and each batch to this device.
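As a small sketch of what to(device) does, the snippet below moves a tensor and a layer to the selected device. The tensor shape and the nn.Linear(3, 1) layer are made up for the example and are not part of the model built later.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(2, 3)       # tensors are created on the CPU by default
x_dev = x.to(device)       # gives a tensor that lives on the target device

layer = torch.nn.Linear(3, 1).to(device)  # moves the layer's parameters
out = layer(x_dev)         # works because input and parameters share a device

print(x_dev.device.type, out.shape)
```

On a machine without a GPU, both calls are effectively no-ops and everything stays on the CPU.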

Define Hyperparameters

# Hyper-parameters
VALID_SIZE = 0.1
input_dim = 784     # 28x28
hidden_dim = 500 
output_dim = 10     # number of classes
num_epochs = 5
batch_size = 100
learning_rate = 0.001

These are the hyperparameters for our neural network. The input dimension is 784 because each 28×28 image is flattened into a one-dimensional vector of 784 values. The output dimension is 10, one per digit class from 0 to 9. VALID_SIZE is the fraction of the training set held out for validation. The remaining hyperparameters can be tuned as you see fit.
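The flattening step can be sketched on a stand-in batch. The all-zero tensor below is only a placeholder with the same shape as a batch of 100 MNIST images.

```python
import torch

batch = torch.zeros(100, 1, 28, 28)   # stand-in for one batch of MNIST images
flat = batch.reshape(-1, 28 * 28)     # -1 lets PyTorch infer the batch dimension

print(flat.shape)  # torch.Size([100, 784])
```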

Loading MNIST dataset

PyTorch provides two data primitives: torch.utils.data.DataLoader and torch.utils.data.Dataset that allow you to use pre-loaded datasets as well as your own data. 

Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
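To illustrate how the two primitives fit together, here is a minimal sketch with a toy dataset. The RandomDigits class and its random contents are hypothetical and stand in for real data. A map-style Dataset only needs __len__ and __getitem__.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomDigits(Dataset):
    """Hypothetical toy dataset: random 1x28x28 'images' with random labels."""

    def __init__(self, n_samples=250):
        self.images = torch.rand(n_samples, 1, 28, 28)
        self.labels = torch.randint(0, 10, (n_samples,))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

loader = DataLoader(RandomDigits(), batch_size=100, shuffle=True)
for images, labels in loader:
    print(images.shape, labels.shape)
```

With 250 samples and a batch size of 100, the loader yields two full batches and one final batch of 50.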

# MNIST dataset 
train_dataset = torchvision.datasets.MNIST(root='./data', 
                                           train=True, 
                                           transform=transforms.ToTensor(),  
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='./data', 
                                          train=False, 
                                          transform=transforms.ToTensor())

We split the training dataset into a training subset and a validation subset. VALID_SIZE = 0.1 holds out 10% of the 60,000 training images for validation.

# Add Validation data
num_train = len(train_dataset)
indices = list(range(num_train)) # get indices of train
np.random.shuffle(indices)
split = int(np.floor(VALID_SIZE * num_train))
train_idx, valid_idx = indices[split:], indices[:split] # split data

# define samplers for training and validation batches
train_sampler = torch.utils.data.SubsetRandomSampler(train_idx)
valid_sampler = torch.utils.data.SubsetRandomSampler(valid_idx)
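A quick sanity check of the split can be sketched without loading the images. The seed below is an assumption added so the split is repeatable, and num_train is hard-coded to the size of the MNIST training set.

```python
import numpy as np

VALID_SIZE = 0.1
num_train = 60000                      # size of the MNIST training set

np.random.seed(0)                      # assumption: seed added for repeatability
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(VALID_SIZE * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

print(len(train_idx), len(valid_idx))          # 54000 6000
print(set(train_idx).isdisjoint(valid_idx))    # True
```

Because the two index lists are disjoint, no validation image is ever seen during training even though both loaders read from the same underlying dataset.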

DataLoaders

The PyTorch DataLoader class loads samples from a Dataset and groups them into mini-batches for training. It takes care of batching, shuffling or sampling, and optional parallel loading, while transforms such as ToTensor() are applied by the Dataset when each sample is fetched. DataLoader is an iterable that hides this machinery behind a simple API.

The data training pipeline should be as modular as possible to aid in quick prototyping and usability.

# Data loader
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=batch_size,
    sampler=train_sampler
)

valid_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=batch_size,
    sampler=valid_sampler
)

test_loader = torch.utils.data.DataLoader(
    dataset=test_dataset, 
    batch_size=batch_size,
    shuffle=False
)
# Preview six images from one training batch
examples = iter(train_loader)
example_data, example_targets = next(examples)

for i in range(6):
    plt.subplot(2, 3, i + 1)
    plt.imshow(example_data[i][0], cmap='gray')
plt.show()

MNIST dataset

# Fully connected neural network with one hidden layer
class FeedForwardNeuralNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_size = input_dim
        # Linear function: input -> hidden
        self.l1 = nn.Linear(input_dim, hidden_dim)
        # Non-linearity
        self.tanh = nn.Tanh()
        # Linear function: hidden -> output
        self.l2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out = self.l1(x)
        out = self.tanh(out)
        out = self.l2(out)
        # no activation and no softmax at the end
        return out

model = FeedForwardNeuralNet(input_dim, hidden_dim, output_dim).to(device)

We moved the model to device.

print(model)

FeedForwardNeuralNet(
  (l1): Linear(in_features=784, out_features=500, bias=True)
  (tanh): Tanh()
  (l2): Linear(in_features=500, out_features=10, bias=True)
)
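As a rough check on the size of this network, the sketch below counts the trainable parameters of two linear layers with the same shapes as the printout above. It rebuilds the layers on their own rather than reusing the model object.

```python
import torch.nn as nn

# Same layer sizes as in the printed model summary
l1 = nn.Linear(784, 500)
l2 = nn.Linear(500, 10)

n_params = sum(p.numel() for layer in (l1, l2) for p in layer.parameters())
print(n_params)  # 397510 = (784*500 + 500) + (500*10 + 10)
```

Almost all of the parameters sit in the first layer, which connects the 784 input pixels to the 500 hidden units.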

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) 
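The model returns raw scores (logits) without a softmax because nn.CrossEntropyLoss applies log-softmax internally. A small sketch of that equivalence, using made-up logits and labels:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # made-up raw scores: 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])   # made-up class indices

loss_a = nn.CrossEntropyLoss()(logits, labels)
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss_a, loss_b))  # True
```

Adding an explicit softmax layer before this loss would apply the normalization twice, which is why the forward pass stops at the last linear layer.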
# Train the model
n_total_steps = len(train_loader)
train_losses = []
valid_losses = []

for epoch in range(num_epochs):
    # Training phase
    model.train()       # switch back to training mode after validation
    running_loss = 0.0  # loss accumulated since the last log message
    epoch_loss = 0.0    # loss accumulated over the whole epoch
    for i, (images, labels) in enumerate(train_loader):
        # origin shape: [100, 1, 28, 28]
        # resized: [100, 784]
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        epoch_loss += loss.item()

        if (i + 1) % 100 == 0:
            # criterion already averages within a batch,
            # so divide by the number of batches, not the number of samples
            average_loss = running_loss / 100
            print(f"Epoch [{epoch+1}/{num_epochs}], Batch [{i+1}/{n_total_steps}], Loss: {average_loss:.4f}")
            running_loss = 0.0

    # mean training loss per batch over the full epoch
    train_losses.append(epoch_loss / n_total_steps)

    # Validation phase
    model.eval()
    valid_loss = 0.0
    with torch.no_grad():
        for images, labels in valid_loader:
            images, labels = images.reshape(-1, 28*28).to(device), labels.to(device)
            output = model(images)
            loss = criterion(output, labels)
            valid_loss += loss.item()

    # len(valid_loader) is the number of validation batches
    average_valid_loss = valid_loss / len(valid_loader)
    valid_losses.append(average_valid_loss)
    print(f"Valid Epoch [{epoch+1}/{num_epochs}], Loss: {average_valid_loss:.4f}")

Note: The model and the data must be on the same device, either both on the CPU or both on the GPU. Here both are moved to device, which is the GPU when one is available. Keeping the data on the CPU and the model on the GPU, or vice versa, raises a RuntimeError.

plt.plot(range(1, num_epochs+1), train_losses, label='Training Loss')
plt.plot(range(1, num_epochs+1), valid_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()

Training and validation loss

# Test the model
model.eval()  # make sure the model is in evaluation mode
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        # max returns (value, index)
        _, predicted = torch.max(outputs, 1)
        n_samples += labels.size(0)
        n_correct += (predicted == labels).sum().item()

    acc = 100.0 * n_correct / n_samples
    print(f'Accuracy of the network on the 10000 test images: {acc} %')

Accuracy of the network on the 10000 test images: 97.64 %
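The prediction step in the test loop can be sketched on a tiny example. The logits and labels below are made up; torch.max along dimension 1 returns the largest score in each row together with its index, and the index is the predicted class.

```python
import torch

# Made-up scores for 3 samples and 3 classes
logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.3, 0.2],
                       [0.0, 0.1, 3.0]])
labels = torch.tensor([1, 0, 1])

values, predicted = torch.max(logits, 1)   # (max value, index of max) per row
n_correct = (predicted == labels).sum().item()
accuracy = 100.0 * n_correct / labels.size(0)

print(predicted.tolist(), n_correct)  # [1, 0, 2] 2
```

Two of the three predictions match their labels here, so accuracy comes out at roughly 66.7 %.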
