%matplotlib inline
PyTorch Quickstart, a deep learning framework for Python#
An Engineering Approach to Deep Learning#
As engineers, we are often expected to use tools without fully understanding them. This is unfortunate, but it is a reality. The alternative is to be a mathematician who spends their time exploring the minutiae of the theory without doing anything of practical importance towards a task.
This section runs through the API for common tasks in machine learning. Refer to the links in each section to dive deeper.
Adapted from the PyTorch Quickstart tutorial.
Note
Neural networks generally require a GPU to train in a reasonable amount of time. Our server does not have a GPU because GPUs are expensive to run. You can get access to free GPU resources to run this notebook.
Working with data#
PyTorch has two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset.
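To make that split of responsibilities concrete, here is a minimal sketch of a custom Dataset wrapping in-memory tensors; the class name and the random data are invented for illustration and are not part of the quickstart.

import torch
from torch.utils.data import Dataset

class InMemoryDataset(Dataset):
    """Hypothetical example: stores pre-loaded feature and label tensors."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one (sample, label) pair
        return self.features[idx], self.labels[idx]

# 100 fake 28x28 grayscale images with integer labels in [0, 10)
ds = InMemoryDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))
print(len(ds), ds[0][0].shape)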
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from matplotlib import pyplot as plt
import numpy as np
PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset.
The torchvision.datasets module contains Dataset objects for many real-world vision datasets such as CIFAR and COCO (see the torchvision documentation for the full list). In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset includes two arguments, transform and target_transform, to modify the samples and labels respectively.
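As a sketch of what target_transform might look like (it is not used in the rest of this notebook), one could one-hot encode the integer labels with transforms.Lambda; the snippet below is illustrative only.

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

# Illustrative only: turn each integer label into a length-10 one-hot vector
one_hot = Lambda(
    lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1)
)

example_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=one_hot,
)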
# Download training data from open datasets
training_data = datasets.FashionMNIST(
root="data",
train=True,
download=True,
transform=ToTensor(), # Converts images to PyTorch tensors
)
# Download test data from open datasets
test_data = datasets.FashionMNIST(
root="data",
train=False,
download=True,
transform=ToTensor(), # Converts images to PyTorch tensors
)
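Before going further, it can help to inspect a single sample: each element of the dataset is an (image, label) pair, where the image is a 1x28x28 tensor and the label is an integer class index.

# Peek at the first training sample
img, label = training_data[0]
print(img.shape, label)  # torch.Size([1, 28, 28]) and an integer in [0, 10)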
We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.
batch_size = 64
# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
for X, y in test_dataloader:
print(f"Shape of X [N, C, H, W]: {X.shape}")
print(f"Shape of y: {y.shape} {y.dtype}")
break
Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
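The loaders above use the DataLoader defaults. A sketch of the optional behaviour mentioned earlier, shuffling and multi-process loading, might look like this; the num_workers value is an arbitrary example.

# Illustrative variant: reshuffle the training data every epoch and load batches
# in parallel worker processes (num_workers=0 keeps loading in the main process).
train_dataloader_shuffled = DataLoader(
    training_data,
    batch_size=batch_size,
    shuffle=True,
    num_workers=2,
)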
Read more about loading data in PyTorch.
Creating Models#
To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU if available.
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
# Define model
class NeuralNetwork(nn.Module):
def __init__(self):
super().__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28 * 28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10),
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
print(model)
Using cuda device
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=10, bias=True)
)
)
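As a quick sanity check, not part of the original quickstart, we can count the trainable parameters of the model printed above.

# 784*512 + 512 + 512*512 + 512 + 512*10 + 10 = 669,706 trainable parameters
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")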
Read more about building neural networks in PyTorch.
Optimizing the Model Parameters#
To train a model, we need a loss function and an optimizer.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
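The optimizer is a free choice; as an aside, an adaptive optimizer such as Adam is a common alternative on this task. The rest of this notebook keeps the SGD optimizer defined above.

# Illustrative alternative only; the training below uses the SGD optimizer above.
alt_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)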
In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model's parameters.
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
# Compute prediction error
pred = model(X)
loss = loss_fn(pred, y)
# Backpropagation
optimizer.zero_grad()
loss.backward()
optimizer.step()
if batch % 100 == 0:
loss, current = loss.item(), batch * len(X)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
We also check the model's performance against the test dataset to ensure it is learning.
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
print(
f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n"
)
The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the accuracy increase and the loss decrease with every epoch.
epochs = 10
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
test(test_dataloader, model, loss_fn)
print("Done!")
Epoch 1
-------------------------------
loss: 2.317696 [ 0/60000]
loss: 2.299124 [ 6400/60000]
loss: 2.273375 [12800/60000]
loss: 2.265383 [19200/60000]
loss: 2.266360 [25600/60000]
loss: 2.226820 [32000/60000]
loss: 2.247717 [38400/60000]
loss: 2.210102 [44800/60000]
loss: 2.212376 [51200/60000]
loss: 2.179549 [57600/60000]
Test Error:
Accuracy: 38.0%, Avg loss: 2.170413
Epoch 2
-------------------------------
loss: 2.191431 [ 0/60000]
loss: 2.176540 [ 6400/60000]
loss: 2.116999 [12800/60000]
loss: 2.135370 [19200/60000]
loss: 2.102609 [25600/60000]
loss: 2.038072 [32000/60000]
loss: 2.079019 [38400/60000]
loss: 1.997487 [44800/60000]
loss: 2.010596 [51200/60000]
loss: 1.947598 [57600/60000]
Test Error:
Accuracy: 52.8%, Avg loss: 1.932710
Epoch 3
-------------------------------
loss: 1.975401 [ 0/60000]
loss: 1.942916 [ 6400/60000]
loss: 1.825535 [12800/60000]
loss: 1.866862 [19200/60000]
loss: 1.773437 [25600/60000]
loss: 1.717970 [32000/60000]
loss: 1.751739 [38400/60000]
loss: 1.637627 [44800/60000]
loss: 1.669583 [51200/60000]
loss: 1.571886 [57600/60000]
Test Error:
Accuracy: 57.8%, Avg loss: 1.570542
Epoch 4
-------------------------------
loss: 1.646978 [ 0/60000]
loss: 1.604091 [ 6400/60000]
loss: 1.444854 [12800/60000]
loss: 1.518618 [19200/60000]
loss: 1.404167 [25600/60000]
loss: 1.394132 [32000/60000]
loss: 1.417887 [38400/60000]
loss: 1.322099 [44800/60000]
loss: 1.362947 [51200/60000]
loss: 1.267673 [57600/60000]
Test Error:
Accuracy: 62.4%, Avg loss: 1.283587
Epoch 5
-------------------------------
loss: 1.368386 [ 0/60000]
loss: 1.345165 [ 6400/60000]
loss: 1.170081 [12800/60000]
loss: 1.274748 [19200/60000]
loss: 1.156208 [25600/60000]
loss: 1.180254 [32000/60000]
loss: 1.205041 [38400/60000]
loss: 1.125857 [44800/60000]
loss: 1.168877 [51200/60000]
loss: 1.089566 [57600/60000]
Test Error:
Accuracy: 64.3%, Avg loss: 1.104529
Epoch 6
-------------------------------
loss: 1.180807 [ 0/60000]
loss: 1.179890 [ 6400/60000]
loss: 0.988538 [12800/60000]
loss: 1.122713 [19200/60000]
loss: 1.000436 [25600/60000]
loss: 1.037201 [32000/60000]
loss: 1.073299 [38400/60000]
loss: 1.002217 [44800/60000]
loss: 1.045425 [51200/60000]
loss: 0.979041 [57600/60000]
Test Error:
Accuracy: 65.6%, Avg loss: 0.989594
Epoch 7
-------------------------------
loss: 1.052103 [ 0/60000]
loss: 1.073342 [ 6400/60000]
loss: 0.865466 [12800/60000]
loss: 1.021954 [19200/60000]
loss: 0.901314 [25600/60000]
loss: 0.937179 [32000/60000]
loss: 0.987103 [38400/60000]
loss: 0.923491 [44800/60000]
loss: 0.962125 [51200/60000]
loss: 0.906587 [57600/60000]
Test Error:
Accuracy: 66.9%, Avg loss: 0.912618
Epoch 8
-------------------------------
loss: 0.959607 [ 0/60000]
loss: 1.001027 [ 6400/60000]
loss: 0.779598 [12800/60000]
loss: 0.952166 [19200/60000]
loss: 0.835649 [25600/60000]
loss: 0.864933 [32000/60000]
loss: 0.927267 [38400/60000]
loss: 0.872270 [44800/60000]
loss: 0.903630 [51200/60000]
loss: 0.855836 [57600/60000]
Test Error:
Accuracy: 67.9%, Avg loss: 0.858528
Epoch 9
-------------------------------
loss: 0.890215 [ 0/60000]
loss: 0.948495 [ 6400/60000]
loss: 0.717365 [12800/60000]
loss: 0.901592 [19200/60000]
loss: 0.789795 [25600/60000]
loss: 0.811432 [32000/60000]
loss: 0.882589 [38400/60000]
loss: 0.837747 [44800/60000]
loss: 0.860831 [51200/60000]
loss: 0.817879 [57600/60000]
Test Error:
Accuracy: 69.2%, Avg loss: 0.818452
Epoch 10
-------------------------------
loss: 0.836068 [ 0/60000]
loss: 0.907626 [ 6400/60000]
loss: 0.670136 [12800/60000]
loss: 0.863607 [19200/60000]
loss: 0.755561 [25600/60000]
loss: 0.770661 [32000/60000]
loss: 0.846874 [38400/60000]
loss: 0.812884 [44800/60000]
loss: 0.827929 [51200/60000]
loss: 0.787823 [57600/60000]
Test Error:
Accuracy: 70.5%, Avg loss: 0.787038
Done!
Read more about Training your model.
Saving Models#
A common way to save a model is to serialize the internal state dictionary (containing the model parameters).
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")
Saved PyTorch Model State to model.pth
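To resume training later, it is common to save more than the bare weights. A sketch of a fuller checkpoint follows; the dictionary keys here are just a convention, not something PyTorch requires.

# Illustrative checkpoint: bundle model weights, optimizer state and the epoch count
checkpoint = {
    "epoch": epochs,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
torch.save(checkpoint, "checkpoint.pth")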
Loading Models#
The process for loading a model includes re-creating the model structure and loading the state dictionary into it.
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))
<All keys matched successfully>
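If the weights were saved on a GPU machine and are being loaded on a CPU-only machine, the tensors can be remapped explicitly with map_location; a minimal sketch:

# Illustrative: map GPU-saved tensors onto the CPU when no GPU is available
state_dict = torch.load("model.pth", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)

Recent PyTorch releases also accept weights_only=True in torch.load, which restricts unpickling to tensors and is worth enabling when loading files you did not create yourself.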
This model can now be used to make predictions.
classes = [
"T-shirt/top",
"Trouser",
"Pullover",
"Dress",
"Coat",
"Sandal",
"Shirt",
"Sneaker",
"Bag",
"Ankle boot",
]
index = np.random.randint(0, 10000)
model.eval()
x, y = test_data[index][0], test_data[index][1]
with torch.no_grad():
pred = model(x)
predicted, actual = classes[pred[0].argmax(0)], classes[y]
print(f"Index: {index}")
print(f'Predicted: "{predicted}", Actual: "{actual}"')
plt.imshow(x[0], cmap="gray")
Index: 9217
Predicted: "T-shirt/top", Actual: "T-shirt/top"

Read more about Saving & Loading your model.
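As a final aside, the model outputs raw logits; applying a softmax converts them into class probabilities, which can be more readable when reporting a single prediction. A small sketch continuing from the prediction above:

# Convert the logits from the prediction above into class probabilities
probs = torch.softmax(pred, dim=1)
print(f"Confidence in '{predicted}': {probs[0].max().item():.1%}")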