This lesson is still being designed and assembled (Pre-Alpha version)

PyTorch basics


Teaching: 0 min
Exercises: 0 min
  • Which framework to choose for deep learning?

  • What is PyTorch?

  • Be aware of the various deep learning frameworks

  • Get started with PyTorch

Deep learning frameworks

It was worth taking some time to look inside and understand the underlying concepts of neural networks, but as we have experienced, writing neural networks in Python using NumPy or any other “low-level” interface is a bit cumbersome, especially when hidden layers are involved.

For your research work, you will most likely be using high-level frameworks like Keras, TensorFlow or PyTorch. They allow you to build very complex models quickly.

Most common high-level frameworks for deep learning


For more information see presentation from Konstantin Shmelkov, Pauline Luc, Thomas Lucas, Vicky Kalogeiton, Stéphane Lathuilière on Deep learning libraries.

Which framework to choose when ..?

To choose, one needs to look at:

So why choose PyTorch?

Image: PyTorch front page

In short, PyTorch is a good framework for research.

Introduction to PyTorch


This lesson is taken from Deep learning with PyTorch: a 60 minute blitz.

Tensors and variables


This short introduction is for PyTorch; however, many of the notions discussed below apply to other high-level frameworks (TensorFlow, etc.).

Most high-level deep learning frameworks will introduce the notion of tensors:

Tensors = building blocks (data container)

When using PyTorch, Tensors are similar to NumPy’s ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computing.
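For instance, the NumPy-like interface carries over almost unchanged, and moving a tensor to the GPU is a one-line change. A minimal sketch (the cuda check makes it fall back to the CPU when no GPU is present):

```python
import torch

# create a tensor much as you would a NumPy array
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# familiar NumPy-like operations
print(x.shape)    # torch.Size([2, 2])
print(x.mean())   # tensor(2.5000)

# move the tensor to a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
x = x.to(device)
print(x.device)
```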

Let’s try to visualize what tensors are:

0D Tensors

0D tensor = scalar

A scalar is a single element, such as the number 1, 4.7, etc.

With numpy, we could write:

import numpy as np
x = np.array(4.7)

with PyTorch:

import torch
x = torch.tensor(4.7)

You can also define an uninitialized tensor, fill it with zeros, or initialize it with random values:

# with empty, you need to give the dimension of the tensor
# here we define a tensor with one element
x = torch.empty(1)

# Same if initializing with zeros;
# one can also add the type of tensor: here long integers
x = torch.zeros(1, dtype=torch.long)

# uniform random numbers
x = torch.rand(1)


Random sampling creation functions include:

  • torch.rand()
  • torch.rand_like()
  • torch.randn()
  • torch.randn_like()
  • torch.randint()
  • torch.randint_like()
  • torch.randperm()

You may also use torch.empty(). See random sampling.
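A quick sketch of these creation functions (seeded so the draws are reproducible):

```python
import torch

torch.manual_seed(0)              # make the random draws reproducible

a = torch.rand(2, 3)              # uniform on [0, 1)
b = torch.randn(2, 3)             # standard normal distribution
c = torch.randint(0, 10, (2, 3))  # integers drawn from [0, 10)
d = torch.randperm(5)             # random permutation of 0..4
e = torch.rand_like(a)            # same shape and dtype as a

print(a.shape, c.dtype, d)
```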

1D tensors

1D tensor = vector

x = torch.tensor([4.7, 2.1, -1.6, 4.8, 9.0])
tensor([4.7000, 2.1000, -1.6000, 4.8000, 9.0000])

2D tensors

2D tensor = matrix

x = torch.tensor([[4.7, 2.1, -1.6, 4.8, 9.0],
		[5.7, 3.1, -2.6, 3.8, 3.0], 
		[6.7, 4.1, -3.6, 2.8, 7.0] ])
tensor([[4.7000, 2.1000, -1.6000, 4.8000, 9.0000],
	[5.7000, 3.1000, -2.6000, 3.8000, 3.0000], 
	[6.7000, 4.1000, -3.6000, 2.8000, 7.0000] ])

If we go back to our Iris dataset, we could easily load it into a 2D PyTorch tensor:

import pandas as pd
df = pd.read_csv('', header=None)

# sepal length and petal length
x = torch.tensor(df.iloc[0:100, [0,2]].values)
# as for a numpy array, you can get the dimensions of x
# and select sections of your tensor

If you select one element, you can use .item() to get a plain Python scalar (not a tensor anymore).
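A sketch of those operations, using a small hand-made tensor instead of the Iris data:

```python
import torch

x = torch.tensor([[4.7, 2.1, -1.6],
                  [5.7, 3.1, -2.6]])

# dimensions, as for a numpy array
print(x.shape)         # torch.Size([2, 3])

# select a section of the tensor (slicing works as in numpy)
print(x[:, 0])         # first column: tensor([4.7000, 5.7000])

# a single element can be converted to a plain Python scalar
value = x[0, 1].item()
print(value)           # a float close to 2.1
```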


3D tensors

3D tensor = cube

x = torch.rand(2,3,6)

Typically, we can use 3D tensors to represent time series or colored images (two spatial dimensions plus a color dimension).


Image: Copernicus Sentinel-2A satellite captures northern Brazil’s Marajó island


Sentinel-2 images of Earth are available with 13 spectral channels. Scientists can select which wavelength combinations to process over a given area to help better understand targets on the ground.

Read more at:

In the example, we chose 3 channels; one for each color (R,G,B).
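For example, a small RGB image can be stored as a 3D tensor with one dimension for the channels and two for the pixels (a sketch with made-up sizes):

```python
import torch

# a 3-channel (R, G, B) image of 4 x 5 pixels
image = torch.rand(3, 4, 5)
print(image.shape)   # torch.Size([3, 4, 5])

# each channel is a 2D tensor (a matrix)
red = image[0]
print(red.shape)     # torch.Size([4, 5])
```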

4D tensors

4D tensor = vector of matrices

x = torch.rand(2,3,6,2)

We can use 4D tensors, for instance, for videos:

Source:

5D tensors

5D tensor = matrix of matrices

x = torch.rand(2,3,6,2,3)

6D tensors

6D tensor = cube of matrices

x = torch.rand(2,3,6,2,3,3)

For a more formal treatment, see Introduction to Tensors for Students of Physics and Engineering.

Basic operations

See PyTorch Basic operations.
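A few of the operations covered there, as a quick sketch:

```python
import torch

x = torch.ones(2, 3)
y = torch.rand(2, 3)

# element-wise addition: equivalent forms
z = x + y
z = torch.add(x, y)
y.add_(x)           # in-place: methods ending in _ modify their operand

# reshaping with view (the total number of elements must match)
w = x.view(6)       # flatten to 1D
w = x.view(3, -1)   # -1 lets PyTorch infer that dimension
print(w.shape)      # torch.Size([3, 2])
```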

Conversion to NumPy

x = torch.rand(2,3)
y = x.numpy()  # convert to a numpy array
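The conversion goes both ways, and the tensor and the array share the same underlying memory, so modifying one in place modifies the other. A sketch:

```python
import torch

x = torch.rand(2, 3)
a = x.numpy()              # tensor -> numpy array
b = torch.from_numpy(a)    # numpy array -> tensor

# the tensor and the array share memory:
x.add_(1)                  # in-place addition on the tensor...
print(a)                   # ...is reflected in the numpy array
```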

Automatic differentiation for building and training neural networks (Autograd)

The autograd package provides automatic differentiation for all operations on Tensors.

As we have seen in our previous lesson, it is central to the backpropagation learning algorithm.

To activate autograd, you need to add requires_grad=True when defining your tensor:

x = torch.rand(2,3,requires_grad=True)

When you set requires_grad=True, PyTorch tracks all operations on your tensor. Once all your computations are done, you can call .backward() and PyTorch will compute all the gradients automatically. Use the .grad attribute to get the gradient values:

x = torch.rand(2,3,requires_grad=True)
y = x*x + 5
out = y.sum()
out.backward()
# the gradients d(out)/dx = 2*x are now stored in x.grad
print(x.grad)

Learn more about autograd with PyTorch documentation.

Creating a simple Neural network with PyTorch

To create a neural network with PyTorch, you define a new class based on torch.nn.Module, which is the base class for all PyTorch neural networks.


  • Don’t be afraid by the terminology!
  • Even though the word “class” may be new to you, if you are a Python programmer you have most likely used classes (without even knowing it).
  • Let’s go back to our iris dataset:
    import pandas as pd
    df = pd.read_csv('', header=None)

In the example above, we read our iris dataset and stored it in a variable df. The type of df is pandas.core.frame.DataFrame. Looking at the pandas DataFrame documentation, we can get the list of attributes for a DataFrame and the list of methods we can use for a DataFrame. For instance, we can use describe to get a summary of our data:


Or you can sort your data by column 1:


Here you are! Without knowing it you used python classes!
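As a sketch (the CSV path is omitted above, so we build a small stand-in DataFrame by hand):

```python
import pandas as pd

# a small stand-in for the Iris data
df = pd.DataFrame({0: [5.1, 4.9, 6.3], 1: [3.5, 3.0, 2.5]})

# describe() is a method of the DataFrame class: summary statistics
print(df.describe())

# sort_values() is another method: sort rows by column 1
print(df.sort_values(by=1))
```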

The neural network class torch.nn.Module

Neural networks can be constructed using the torch.nn package. nn depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.

To define a neural network with PyTorch, you need to define a new class that inherits from nn.Module.

Let’s take a very simple example: we will create a perceptron with a single linear neuron and a single output. The activation function is the identity.

We will define a new class called Perceptron (a single linear neuron):

import torch

class Perceptron(torch.nn.Module):
    def __init__(self):
        # We initialize using the base class initialization;
        # super() lets you avoid referring to the base class explicitly.
        super().__init__()
        # We define a layer using the Linear model,
        # as we will apply a linear transformation to our inputs.
        # The first parameter is the number of neurons in the input layer
        # and the second parameter is the number of outputs (here 1).
        self.fl = torch.nn.Linear(1, 1)

    def forward(self, x):
        x = self.fl(x)
        return x

Help: what do you mean by inherited?

The Character class is a base class; the Letter and Digit classes inherit from it.
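That analogy can be sketched in a few lines of Python (hypothetical classes, just to illustrate inheritance):

```python
class Character:
    def __init__(self, symbol):
        self.symbol = symbol

    def describe(self):
        return "character " + self.symbol


class Letter(Character):      # Letter inherits from Character
    def describe(self):       # ...and overrides describe()
        return "letter " + self.symbol


class Digit(Character):       # Digit inherits from Character
    pass                      # ...and keeps describe() unchanged


print(Letter("a").describe())   # letter a
print(Digit("7").describe())    # character 7
```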

What we have done so far is to “describe” the layout of our neural network. Now we can define a variable of type Perceptron:

neural_network = Perceptron()
print(neural_network)
Perceptron(
  (fl): Linear(in_features=1, out_features=1, bias=True)
)

We can also ask for information about the Perceptron parameters:

print(list(neural_network.parameters()))
[Parameter containing:
tensor([[0.3606]], requires_grad=True), Parameter containing:
tensor([-0.3389], requires_grad=True)]

This means w0 (the bias) is set to -0.3389 and w1 to 0.3606.

So if we run our neural network in forward mode, we should compute:

a1*w1 + w0

Let’s create a single input value:

input = torch.randn(1, requires_grad=True)
tensor([-1.7248], requires_grad=True)

Then we can pass this input through the untrained network:

out = neural_network(input)
tensor([-0.9608], requires_grad=True)

And it is correct because we get:

a1*w1 + w0 = 0.3606 * (-1.7248) + (-0.3389) = -0.96086288
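You can check this equivalence yourself; the sketch below reads the weight and bias back from the layer rather than hard-coding the values printed above:

```python
import torch

linear = torch.nn.Linear(1, 1)   # a single linear neuron, as in Perceptron
a1 = torch.randn(1)

out = linear(a1)                 # forward pass through the layer

# manual forward pass: a1*w1 + w0
w1 = linear.weight.item()
w0 = linear.bias.item()
manual = a1.item() * w1 + w0

print(out.item(), manual)        # the two values agree
```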

Training the network

To train our network, we will define a cost function and use a gradient descent method similar to what we have done previously:

cost_function = torch.nn.MSELoss()
perceptron_optimizer = torch.optim.SGD(neural_network.parameters(), lr=0.01)

torch.nn.MSELoss() creates a criterion that measures the mean squared error (squared L2 norm) between each element of the modeled and target y. You can use other cost functions, either by defining a new function yourself or by using one of the predefined loss functions.

SGD stands for Stochastic Gradient Descent and is the standard gradient descent method in PyTorch. However, other methods are available too.

See PyTorch documentation here.
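For instance, swapping in the Adam optimizer only changes one line. A sketch (here `model` stands for any nn.Module, such as our Perceptron):

```python
import torch

model = torch.nn.Linear(1, 1)   # stands in for any nn.Module
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# other choices include torch.optim.RMSprop, torch.optim.Adagrad, ...
print(type(optimizer).__name__)   # Adam
```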

Then we can train our neural network. To do so, we need to define input data. Let’s take a very simple function, y = 3*x:

inputs = [(1.,3.), (2.,6.), (3.,9.), (4.,12.), (5.,15.), (6.,18.)]
for epoch in range(100):
    for i, data in enumerate(inputs):
        X, Y = data
        X = torch.tensor([X], requires_grad=True)
        # the target does not need requires_grad=True
        Y = torch.tensor([Y], requires_grad=False)
        # reset the gradients accumulated by the optimizer
        perceptron_optimizer.zero_grad()
        outputs = neural_network(X)
        cost = cost_function(outputs, Y)
        # backpropagate and update the weights
        cost.backward()
        perceptron_optimizer.step()
        if (i % 10 == 0):
            print("Epoch {} - loss: {}".format(epoch, cost.item()))

Testing the network

import numpy as np
import matplotlib.pyplot as plt

data = torch.zeros(6,1)
for i in range(6):
    plt.plot(i+1,(i+1)*3,marker='o', color='b')
    data[i] = float(i+1.)
# predict values with the trained network
predicted = neural_network(data)
plt.plot(data.numpy(), predicted.detach().numpy(), color='r')
plt.show()


  • Test your neural network with inputs values that were not used during the training.
  • Does it work?
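As a sketch of such a test, the snippet below retrains a fresh single-neuron model on y = 3*x and then evaluates it on x = 10, which was never seen during training (the variable names are local to this sketch):

```python
import torch

# a fresh single linear neuron trained on y = 3*x
model = torch.nn.Linear(1, 1)
cost_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):
    for x, y in [(1., 3.), (2., 6.), (3., 9.), (4., 12.), (5., 15.), (6., 18.)]:
        optimizer.zero_grad()
        cost = cost_function(model(torch.tensor([x])), torch.tensor([y]))
        cost.backward()
        optimizer.step()

# 10 was never used during training; the prediction should be close to 30
prediction = model(torch.tensor([10.])).item()
print(prediction)
```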

Key Points

  • PyTorch