All in One View

Content from Introduction to Deep Learning


Last updated on 2026-06-14 | Edit this page

Estimated time: 10 minutes

Overview

Questions

  • What is deep learning and how is it used for images?
  • How can I train a simple model to classify images?

Objectives

  • Describe what deep learning is and how it can be used for image classification
  • Train a simple convolutional neural network (CNN) to classify images

Deep learning for image classification


In this lesson, we will use deep learning to classify images.

Deep learning is a type of machine learning that uses structures called neural networks. These networks learn patterns directly from data by adjusting their internal settings during training.

For image data, deep learning models can learn to recognise shapes, colours, and textures. By combining these simple patterns, they can identify more complex features such as objects in an image.

We will focus on a specific type of model called a Convolutional Neural Network (CNN). CNNs are designed for working with images and are widely used for tasks such as:

  • recognising objects in photos
  • identifying medical images
  • classifying plants, animals, or other categories

In this lesson, we will train a CNN to classify images into different categories.

What is image classification?


Image classification is one of the most common tasks in deep learning and involves assigning a label to an image.

For example, a model might look at an image and decide whether it shows a:

  • car or bicycle
  • cat or dog
  • healthy or diseased plant

In this lesson, we will train a CNN to look at images and predict the correct category for each one.

What we’ll do in this lesson


When working with programming problems, it’s useful to follow a series of steps or a workflow. Some workflows are very simple, while others — like deep learning — involve a few more stages.

In this lesson, we’ll follow a simplified version of a deep learning workflow to train and use an image classification model.

Step 1. Formulate / Outline the problem

First we must decide what we want our Deep Learning system to do. This lesson is about image classification and our aim is to put an image into one of a few categories. Specifically, in our case, we have 5 categories: [‘airplane’, ‘bird’, ‘cat’, ‘dog’, ‘truck’]

Step 2. Identify inputs and outputs

Next identify what the inputs and outputs of the neural network. In our case, the data is images and the inputs could be the individual pixels of the images. We want one output prediction for each potential image.

Step 3. Prepare data

Many datasets are not ready for immediate use in a deep learning and require some preparation. Neural networks can really only deal with numerical data, so any non-numerical data (e.g., images) have to be converted to numerical data.

For this lesson, we use an existing image dataset known as CIFAR-10 (Canadian Institute for Advanced Research).

More information on preparing data is explored in Episode 02 Introduction to Image Data but for now we’ll use a custom-defined function.

Callout

Python reminder: functions and methods

In Python, we can use functions in a few different ways:

  • Built-in functions available by default: print() or len()
  • Functions from libraries we import: tf.keras.layers.Conv2D()
  • Functions we write ourselves: def

PYTHON

# load the required packages
import tensorflow as tf # neural network 
import matplotlib.pyplot as plt # for plotting
import icwcnn_functions as icfn # pre-defined helpers

### Step 3. Prepare data

# create a list of class names associated with each CIFAR-10 label
class_names = ['airplane', 'bird', 'cat', 'dog', 'truck']

# load the data
train_ds, val_ds, test_ds = icfn.prepare_datasets()

OUTPUT

Found 1000 files belonging to 5 classes.
Found 250 files belonging to 5 classes.
Found 250 files belonging to 5 classes.

Before starting any analysis, it’s important to check that your data looks the way you expect. Let’s do that now:

Visualise a subset of the CIFAR-10 dataset

PYTHON

# set up plot region, including width, height in inches
fig, axes = plt.subplots(figsize=(5,5))

# add images to plot
for images, labels in train_ds.take(1):
    for i in range(9):
        
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
    
# view plot
plt.show()
Subset of 9 CIFAR-10 images representing different object classes
Challenge

Inspect the dataset

Looking at the images above, and knowing you will be asking a computer to label them, what kinds of questions might you ask yourself about this dataset?

Answers will vary.

  • Are the images clear and easy to interpret?
  • Do the labels seem correct?
  • Are the images all the same size?
  • Do the images look similar within each category?
  • Are there any unusual or unexpected images?

Step 4. Choose a pre-trained model or build a new architecture from scratch

Often we can use an existing neural network instead of designing one from scratch because training a network can take a lot of time and computational resources. There are a number of well publicised networks which have been demonstrated to perform well at certain tasks. If you know of one which already does a similar task well, then it makes sense to use one of these.

If instead we decide to design our own network, then there a lot of decisions that have to be made. Model selection will require iterative experimentation and tweaking before acceptable results can be achieved.

In today’s workshop we want to build an architecture for training purpurses. For now, similar to dataset preparation, we’ll use a function already prepared, create_model_intro(), and save the details for Episode 03 Build a Convolutional Neural Network.

PYTHON

# create the introduction model using pre-defined function
model_intro = create_model_intro()

Step 5. Choose a loss function and optimizer and compile model

To set up a model for training we need to compile it. This is when you set up the rules and strategies for how your network is going to learn.

The loss function tells the training algorithm how far away the predicted value was from the true value.

The optimizer takes information from the loss function and applys some changes to the weights within the network to try to do better. It is through this process that “learning” (adjustment of the weights) is achieved.

We will learn how to choose a loss function and optimizer in more detail in Episode 4 Compile and Train (Fit) a Convolutional Neural Network.

For now, let’s use options that have been proven to work well for image classfiication tasks.

PYTHON

# compile the model
model_intro.compile(optimizer = "adam",
                    loss = "sparse_categorical_crossentropy",
                    metrics =["accuracy"])

Step 6. Train the model

Now we can start training our neural network. Typically, we train the model by looping over the training data multiple times (called epochs) until performance improves or reaches a stable level.

PYTHON

# fit the model
history_intro = model_intro.fit(x = train_ds)

Your output will begin to print similar to the output below:

OUTPUT

32/32 [==============================] - 0s 5ms/step - loss: 58.7726 - accuracy: 0.2690

What does this output mean?

This output is printed during the fit phase, i.e. training the model against known image labels:

  • It took 32 steps to look at all of the training images once (called an epoch)
  • loss shows how wrong the model’s predictions are (lower is better)
  • accuracy shows how often the model is correct (higher is better)
Challenge

Is our model doing well?

Considering the loss and accuracy values from the training above:

  • What do these values tell you about how well the model is performing?
  • Is this what you would expect at this stage?
  • Can you think of any ways that might help improve these values?

Answers may vary.

  • The accuracy is quite low, so the model is not making many correct predictions yet
  • The loss is high, which suggests the model’s predictions are still far from the true labels
  • This is expected, since the model has only just started training and is very simple
  • Train for longer, use a more complex model, use more data

Step 7. Perform a Prediction/Classification

After training the network we can use it to perform predictions. This is how you would use the network after you have fully trained it to a satisfactory performance. The predictions performed here on a special hold-out set is used in the next step to measure the performance of the network. Make sure the images you use to test are prepared the same way as the training images.

To make a single prediction we need to first extract a single image and its associated label from our test dataset and then use our model to predict the class of that image.

PYTHON

# extract image and label for first image
for images, labels in test_ds.take(1):
    first_image = images[0]
    first_label = labels[0]
    
# use the model to predict class
prediction = model_intro.predict(tf.expand_dims(first_image, axis=0))
print("Predict:", prediction)

# extract class name with highest probability
predicted_label = tf.argmax(prediction[0])

print("Predicted class:", class_names[predicted_label])
print("True class:", class_names[first_label])

OUTPUT

1/1 [==============================] - 0s 11ms/step
Predict: [[3.0071956e-01 9.7787231e-20 6.9927925e-01 5.2796623e-32 1.1614857e-06]]
Predicted class: cat
True class: airplane

Congratulations, you just created your first image classification model! Notice that the model doesn’t just give one answer — it assigns a probability to each class. The class with the highest probability becomes the prediction.

Was the classification correct? Let’s plot the first test image with its true label:

PYTHON

# display image
plt.imshow(test_images[0])
plt.title('Predicted:' + class_names[predicted_label])
plt.axis("off")
plt.show() 
poor resolution image of an dog
Challenge

Interpreting the prediction

Compare the model prediction with the true class name.

  • What does this tell you about the model’s performance?
  • Why might the model have made this mistake?

Answers may vary.

  • The model made an incorrect prediction
  • This shows the model has not yet learned enough to reliably distinguish between classes
  • This is expected, since the model is simple and has only trained for a short time
  • The image itself may also be unclear or difficult to classify

Clearly, our model can be improved — we’ll look at ways to do this later.

For now, we’ve trained a model and used it to make a prediction. The next step is to see how well it performs on data it hasn’t seen before.

Step 8. Measure Performance

Once we trained the network we want to measure its performance on data that was not part of the training process, called a test dataset. Although there are many indicators of how well our network performs - called metrics - often the chosen metric(s) will depend on the type of task.

Step 9. Tune Hyperparameters

When building image classification models in Python, especially using libraries like TensorFlow or Keras, the process involves not only designing a neural network but also choosing the best values for various parameters set by the person configuring the model - these are known as hyperparameters. Searching for the best options for your dataset can enhance model performance.

Step 10. Share Model

Once we’re happy with how our model performs, we can save it and share it with others. This includes both the model structure and what it has learned, so others can use it, with or without retraining.

To share the model we must save it.

PYTHON

# save  model
model_intro.save('models/model_intro.keras')
Callout

Create the output folder if needed

If the models folder does not already exist, Python will return an error when trying to save the model.

You can create it first with:

PYTHON

import os
os.makedirs("models", exist_ok=True)

We will return to each of these workflow steps throughout this lesson and discuss each component in more detail.

Key Points
  • Deep learning uses neural networks to learn patterns directly from data.
  • Convolutional neural networks (CNNs) are commonly used for image classification.
  • Training a model involves compiling it, fitting it to data, and making predictions.
  • Model performance may be imperfect at first and can be improved with further training and tuning.

Content from Introduction to Image Data


Last updated on 2026-06-14 | Edit this page

Estimated time: 33 minutes

Overview

Questions

  • How is image data organised for this workshop?
  • How do I load image data into Python for deep learning?
  • What are training, validation, and test datasets used for?

Objectives

  • Describe how the workshop image dataset is organised into folders.
  • Load image data using keras.utils.image_dataset_from_directory().
  • Inspect images, labels, and class names in a dataset.
  • Explain the roles of training, validation, and test datasets.

In Episode 1, we used prepared data to train a simple image classification model.

In this episode, we’ll take a closer look at how image datasets are organised and loaded into Python.

Depending on your situation, you will either use a pre-existing dataset, prepare your own data for training, or some combination of the two.

In this workshop, we’ll use a prepared dataset so we can focus on the deep learning workflow but in real-world projects, you may need to collect, label, resize, and split your own images.

Before jumping into more specific tasks, it’s important to understand that images on a computer are stored as numbers. For colour images, these numbers represent red, green, and blue (RGB) values.

Images are data

Images on a computer are stored as rectangular arrays of hundreds, thousands, or millions of discrete “picture elements,” otherwise known as pixels. Each pixel can be thought of as a single square point of coloured light.

For example, consider this image of a Jabiru, with a square area designated by a red box:

Jabiru image that is 552 pixels wide and 573 pixels high. A red square around the neck region indicates the area to zoom in on.

Now, if we zoomed in close enough to the red box, the individual pixels would stand out:

zoomed in area of Jabiru where the individual pixels stand out

Note each square in the enlarged image area (i.e. each pixel) is all one colour, but each pixel can be a different colour from its neighbours. Viewed from a distance, these pixels seem to blend together to form the image.

Image data in this workshop

In this workshop, we use a small subset of the CIFAR-10 dataset containing images from five classes:

  • airplane
  • bird
  • cat
  • dog
  • truck

For learning purposes, and to make computation faster, a much smaller, random selection of images from the full dataset was used.

Most importantly, the data was split it into three distinct subsets: train, validation and test.

Callout

We use different parts of the dataset for different purposes:

  • Training set: used to teach the model
  • Validation set: used to check how the model is doing while we develop it
  • Test set: used at the end to evaluate performance on unseen data

Keeping these sets separate helps us judge how well the model is likely to perform on new images.

Dataset organisation matters

The way your data is organised matters - it affects whether and how easily it can be used for image classification tasks.

For example, to use the tf.keras.utils.image_dataset_from_directory() function packaged with Tensorflow, we use folder names to indicate which split the data belongs to and what class name to use as the label. This saves us from having to label each image individually.

How our images are organised

Our images are stored in folders; each class has its own folder and the data is split into training, validation, and test sets. For example:

cifar10_images_small/
|
├── train/
│   ├── airplane/
│   ├── bird/
│   ├── cat/
│   ├── dog/
│   └── truck/
├── val/
│   ├── airplane/
│   ├── bird/
│   ├── cat/
│   ├── dog/
│   └── truck/
└── test/
    ├── airplane/
    ├── bird/
    ├── cat/
    ├── dog/
    └── truck/
Challenge

Understanding the dataset structure

Look at the folder structure for the dataset.

  • How many classes are included?
  • How does Keras know which label belongs to each image?
  • There are five classes.
  • Keras uses the folder names as the class labels.
Challenge

Understanding the dataset contents

The dataset contains 1500 images across 5 classes and randomly split into training, validation, and test sets.

  • How many images would you expect in each split?
  • How many images per class would you expect in the training set if it was balanced?

Answers may vary. For a 60:20:20 split you might expect:

  • Training: 900 images
  • Validation: 300 images
  • Test: 300 images

If training set had 900 images, you would expect 180 images per class in a well-balanced set.

Load the datasets

In the last episode, we used the icfn.prepare_datasets() helper function in icwithcnn_functions.py to prepare our datasets:

Let us now see how to write the code inside that helper function. This is the same type of dataset we used in Episode 1 — here we’re just seeing how it is created.

We’ll start by preparing the training dataset with the understanding that the The other two splits sets will be prepared similarly.

Using the tf.keras.utils.image_dataset_from_directory() function, we specify:

  • the file path to the top level folder for the data
  • the size of the images in pixels (image_size)
  • how many images to work with at a time (batch_size)
  • whether we want to shuffle the images or load them sequentially
  • whether we want everyone in the class to have the same order of images (seed)

PYTHON

import tensorflow as tf

# create training dataeset from folder of cifar images
train_ds = tf.keras.utils.image_dataset_from_directory(
    "../data/cifar10_images_small/train",
    image_size = (32, 32),
    batch_size = 32,
    shuffle = True,
    seed = 32
)

print(train_ds)

OUTPUT

Found 1000 files belonging to 5 classes.
Train_ds: <_PrefetchDataset element_spec=(TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))>

The object returned by this function might look unfamiliar. That’s because, instead of loading all the images at once, it loads them in batches as needed.

How do we inspect what was loaded?

We can extract class names directly from the dataset:

PYTHON

# extract the list of class names
class_names = train_ds.class_names
print(class_names)

OUTPUT

['airplane', 'bird', 'cat', 'dog', 'truck']

We can look at a single batch to find information about the training images and labels.

PYTHON

# inspect one batch of images and labels
for images, labels in train_ds.take(1):
    print("Train images batch shape:", images.shape)
    print("Train labels batch shape:", labels.shape)

OUTPUT

Train images batch shape: (32, 32, 32, 3)
Train labels batch shape: (32,)

These numbers represent (batch size, image height, image width, colour channels):

  • There are 32 images in this batch
  • Each image is 32 pixels high and 32 pixels wide
  • Each image has 3 colour channels (RGB)
  • There is one label for each image in the batch

Finally, we can look at the image data types and pixel values:

PYTHON

 # inspect image data types and pixel values
for images, labels in train_ds.take(1):
    print("Data type: ", images.dtype)
    print("Pixel value range:", images[0].numpy().min(), images[0].numpy().max())

OUTPUT

Data type:  <dtype: 'float32'>
Pixel value range: 15.0 253.0
Callout

A note about pixel values

The images are loaded with pixel values between 0 and 255.

In most cases, we rescale these values to make training more efficient.

In this workshop, we will do this inside the model before training.

Challenge

On your own - prepare the Test dataset

Using what we’ve learned in this lesson with the training dataset, perform the same steps for the test dataset.

  1. Write the code to load the test data
  • Hint 1: Turn shuffle off
  • Hint 2: Remove the seed
  1. Inspect the test dataset and find out:
  • dimensions of the images and labels
  • class names
  • total number of images
  • number of images in each of the five classes (optional)

PYTHON

# create test dataset from folder of cifar images
test_ds = tf.keras.utils.image_dataset_from_directory(
    "../data/cifar10_images_small/test",
    image_size = (32, 32),
    batch_size = 32,
    shuffle = False,
)

# class names
print("Test class names: ", test_ds.class_names)

# dimensions of the images and labels
for images, labels in test_ds.take(1):
    print("Test images batch shape:", images.shape)
    print("Test labels batch shape:", labels.shape)

# total number of images
# 250 as noted previously

# images in each class
# ????

Now we understand how to prepare our data works we are ready to learn how to build a CNN.

Key Points
  • Images consist of pixels arranged in a particular order.
  • Image datasets can be organised into folders where each folder represents a class.
  • image_dataset_from_directory() lets us load images without writing custom data preparation code.
  • Images are loaded in batches and represented as numerical arrays.
  • Training, validation, and test sets are used to build and evaluate models.

Content from Build a Convolutional Neural Network


Last updated on 2026-06-14 | Edit this page

Estimated time: 47 minutes

Overview

Questions

  • What is a neural network?
  • How is a convolutional neural network (CNN) different from an ANN?
  • What are the types of layers used to build a CNN?

Objectives

  • Understand how a convolutional neural network (CNN) differs from an artificial neural network (ANN).
  • Explain the terms: kernel, filter.
  • Know the different layers: convolutional, pooling, flatten, dense.

In Episode 1, we used a pre-defined model to train our classifier.

In this episode, we’ll build that model ourselves, one step at a time.

We don’t need to understand all the maths behind neural networks — instead, we’ll focus on how to put the pieces together.

Neural Networks


A neural network is a series of layers that transform input data step by step. A convolutional neural network (CNN) is a type of neural network designed specifically for images.

It learns by passing image data through a series of layers, each one transforming the data slightly, until it can make a final prediction.

Step 4. Build an architecture

We build a CNN by stacking layers together.

Each layer takes some input, transforms it, and passes it to the next layer, i.e., the output from each layer becomes the input to the next layer.

There are three main components of a neural network:

  • CNN Part 1. Input Layer
  • CNN Part 2. Hidden Layers
  • CNN Part 3. Output Layer

CNN Part 1. Input Layer

We start by telling Keras the shape of our images. Recall the size of our images are 32x32 pixels with 3 colour channels (RGB).

PYTHON

# input layer 
inputs_intro = tf.keras.Input(shape=(32,32,3))

CNN Part 2. Hidden Layers

The next component consists of the so-called hidden layers of the network.

In a neural network, the input layer receives the raw data, and the output layer produces the final predictions or classifications. These layers’ contents are directly observable because you can see the input data and the network’s output predictions.

However, the hidden layers, which lie between the input and output layers, contain intermediate representations of the input data.

In a CNN, the hidden layers typically consist of convolutional, pooling, reshaping (e.g., Flatten), and dense layers.

Convolutional Layers

Convolutional layers look for simple patterns in images, such as edges or textures.

To create a convolutional layer, we need to specify:

  • the number of features to learn, filters
  • the size of the search window, kernel_size
    • smaller kernels are used to capture fine-grained features
    • odd-sized windows are common because they have a centre pixel
  • the activation function to use, activation

When building a model, each layer takes the output from the previous layer as its input.

PYTHON

# hidden layers
x_intro = tf.keras.layers.Conv2D(filters=16, 
                                 kernel_size=(3,3),
                                 activation='relu')(inputs_intro)
Challenge

Create a convolutional layer

Create a Conv2D layer with: - 16 filters - a 3x3 kernel size - ‘relu’ activation function.

Use inputs_intro as the input.

PYTHON

x_intro = tf.keras.layers.Conv2D(filters=_____, 
                                 kernel_size=_____, 
                                 activation=_____)(inputs_intro)

PYTHON

x_intro = tf.keras.layers.Conv2D(
    filters=16,
    kernel_size=(3, 3),
    activation='relu')(inputs_intro)

Pooling Layers

Pooling layers reduce the size of the image, helping the model focus on the most important features.

To create a pooling layer, we need to specify how much to reduce the image by using pool_size. A pool size of (2, 2) reduces the width and height of the image by half.

PYTHON

x_intro = tf.keras.layers.MaxPooling2D(pool_size=(2,2))(x_intro)
Callout

Deep Learning

We often repeat this two-layer pattern to learn more complex features by, e.g. increasing the number of filters:

PYTHON

x_intro = tf.keras.layers.Conv2D(16, (3,3), activation='relu')(inputs_intro)
x_intro = tf.keras.layers.MaxPooling2D((2,2))(x_intro)

x_intro = tf.keras.layers.Conv2D(32, (3, 3), activation='relu')(x_intro)
x_intro = tf.keras.layers.MaxPooling2D((2, 2))(x_intro)

Up to this point, our model has been working with 2D image data.

Next, we convert it into a format suitable for classification.

Reshaping Layers: Flatten

Flatten turns our 2D feature maps into a single list of numbers.

PYTHON

x_intro = tf.keras.layers.Flatten()(x_intro)

CNN Part 3. Output Layer

Dense layers

To make a predcition, we use a dense (fully connected) layer after reshaping. This is because Dense layers expect 1D input.

To create a dense layer, we need to specify:

  • the number of outputs to produce, units
  • the activation function, activation

For classification problems: - units is the number of classes
- softmax converts the outputs into probabilities that add up to 1

PYTHON

outputs_intro = tf.keras.layers.Dense(units=5, activation='softmax')(x_intro)

Putting it all together


Once you decide on the initial architecture of your CNN, the last step to create the model is to bring all of the parts together:

PYTHON

model_intro = tf.keras.Model(inputs = inputs_intro,
                             outputs = outputs_intro, 
                             name = "cifar_model_intro")

We now have a simple convolutional neural network!

We can put this code inside a function definition like we used in Episode 1.

Challenge

Turn your neural network into a function

Use the architecture from this episode to define a function called create_model_intro.

  • Hint 1 Your function should take two arguments, input_shape and num_classes
  • Hint 2 Your function should return a model object

PYTHON

# function that defines an introductory convolutional neural network
def create_model_intro(input_shape=(32,32,3), num_classes=5):
    
    # CNN Part 1
    # Input layer of 32x32 images with three channels (RGB)
    inputs_intro = tf.keras.Input(shape=input_shape)
    
    # CNN Part 2
    x_intro = tf.keras.layers.Conv2D(filters=16, 
                                     kernel_size=(3,3), 
                                     activation='relu')(inputs_intro)
    x_intro = tf.keras.layers.MaxPooling2D(pool_size=(2,2))(x_intro)
    x_intro = tf.keras.layers.Flatten()(x_intro)
    
    # CNN Part 3
    # Output layer with one unit for each class and softmax activation
    outputs_intro = tf.keras.layers.Dense(units=num_classes,
                                          activation='softmax')(x_intro)
    
    # create the model
    model_intro = tf.keras.Model(inputs = inputs_intro,
                                 outputs = outputs_intro, 
                                 name = "cifar_model_intro")
    
    return model_intro

Viewing the model

Once a model is created, we can look at a summary of its structure.

PYTHON

# create the introduction model
model_intro = create_model_intro()

# view model summary
model_intro.summary()

OUTPUT

Model: "cifar_model_intro"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, 32, 32, 3)]       0

 conv2d (Conv2D)             (None, 30, 30, 16)        448

 max_pooling2d (MaxPooling2  (None, 15, 15, 16)        0
 D)

 flatten (Flatten)           (None, 3600)              0

 dense (Dense)               (None, 5)                 18005

=================================================================
Total params: 18453 (72.08 KB)
Trainable params: 18453 (72.08 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

In the model summary you’ll see:

  • the layers in order
  • how the data shape changes
  • how many parameters the model learns

Don’t worry about all the details — this is just a useful way to inspect your model.

Callout

How do we choose this architecture?

You might be wondering how we decided on the number of layers and their settings.

In practice, there’s no single “correct” answer — building models often involves some trial and error.

A common approach is to:

  • start with a simple model
  • train it and see how it performs
  • gradually add layers and adjust settings to improve it

We’ll explore this idea later in the lesson.

We have a model — now what?


We now have a simple CNN that can take an image and produce a prediction.

At the moment, the model hasn’t learned anything yet — it still needs to be trained on our data.

In the next step, we’ll:

  • tell the model how to learn (choose a loss function and optimizer)
  • train it on our training data

This is where the model starts improving and learning patterns from the images.

Key Points
  • Convolutional neural networks (CNNs) are designed for working with image data.
  • CNNs are built by stacking layers, where each layer transforms the data and passes it to the next layer.
  • Convolutional layers look for simple patterns in images (e.g. edges and textures).
  • Pooling layers reduce the size of the data, helping the model focus on important features.
  • The Flatten layer converts image data into a format suitable for classification.
  • Dense layers are used to produce the final prediction.

Content from Compile and Train (Fit) a Convolutional Neural Network


Last updated on 2026-06-14 | Edit this page

Estimated time: 47 minutes

Overview

Questions

  • How do you compile a convolutional neural network (CNN)?
  • What is a loss function and an optimizer?
  • How do you train (fit) a CNN?
  • How can we check how well our model is learning during training?

Objectives

  • Compile a CNN by choosing an optimizer, loss function, and metric.
  • Train a CNN using Model.fit().
  • Explain what loss and accuracy represent during training.
  • Recognise signs of overfitting in training results.

In the previous episode, we built the structure of our convolutional neural network. Now it’s time to make it learn.

In this episode, we’ll compile the model, train it on our data, and look at how its performance changes during training.

Step 5. Choose a loss function and optimizer and compile model

Before we can train the model, we need to compile it.

Compiling sets up how the model will learn by specifying:

  • the optimizer (how the model updates its weights)
  • the loss function (how wrong the predictions are)
  • the metrics (how we measure performance)

We do this using the Model.compile() function:

Optimizer

An optimizer controls how the model updates its weights during training.

Here we’ll use one of the most common choices, 'adam', which works well for many image classification tasks.

Optimizers have settings such as the learning rate, which controls how quickly the model learns. We’ll use the default values here.

ChatGPT

Learning rate is a hyperparameter that determines the step size at which the model’s parameters are updated during training. A higher learning rate allows for more substantial parameter updates, which can lead to faster convergence, but it may risk overshooting the optimal solution. On the other hand, a lower learning rate leads to smaller updates, providing more cautious convergence, but it may take longer to reach the optimal solution. Finding an appropriate learning rate is crucial for effectively training machine learning models.

The figure below illustrates how a small learning rate will not traverse toward the minima of the gradient descent algorithm in a timely manner, i.e. number of epochs.

Plot of loss over weight value illustrating how a small learning rate takes a long time to reach the optimal solution.
Small learning rate leads to inefficient approach to loss minima

On the other hand, specifying a learning rate that is too high will result in a loss value that never approaches the minima. That is, ‘bouncing between the sides’, thus never reaching a minima to cease learning.

Plot of loss over weight value illustrating how a large learning rate never approaches the optimal solution because it bounces between the sides.
A large learning rate results in overshooting the gradient descent minima

Finally, a modest learning rate will ensure that the product of multiplying the scalar gradient value and the learning rate does not result in too small steps, nor a chaotic bounce between sides of the gradient where steepness is greatest.

Plot of loss over weight value illustrating how a good learning rate gets to optimal solution gradually.
An optimal learning rate supports a gradual approach to the minima

Loss function

The loss function measures how wrong the model’s predictions are.

During training, the model tries to reduce this value — lower loss means better predictions.

For our classification problem, we’ll use 'sparse_categorical_crossentropy', which works when each image belongs to one class.

Metrics

A metric is used to measure how well the model is performing.

For classification problems, we commonly use 'accuracy', which tells us how often the model’s predictions are correct.

Unlike the loss function, metrics are used to monitor performance — they don’t directly affect how the model learns.

Challenge

Compile the model

We’ve chosen the optimizer, loss function, and metric.

Now put them together to compile the model.

PYTHON

# compile the model
_____.compile(optimizer = _____,
			        loss = _____, 
			        metrics = _____)

OUTPUT

# compile the model
model_intro.compile(optimizer = 'adam',
                    loss = 'sparse_categorical_crossentropy',
                    metrics = ['accuracy'])

Step 6. Train (Fit) model

Now that our model is compiled, we are ready to train it.

Training is where the model learns from the data by making predictions, comparing them to the true labels, and gradually improving over time.

We do this using the Model.fit() function. It returns a history object, which stores the loss and accuracy values from training, and can be specifyied with:

  • the training data, x
  • how many times to loop through the data, epochs
  • optionally, validation_data to monitor performance during training

PYTHON

history_intro = model_intro.fit(x = train_ds,
                                epochs = 10,
                                validation_data = val_ds
)

During training, the model:

  • makes predictions
  • compares them to the true labels
  • updates its weights to improve

The Model.fit() function

Monitor Training Progress (aka Model Evaluation during Training)

After training, we can check how well the model learned by looking at the loss and accuracy over time.

We stored this information in the history_intro object returned by Model.fit(). We can convert this to a data frame and plot it:

PYTHON

import seaborn as sns
import pandas as pd

# convert the model history to a dataframe for plotting 
history_intro_df = pd.DataFrame.from_dict(history_intro.history)

# plot the loss and accuracy 
fig, axes = plt.subplots(1, 2)
fig.suptitle('cifar_model_intro')

sns.lineplot(ax=axes[0], data=history_intro_df[['loss', 'val_loss']])
sns.lineplot(ax=axes[1], data=history_intro_df[['accuracy', 'val_accuracy']])
two panel figure; the figure on the left illustrates the training loss starting at 1.5 and decreasing to 0.7 and the validation loss decreasing from 1.3 to 1.0 before leveling out; the figure on the right illustrates the training accuracy increasing from 0.45 to 0.75 and the validation accuracy increasing from 0.53 to 0.65 before leveling off

The two plots show how the model changed during training:

  • Loss (left): how wrong the model is — lower is better
  • Accuracy (right): how often the model is correct — higher is better

Each plot shows:

  • the training data (solid line)
  • the validation data (dashed line)

We expect:

  • loss to decrease over time
  • accuracy to increase over time
Challenge

Inspect the training curves

Look at the plots and answer:

  1. What happens to the loss during training?
  2. What happens to the accuracy?
  3. Do the training and validation lines behave similarly?
  4. Based on this, do you think the model will perform well on new data?
  • Loss decreases over time, which shows the model is improving
  • Accuracy increases over time
  • The validation lines improve at first, but then level off
  • This suggests the model is starting to overfit and may not perform as well on new data

What is overfitting?

In the plots, we can see that:

  • training performance keeps improving
  • validation performance stops improving

This is called overfitting. Overfitting happens when the model learns the training data too well, including details that don’t generalize to new data. As a result, the model performs well on the training data but less well on new images. Signs of overfitting include:

  • training loss keeps decreasing
  • validation loss stops improving or increases
  • training accuracy is much higher than validation accuracy

How can we address overfitting?

There are several ways to reduce overfitting. Common approaches include:

  • collecting more training data
  • simplifying the model (fewer layers or parameters)
  • adding techniques that help the model generalise better

These approaches aim to help the model focus on general patterns rather than memorising the training data. In a later episode, we’ll look at one of these techniques: dropout.

What did we do?


In this episode, we took our CNN and made it learn from data.

We:

  • compiled the model by choosing an optimizer, loss function, and metric
  • trained the model using Model.fit()
  • monitored how its performance changed during training

By plotting the loss and accuracy, we could see how well the model was learning and identify when it started to overfit.

We now have a trained model, and understand how to check whether it is learning effectively. In the next part of the workflow, we’ll use this model to make predictions and evaluate how well it performs on new data.

Key Points
  • Use Model.compile() to set how a model will learn.
  • The optimizer controls how the model updates its weights.
  • The loss function measures how wrong the model’s predictions are.
  • Metrics such as accuracy tell us how well the model is performing.
  • Use Model.fit() to train the model on data.
  • Training and validation loss and accuracy help us monitor learning.
  • Overfitting occurs when a model performs well on training data but less well on new data.

Content from Evaluate a Convolutional Neural Network and Make Predictions (Classifications)


Last updated on 2026-06-14 | Edit this page

Estimated time: 31 minutes

Overview

Questions

  • How do you use a trained CNN to make predictions?
  • How do you convert model predictions into class labels?
  • How do you evaluate model performance on a test dataset?

Objectives

  • Use a trained CNN to make predictions on new data.
  • Convert model outputs into predicted class labels.
  • Evaluate model performance using accuracy.
  • Interpret model performance using a confusion matrix.

Step 7. Perform a Prediction

Now that we have a trained model, we can use it to make predictions on new data.

We’ll use our test dataset, which contains images the model has not seen before.

Callout

Don’t have your model?

If you don’t have the trained model from the previous episode, you can load it instead:

PYTHON

# load pre-trained model
model_intro = tf.keras.models.load_model('../models/cifar_model_intro.keras')

Make predictions

We can use the Model.predict() function to generate predictions for the test dataset:

PYTHON

# make predictions
predictions = model_intro.predict(x = test_ds)

The model does not return a single answer.

Instead, it returns a list of values for each image — one for each class.

These values represent how confident the model is for each class.

Convert predictions to labels

To get a single predicted class, we select the highest value for each prediction:

PYTHON

# convert predictions to class labels
predicted_labels = tf.argmax(predictions, axis=1)

Step 8. Measuring performance

Now that we have predictions, we can measure how well our model performs on the test dataset.

Accuracy

A simple way to evaluate our model is to calculate its accuracy — how often the predictions match the true labels. To do this, we also need the true labels from our test dataset.

We can do this using:

PYTHON

from sklearn.metrics import accuracy_score

test_labels = np.concatenate([y for x, y in test_ds], axis=0)
test_acc = accuracy_score(y_true=test_labels, y_pred=predicted_labels)
print('Accuracy:', round(test_acc,2))

OUTPUT

Accuracy: 0.67

Accuracy is the proportion of correct predictions.

For example, an accuracy of 0.67 means the model correctly classified 67% of the test images.

Confusion matrix

Accuracy gives us an overall score, but it doesn’t tell us which classes the model is getting right or wrong.

A confusion matrix helps us see this.

PYTHON

from sklearn.metrics import confusion_matrix

conf_matrix = confusion_matrix(y_true=test_labels, y_pred=predicted_labels)
print(conf_matrix)

OUTPUT

[[ 9 15 17  5  4]
 [ 3 17 16  8  6]
 [12 10 13  5 10]
 [ 7 13 10  5 15]
 [10 12 12  4 12]]

In a confusion matrix:

  • rows represent the true labels
  • columns represent the predicted labels

Values along the diagonal are correct predictions.

We can make this easier to read using a heatmap:

PYTHON

sns.heatmap(data=conf_matrix, annot=True, fmt='3g')

Darker values along the diagonal indicate correct predictions.

Off-diagonal values show where the model is confusing one class with another.

Confusion matrix of model predictions where the colour scale goes from black to light to represent values from 0 to the total number of test observations in our test set of 1000. The diagonal has much lighter colours, indicating our model is predicting well, but a few non-diagonal cells also have a lighter colour to indicate where the model is making a large number of prediction errors.
Challenge

Inspect model performance

Look at the confusion matrix and answer:

  1. Does the model perform equally well across all classes?
  2. Which classes seem to be confused with each other?
  3. How does this compare to the accuracy value?
  • The model performs better on some classes than others
  • Some classes are frequently confused with similar ones
  • Accuracy gives an overall score, but the confusion matrix shows where errors occur

What did we do?


In this episode, we used our trained CNN to make predictions on new data and evaluate how well it performs.

We:

  • used the model to generate predictions
  • converted those predictions into class labels
  • measured performance using accuracy
  • used a confusion matrix to better understand the results

We can now see not just how often the model is correct, but also where it is making mistakes.

In the next part of the workflow, we’ll look at ways to improve our model and make more accurate predictions.

Key Points
  • Use Model.predict() to make predictions with a trained model.
  • Model outputs represent confidence values for each class.
  • Use argmax to convert model outputs into predicted class labels.
  • Accuracy measures how often predictions are correct.
  • Confusion matrices help identify which classes are predicted correctly and where errors occur.

Content from Share a Convolutional Neural Network and Next Steps


Last updated on 2026-06-09 | Edit this page

Estimated time: 12 minutes

Overview

Questions

  • How do I share my convolutional neural network (CNN)?
  • Where can I find pre-trained models?
  • Is Keras the best library to use?
  • What is a GPU?
  • What else can I do with a CNN?

Objectives

  • Learn how to save and load models.
  • Know where to search for pretrained models.
  • know about other Deep Learning Libraries.
  • Understand what a GPU is and what it can do for you.
  • Explain when to use a CNN and when not to.

Step 10. Share model

We now have a trained network that performs at a level we are happy with and maintains high prediction accuracy on a test dataset. We should consider publishing a file with both the architecture of our network and the weights which it has learned (assuming we did not use a pre-trained network). This will allow others to use it as as pre-trained network for their own purposes and for them to (mostly) reproduce our result.

The Keras method to save is found in the [Model training APIs] ‘Saving & serialization’ section of the documentation and has the following definition:

Model.save(filepath, overwrite=True, **kwargs)
  • filepath tells Keras where to save the model

We can use this method to save out model.

PYTHON

# save best model
model_best.save('model_best.keras')

This saved model can be loaded again by using the load_model method:

PYTHON

# load a saved model
pretrained_model = keras.models.load_model('model_best.keras')

This loaded model can be used as before to predict.

PYTHON

# use the pretrained model to predict the class name of the first test image
result_pretrained = pretrained_model.predict(test_images[0].reshape(1,32,32,3))

print('The predicted probability of each class is: ', result_pretrained.round(4))
print('The class with the highest predicted probability is: ', class_names[result_pretrained.argmax()])

OUTPUT

cat

The saved .keras file contains:

  • The model’s configuration (architecture).
  • The model’s weights.
  • The model’s optimizer’s state (if any).

Note that saving the model does not save the training history (i.e. training and validation loss and accuracy). For that, you save the model history dataframe we used for plotting.

The Keras documentation for Saving and Serialization explains other ways to save your model.

To share your model with a wider audience it is recommended you create git repository, such as GitHub, and upload your code, images, and model outputs to the cloud. In some cases, you may be able to offer up your model to an online repository of pretrained models.

Choosing a pretrained model

If your data and problem is very similar to what others have done, a pre-trained network might be preferable. Even if your problem is different, if the data type is common (for example images), you can use a pre-trained network and fine-tune it for your problem. A large number of openly available pre-trained networks can be found online, including: - Model Zoo - pytorch hub - tensorflow hub - GitHub

What else do I need to know?

How to choose a Deep Learning Library

In this lesson we chose to use Keras because it was designed to be easy to use and usually requires fewer lines of code than other libraries. Keras can actually work on top of TensorFlow (and several other libraries), hiding away the complexities of TensorFlow while still allowing you to make use of their features.

The performance of Keras is sometimes not as good as other libraries and if you are going to move on to create very large networks using very large datasets then you might want to consider one of the other libraries. But for many applications the performance difference will not be enough to worry about and the time you will save with simpler code will exceed what you will save by having the code run a little faster.

Keras also benefits from a very good set of online documentation and a large user community. You will find most of the concepts from Keras translate very well across to the other libraries if you wish to learn them at a later date.

A couple of those libraries include:

  • TensorFlow was developed by Google and is one of the older Deep Learning libraries, ported across many languages since it was first released to the public in 2015. It is very versatile and capable of much more than Deep Learning but as a result it often takes a lot more lines of code to write Deep Learning operations in TensorFlow than in other libraries. It offers (almost) seamless integration with GPU accelerators and Google’s own TPU (Tensor Processing Unit) chips specially built for machine learning.

  • PyTorch was developed by Facebook in 2016 and is a popular choice for Deep Learning applications. It was developed for Python from the start and feels a lot more “pythonic” than TensorFlow. Like TensorFlow it was designed to do more than just Deep Learning and offers some very low level interfaces. PyTorch Lightning offers a higher level interface to PyTorch to set up experiments. Like TensorFlow it is also very easy to integrate PyTorch with a GPU. In many benchmarks it outperforms the other libraries.

  • NEW Keras Core In Fall 2023, this library will become Keras 3.0. Keras Core is a full rewrite of the Keras codebase that rebases it on top of a modular backend architecture. It makes it possible to run Keras workflows on top of arbitrary frameworks — starting with TensorFlow, JAX, and PyTorch.

What is a GPU and do I need one?

A GPU, or Graphics Processing Unit, is a specialized electronic circuit designed to accelerate graphics rendering and image processing in a computer. In the context of deep learning and machine learning, GPUs have become essential due to their ability to perform parallel computations at a much faster rate compared to traditional central processing units (CPUs). This makes them well-suited for the intensive matrix and vector operations that are common in deep learning algorithms.

As you have experienced in this lesson, training CNN models can take a long time. If you follow the steps presented here you will find you are training multiple models to find the one best suited to your needs, particularly when fine tuning hyperparameters. However you have also seen that running on CPU only machines can be done! So while a GPU is not an absolute requirement for deep learning, it can significantly accelerate your deep learning work and make it more efficient, especially for larger and more complex tasks.

If you don’t have access to a powerful GPU locally, there are cloud services that provide GPU instances for deep learning. This may be the most cost-effective option for many users.

It this the best/only way to code up CNNs for image classification?

Absolutely not! The code we used in today’s workshop might be considered old fashioned. A lot of the data preprocessing we did by hand can now be done by adding different layer types to your model. The preprocessing layers section fo the Keras documentation provides several examples.

The point is that this technology, both hardware and software, is dynamic and changing at exponentially increasing rates. It is essential to stay curious and open to learning and follow up with continuous education and practice. Other strategies to stay informed include:

  • Online communications and forums, such as the Reddit’s r/MachineLearning and Data Science Stack Exchange
    • watch out for outdated threads!
  • Academic journals and conferences
    • Unlike other sciences, computer science digital libraries like arXiv enable researchers to publish their preprints in advance and disseminates recent advances more quickly than traditional methods of publishing
  • GitHub repositories
  • Practice
    • like any other language, you must use it or lose it!

What other uses are there for neural networks?

In addition to image classification, Episode 01 Introduction to Deep Learning introduced other computer vision tasks, including object detection and instance and semantic segmentation. These can all be done with CNNs and are readily transferable to videos. Also included in these tasks is medical imaging for diagnoses of disease and, of course, facial recognition.

However, there are many other tasks which CNNs are not well suited for:

  • Data where input size varies
    • Natural Language Processing (NLP) for text classification (sentiment analysis, spam detection, topic classification)
    • Speech Recognition for speech to text conversion
  • Sequential data and Time-series analysis
    • sensor readings, financial data, health monitoring
    • Use Recurrent Nueral Networks (RNNs) or Long Short-Term Memory networks (LTSMs)
  • Applications where interpretability and explainability is crucial
    • Use simpler models, e.g., decision trees
  • Situations where you lack sufficient training data
Key Points
  • To use Deep Learning effectively, go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, choosing a loss function, training the model, tuning hyperparameters, and measuring performance.
  • Use Model.save() and share your model with others.
  • Keras is a Deep Learning library that is easier to use than many of the alternatives such as TensorFlow and PyTorch.
  • Graphical Processing Units are useful, though not essential, for deep learning tasks.
  • CNNs work well for a variety of tasks, especially those involving grid-like data with spatial relationships, but not for time series or variable sized input data.