Summary and Setup
Workshop Overview
This workshop introduces you to foundational workflows in Amazon SageMaker, covering data setup, code repo setup, model training, and hyperparameter tuning within AWS’s managed environment. You’ll learn how to use SageMaker notebooks to control data pipelines, manage training and tuning jobs, and evaluate model performance effectively. We’ll also cover strategies to help you scale training and tuning efficiently, with guidance on choosing between CPUs and GPUs, as well as when to consider parallelized workflows (i.e., using multiple instances).
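To give a concrete preview of what this looks like in practice, below is a minimal sketch of launching a managed training job from a SageMaker notebook. The script name, S3 path, and instance choice are placeholder assumptions; we build up the real version step by step during the workshop.

```python
# Minimal sketch of launching a SageMaker training job (hypothetical
# script name and S3 path; the workshop walks through the real setup).
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

role = sagemaker.get_execution_role()  # IAM role attached to the notebook

estimator = SKLearn(
    entry_point="train.py",        # hypothetical training script
    role=role,
    instance_count=1,              # increase for parallelized workflows
    instance_type="ml.m5.large",   # CPU instance; GPU types also available
    framework_version="1.2-1",
)

# Point the job at training data that has been uploaded to S3
estimator.fit({"train": "s3://your-bucket/titanic/titanic_train.csv"})
```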
To keep costs manageable, this workshop provides tips for tracking and monitoring AWS expenses, so your experiments remain affordable. While AWS isn’t entirely free, it can be very cost-effective for common machine learning (ML) workflows used in research. For example, training roughly 100 small to medium-sized models (e.g., logistic regression, random forests, or lightweight deep learning models with a few million parameters) on a small dataset (under 10 GB) can cost under $20, making it accessible for many research projects.
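As one example of cost tracking, spend can be queried programmatically with the Cost Explorer API via boto3. The sketch below is a minimal example; the dates are placeholders, and note that Cost Explorer must be enabled on the account and charges a small fee per API request.

```python
# Minimal sketch: query monthly AWS spend with the Cost Explorer API
# (dates are placeholders; Cost Explorer must be enabled on the account,
# and each query incurs a small per-request charge).
import boto3

ce = boto3.client("ce")
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-01-01", "End": "2025-01-31"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
)
for period in response["ResultsByTime"]:
    print(period["TimePeriod"], period["Total"]["UnblendedCost"]["Amount"])
```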
What This Workshop Does Not Cover (Yet)
Currently, this workshop does not include:
- AWS Lambda: Lambda lets you run small pieces of code without setting up or managing a server. You write a function, tell AWS when to run it (for example, when a file is uploaded or a request comes in), and it automatically runs and scales as needed. This is useful for simple tasks like cleaning data as it arrives, kicking off a training job, or running a quick analysis without having to keep a server running all the time. A minimal handler sketch follows this list.
- Bedrock: Amazon Bedrock can be used to build and scale generative AI applications by providing API access to a range of foundation models from AWS and third-party providers, without managing infrastructure. It’s designed to simplify integrating text, image, and other generative capabilities into your workflows using familiar AWS tools. Much of what Bedrock enables can also be done in SageMaker, but Bedrock trades flexibility for simplicity: you get faster access to models and lower setup overhead, but less control over training, fine-tuning, and the underlying infrastructure, which can matter for research workflows that need custom architectures, reproducibility, or integration with existing pipelines. A minimal invocation sketch also follows this list.
- Additional AWS services beyond the core SageMaker ML workflows.
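To make the Lambda programming model above concrete, here is a minimal, hypothetical handler (not part of the workshop materials), written as if it were triggered by file uploads to an S3 bucket:

```python
# Hypothetical Lambda handler: runs whenever a new file lands in an
# S3 bucket that has been configured to trigger this function.
import json

def lambda_handler(event, context):
    # For S3 triggers, the event lists the uploaded object(s)
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New upload: s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("done")}
```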
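And to illustrate what “API access to foundation models” means in practice, here is a minimal, hypothetical Bedrock call via boto3. The model ID is just an example; access to it must be enabled in your account, and requests are billed per token.

```python
# Hypothetical Bedrock example: one-off text generation through the
# unified "converse" API (model ID is an example; model access must be
# enabled in your account, and requests are billed per token).
import boto3

bedrock = boto3.client("bedrock-runtime")
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize the Titanic dataset."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```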
If there’s a specific ML or AI workflow or AWS service you’d like to see included in this curriculum, please let us know! We’re happy to develop additional content to meet the needs of researchers, students, and ML/AI practitioners. Please post an issue on the lesson GitHub or contact endemann@wisc.edu with suggestions or requests.
Setup (Complete Before The Workshop)
Before attending this workshop, you’ll need to complete a few setup steps to ensure you can follow along smoothly. The main requirements are:
- GitHub Account – Create an account and be ready to fork a repository.
- AWS Access – Use a shared AWS account (if attending Machine Learning Marathon or Research Bazaar) or sign up for an AWS Free Tier account.
- Titanic Dataset – Download the required CSV files in advance.
- Workshop Repository – Fork the provided GitHub repository for use in AWS.
- Visit Glossary — Find and briefly review the workshop glossary.
- (Optional) AWS Skill Builder — For a broader overview of AWS, visit the Getting Started with the AWS Cloud Essentials course.
Details on each step are outlined below.
1. GitHub Account
You will need a GitHub account to access the code provided during this lesson. If you don’t already have a GitHub account, please sign up for GitHub to create a free account. Don’t worry if you’re a little rusty on using GitHub/git; we will only use a couple of git commands during the lesson, and the instructor will provide guidance on these steps.
2. AWS Account
There are two ways to get access to AWS for this lesson. Please wait for a pre-workshop email from the instructor to confirm which option to choose.
Option 1) Shared Account
If you are attending this lesson as part of the Machine Learning Marathon or Research Bazaar, the instructors will provide a shared AWS account for all attendees. You do not need to set up your own AWS account. What to expect:
- Before the workshop, you will receive an email invitation from the instructor with access details for the shared AWS account.
- During the lesson, you will log in using the credentials provided in the email.
- This setup ensures that all participants have the same environment and eliminates concerns about unexpected costs for attendees.
- These shared AWS credits should not be wasted, as we repurpose them for additional training events each year.
- Attendees are expected to stick to the lesson materials to ensure expensive pipelines (e.g., training/tuning LLMs) do not lead to high costs and wasted credits.
- Do not use any tools we do not explicitly cover without discussing with the instructors first.
Option 2) AWS Free Tier — Skip If Using Shared Account
If you are attending this lesson as part of the Machine Learning Marathon or Research Bazaar, you can skip this step. We will provide all attendees with a shared account. Otherwise, please follow these steps:
- Go to the AWS Free Tier page and click Create a Free Account.
- Complete the sign-up process. AWS offers a free tier with limited monthly usage. Some services, including SageMaker, may incur charges beyond free-tier limits, so be mindful of usage during the workshop. If you follow along with the materials, you can expect to incur around $10 in compute fees (e.g., from training and tuning several different models with GPU enabled at times).
Once your AWS account is set up, log in to the AWS Management Console to get started with SageMaker.
3. Download the Data
For this workshop, you will need the Titanic dataset, which can be used to train a classifier predicting survival.
Please download the following zip file (Right-click -> Save as): data.zip
Extract the zip folder contents (Right-click -> Extract all on Windows; double-click on Mac).
Save the two data files (train and test) to a location where you can easily access them, for example:
~/Downloads/data/titanic_train.csv
~/Downloads/data/titanic_test.csv
In the first episode, you will create an S3 bucket and upload this data to use with SageMaker.
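As a rough preview of that step, uploading the files with boto3 can look like the sketch below; the bucket name is a placeholder for the bucket you will create in the episode.

```python
# Rough preview of the first episode: upload the Titanic CSVs to S3
# ("my-titanic-bucket" is a placeholder for the bucket you will create).
import os
import boto3

s3 = boto3.client("s3")
local_dir = os.path.expanduser("~/Downloads/data")
for fname in ["titanic_train.csv", "titanic_test.csv"]:
    s3.upload_file(os.path.join(local_dir, fname), "my-titanic-bucket", fname)
```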
4. Get Access To Workshop Code (Fork GitHub Repo)
You will need a copy of our AWS_helpers repo on GitHub to explore how to manage your repo in AWS. This setup will allow you to follow along with the workshop and test out the Interacting with Repositories episode.
To do this:
- Go to the AWS_helpers GitHub repository.
- Click Fork (top right) to create your own copy of the repository under your GitHub account. You will only need the main branch. You can leave “Copy the main branch only” selected.
- Once forked, you don’t need to do anything else. We’ll clone this fork once we start working in the AWS Jupyter environment using…
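For reference, the clone step can look roughly like the sketch below; YOUR_USERNAME is a placeholder for your GitHub account, and there is nothing you need to run before the workshop.

```python
# Hypothetical preview of the clone step inside a SageMaker Jupyter
# notebook (YOUR_USERNAME is a placeholder for your GitHub account).
# In a notebook cell you can run the shell command directly:
#   !git clone https://github.com/YOUR_USERNAME/AWS_helpers.git
# or, equivalently, from Python:
import subprocess

subprocess.run(
    ["git", "clone", "https://github.com/YOUR_USERNAME/AWS_helpers.git"],
    check=True,
)
```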
5. Review the Workshop Glossary Page
When learning cloud tools for the first time, understanding new terminology is half the battle. We encourage learners to briefly review the Glossary page (also accessible from the top menu of each lesson page) before the workshop. You don’t need to memorize the terms—just a quick read-through will help familiarize you with key concepts. Once we start running our own AWS SageMaker experiments, these terms will start to make more sense in context. If you feel lost at any point during the workshop, please ask the instructor for assistance or refer back to the glossary.
6. (Optional) AWS Skill Builder — Getting Started with the AWS Cloud Essentials
Attendees who want a stronger foundational understanding of AWS before diving into SageMaker and other ML services are encouraged to complete this self-paced course, Getting Started with the AWS Cloud Essentials, as optional pre-work for this workshop. It’s designed for beginners and provides a broad overview of the AWS Cloud, including core services, global infrastructure, pricing basics, and security concepts.