Summary and Setup
This lesson equips participants with trustworthy AI/ML practices, emphasizing fairness, explainability, reproducibility, accountability, and safety across three general data/model modalities: structured data (tabular), natural language processing (NLP), and computer vision. Participants will learn to evaluate and enhance the trustworthiness and reliability of models in each modality. Additionally, they will explore how to integrate these principles into future models, bridging ethical practices with practical applications in their research.
Prerequisites
- Participants should have experience using Python.
- Participants should have a basic understanding of machine learning (e.g., familiarity with concepts like train/test splits and cross-validation) and should have trained at least one model in the past.
- Participants should have some preliminary experience with (or at least exposure to) neural networks.
- Participants should care about the interpretability, reproducibility, and/or fairness of the models they build.
- Participants should have domain knowledge of the field they work in and want to build models for.
Setup
The full workshop setup includes (1) software installation, (2) downloading the data, and (3) setting up a HuggingFace account & access token. If you have any trouble with the steps outlined below, please contact the workshop organizers ASAP to make sure you have everything completed before the workshop starts.
Software setup
You will need a terminal (or Git Bash, which is recommended for Windows), Python 3.11.9, and the ability to create Python virtual environments. You will also need to install a variety of packages within your virtual environment.
1) Installing Git Bash (Windows only)
We will be launching Jupyter Lab (IDE) from a terminal (Mac/Linux) or Git Bash (Windows) during this workshop. If you will be using a Windows machine for this workshop, please install Git Bash (“Git for Windows”).
How to open Git Bash (Windows)
- After installation, search for “Git Bash” in the Start Menu.
- Click on the “Git Bash” application to open it.
- A terminal window will appear where you can type commands.
2) Installing Python 3.11.9
- Download Python 3.11.9 using one of the OS-specific download links below (retrieved from Python.org). If prompted, make sure to check the box for “Add Python to PATH” during the setup process.
  - Mac: macOS 64-bit universal2 installer
  - Windows: Windows installer (64-bit)
- Open Terminal (Mac/Linux) or Git Bash (Windows).
- Mac/Linux: Open the “Terminal” application, which can usually be found using Spotlight (Cmd + Space) or under Applications > Utilities.
- Windows: Open Git Bash as described above.
- Type one of the following commands to check your Python version; it should report `Python 3.11.9`.
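A minimal sketch of the standard version checks (use whichever Python launcher your system recognizes):
SH
python3 --version   # Mac/Linux
python --version    # Windows (Git Bash)
# Expected output: Python 3.11.9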
3) Create a workshop folder on your Desktop called “trustworthy_ML”
We’ll use this folder to store code throughout the workshop. We’ll also add our virtual environment to this folder.
In the terminal (Mac/Linux) or Git Bash (Windows), create the folder using the commands shown below.
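A sketch of the folder-creation commands (assuming your Desktop lives at `~/Desktop`; adjust the path if yours differs):
SH
mkdir -p ~/Desktop/trustworthy_ML   # create the workshop folder on your Desktop
cd ~/Desktop/trustworthy_ML         # move into it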
4) Creating a new virtual environment
We’ll install the prerequisite libraries in a virtual environment, to prevent them from cluttering up your Python environment and causing conflicts.
To create a new virtual environment (“venv”) for the project, open the terminal (Mac/Linux), Git Bash (Windows), or Anaconda Prompt (Windows), and type one of the OS-specific options below.
Make sure you are already CD’d into your workshop folder, `Desktop/trustworthy_ML`. The code below will create a new virtual environment in a folder named `venv/` in the current working directory.
SH
cd Desktop/trustworthy_ML # if you're not already in this folder, CD to it (adjust path, if necessary)
# Run one of the below options (OS-specific)
python3.11 -m venv venv # mac/linux
python -m venv venv # windows
If you run `ls` (list files), you should see a new `venv/` folder in your trustworthy_ML folder.
If you’re on Linux and this doesn’t work, you may need to install venv first. Try running `sudo apt-get install python3-venv` first, then `python3 -m venv venv`.
5) Activating the environment
To activate the environment, run the OS-specific commands shown below in Terminal (Mac/Linux) or Git Bash (Windows).
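These are the standard venv activation commands for this folder layout (a sketch; the Windows path assumes you are using Git Bash):
SH
source venv/bin/activate       # Mac/Linux
source venv/Scripts/activate   # Windows (Git Bash)
After activation, your prompt should show `(venv)` at the start of the line.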
6) Installing your prerequisites
Once the virtual environment is activated, install the prerequisites by running the following commands:
First, make sure you have the latest version of pip by running:
SH
python -m pip install --upgrade pip # now that environment is activated, "python" (not "python3") should work for both mac and windows users
Then, install the required libraries. We’ve chosen a CPU-only (no GPUs enabled) setup for this lesson to make the environment simpler and more accessible for everyone. By avoiding GPU-specific dependencies like CUDA, we reduce the storage requirements by 3-4 GB and eliminate potential compatibility issues related to GPU hardware.
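The exact package list is set by the workshop organizers and may differ; as an illustrative sketch based on the libraries imported in the verification step below, a CPU-only install might look like:
SH
# CPU-only PyTorch (Linux/Windows); on Mac, a plain "pip install torch" is already CPU-only
python -m pip install torch --index-url https://download.pytorch.org/whl/cpu
# Remaining libraries used in this lesson (package names assumed from the verification script below)
python -m pip install pandas scikit-learn jupyterlab tensorflow transformers pytorch-ood fairlearn umap-learn aif360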
Note: If prompted to Proceed ([y]/n) during environment setup, press y. It may take around 10-20 minutes to complete the full environment setup. Please reach out to the workshop organizers sooner rather than later to fix setup issues prior to the workshop.
7) Adding your virtual environment to JupyterLab
We want JupyterLab to have access to the environment we just built. To use this virtual environment in JupyterLab, follow these steps:
- Install the `ipykernel` package (see the commands below this list).
- Add the virtual environment as a Jupyter kernel.
- When you launch JupyterLab, select the `trustworthy_ML` kernel to ensure your code runs in the correct environment.
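A sketch of the two commands (run with the environment still activated; the kernel name `trustworthy_ML` matches the kernel referred to elsewhere in this setup):
SH
python -m pip install ipykernel                            # install the kernel package into the venv
python -m ipykernel install --user --name=trustworthy_ML   # register the venv as a Jupyter kernel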
8) Verify the setup
Change directory to your code folder before launching Jupyter. This will help us keep our code organized in one place.
To start JupyterLab, open a terminal (Mac/Linux) or Git Bash (Windows) and type the command shown below.
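A minimal sketch (assuming the workshop folder is on your Desktop and the environment is activated):
SH
cd ~/Desktop/trustworthy_ML   # keep notebooks in the workshop folder
jupyter lab                   # launches JupyterLab in your browser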
After launching, start a new notebook using the `trustworthy_ML` kernel to ensure your code runs in the correct environment. Then run the following lines of code:
PYTHON
import torch
import pandas as pd
import sklearn
import jupyter
import tensorflow as tf
import transformers
import pytorch_ood
import fairlearn
import umap
import sys
# Tested versions in this workshop:
print(f"Python version: {sys.version_info[0]}.{sys.version_info[1]}.{sys.version_info[2]}") # 3.11.9
print("Torch version:", torch.__version__) # >= 2.2
print("Pandas version:", pd.__version__) # >= 2.2.3
print("Scikit-learn version:", sklearn.__version__) # >= 1.5.2
print("TensorFlow version:", tf.__version__) # >= 2.16
print("Transformers version:", transformers.__version__) # >= 4.46.3
print("PyTorch-OOD version:", pytorch_ood.__version__) # >= 0.2.0
print("Fairlearn version:", fairlearn.__version__) # >= 0.11.0
print("UMAP version:", umap.__version__) # >= 0.5.7
This should output the versions of all required packages without raising errors. You may see an informational warning from TensorFlow (cpu_feature_guard) about the CPU-optimized build; it has no impact on this lesson. Other versions will likely work as well, but we’ve only tested thoroughly with the versions commented above.
Fallback option: cloud environment
If a local installation does not work for you, it is also possible to run (most of) this lesson in Google Colab. Some packages may need to be installed on the fly within the notebook (TBD).
Deactivating/activating environment
To deactivate your virtual environment, simply run
deactivate
in your terminal. If you close the terminal or
Git Bash without deactivating, the environment will automatically close
as the session ends. Later, you can reactivate the environment using the
“Activate environment” instructions above to continue working. If you
want to keep coding in the same terminal but no longer need this
environment, it’s best to explicitly deactivate it. This ensures that
the software installed for this workshop doesn’t interfere with your
default Python setup or other projects.
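For reference, a minimal sketch of the deactivate/reactivate cycle (Mac/Linux path shown; Windows Git Bash uses `venv/Scripts/activate`):
SH
deactivate                 # leave the workshop environment
source venv/bin/activate   # re-enter it later from the trustworthy_ML folder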
Download and move the data needed
For the fairness evaluation episode, you will need access to the Medical Expenditure Panel Survey Dataset. Please complete these steps to ensure you have access:
- Download the AI Fairness 360 (AIF360) example data: Medical Expenditure Panel Survey data (zip file)
- Unzip h181.zip (right-click and select “Extract All” on Windows; double-click the zip file on Mac)
- In the unzipped folder, find the `h181.csv` file. Place the `h181.csv` file in the `aif360` package’s data directory within your virtual environment folder (a command-line sketch follows this list):
  - Windows: `Desktop/trustworthy_ML/venv/Lib/site-packages/aif360/data/raw/meps/h181.csv`
  - Mac/Linux: `Desktop/trustworthy_ML/venv/lib/python3.x/site-packages/aif360/data/raw/meps/h181.csv`
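A sketch of copying the file from the command line (Mac/Linux shown; the download location `~/Downloads/h181/` is an assumption, and `python3.x` should be replaced with your actual version, e.g. `python3.11`). You can also move the file with your regular file browser.
SH
cp ~/Downloads/h181/h181.csv ~/Desktop/trustworthy_ML/venv/lib/python3.x/site-packages/aif360/data/raw/meps/h181.csv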
Create a Hugging Face account and access Token
You will need a Hugging Face account for the workshop episode on model sharing. Hugging Face is a very popular machine learning (ML) platform and community that helps users build, deploy, share, and train machine learning models.
Create account: To create an account on Hugging Face, visit: huggingface.co/join. Enter an email address and password, and follow the instructions provided via Hugging Face (you may need to verify your email address) to complete the process.
Setup access token: Once you have your account created, you’ll need to generate an access token so that you can upload/share models to your Hugging Face account during the workshop. To generate a token, visit the Access Tokens setting page after logging in. Once there, click “New token” to generate an access token. Select the “Write” access token type, and then provide a short name (e.g., trustworthyAI). We’ll use this token later to log in to Hugging Face via Python.
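If you’d like to confirm your token works before the workshop, one option (not required; the lesson itself logs in via Python later) is the command-line login that ships with the `huggingface_hub` package installed alongside `transformers`:
SH
huggingface-cli login   # paste your access token when prompted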