This lesson is being piloted (Beta version). If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository.

Introduction to Tree Models in Python

Decision trees are a family of algorithms built around a tree-like structure of decision rules. These algorithms often perform well in prediction and classification tasks. This lesson explores the properties of tree models in the context of mortality prediction.
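
To give a flavour of what is to come, the hedged sketch below shows what fitting a decision tree in Python might look like. It is not part of the lesson materials: it uses scikit-learn with a synthetic dataset standing in for the eICU data, and all parameter choices are illustrative.

```python
# Minimal sketch (assumed example, not the lesson's own code):
# fit a shallow decision tree classifier on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic features and a binary outcome in place of real patient data
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Limit tree depth so the learned decision rules stay easy to inspect
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print(f"Test accuracy: {tree.score(X_test, y_test):.2f}")
```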

The dataset that we will be using for this project is a subset of the eICU Collaborative Research Database that has been created for demonstration purposes.

Prerequisites

You need to understand the basics of Python before tackling this lesson. The lesson sometimes references Jupyter Notebook, although you can use any of the Python interpreters mentioned in the Setup.

Getting Started

To get started, follow the directions on the “Setup” page to download data and install a Python interpreter.

Schedule

Setup                           Download files required for the lesson
00:00   1. Introduction         What steps are needed to prepare data for analysis?
                                How do I create training and test sets?
00:30   2. Decision trees       What is a decision tree?
                                Can decision trees be used for classification and regression?
                                What is Gini impurity and how is it used?
01:00   3. Variance             Why are decision trees ‘high variance’?
                                What is overfitting?
                                Why might you choose to prune a tree?
                                What is the benefit of combining trees?
01:30   4. Boosting             What is meant by a “weak learner”?
                                How can “boosting” improve performance?
02:00   5. Bagging              “Bagging” is the shortened name for what?
                                How can bagging improve model performance?
02:30   6. Random forest        How can subselection of variables improve performance?
03:00   7. Gradient boosting    What is the state of the art in tree models?
03:30   8. Performance          How well do our predictive models perform?
04:00   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.