This lesson is being piloted (Beta version). If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository.

Introduction to Tree Models in Python

Decision trees are a family of algorithms built around a tree-like structure of decision rules. These algorithms often perform well in prediction and classification tasks. This lesson explores the properties of tree models in the context of mortality prediction.
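
To give a flavour of what is to come, the hedged sketch below shows what fitting a decision tree in Python might look like. It is not part of the lesson materials: it uses scikit-learn with a synthetic dataset standing in for the eICU data, and all parameter choices are illustrative.

```python
# Minimal sketch (assumed example, not the lesson's own code):
# fit a shallow decision tree classifier on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic features and a binary outcome in place of real patient data
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Limit tree depth so the learned decision rules stay easy to inspect
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print(f"Test accuracy: {tree.score(X_test, y_test):.2f}")
```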

The dataset that we will be using for this project is a subset of the eICU Collaborative Research Database that has been created for demonstration purposes.

Prerequisites

You need to understand the basics of Python before tackling this lesson. The lesson sometimes references Jupyter Notebook, although you can use any of the Python interpreters mentioned in the Setup.

Getting Started

To get started, follow the directions on the “Setup” page to download data and install a Python interpreter.

Schedule

Setup                           Download files required for the lesson
00:00   1. Introduction         What steps are needed to prepare data for analysis?
                                How do I create training and test sets?
00:30   2. Decision trees       What is a decision tree?
                                Can decision trees be used for classification and regression?
                                What is Gini impurity and how is it used?
01:00   3. Variance             Why are decision trees ‘high variance’?
                                What is overfitting?
                                Why might you choose to prune a tree?
                                What is the benefit of combining trees?
01:30   4. Boosting             What is meant by a “weak learner”?
                                How can “boosting” improve performance?
02:00   5. Bagging              “Bagging” is the shortened name for what?
                                How can bagging improve model performance?
02:30   6. Random forest        How can subselection of variables improve performance?
03:00   7. Gradient boosting    What is the state of the art in tree models?
03:30   8. Performance          How well do our predictive models perform?
04:00   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.