This lesson is being piloted (Beta version)

Machine Learning for Tabular Data in R: Glossary

Key Points

A Brief Introduction to Machine Learning
  • There are many types of machine learning, such as supervised and unsupervised learning.

  • We will focus on some methods that work well with tabular data.

Linear and Logistic Regression
  • Classical linear and logistic regression models can be thought of as examples of regression and classification models, respectively, in machine learning.

  • A test set, held out from training, can be used to measure how well a model performs on data it has not seen.
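
As a minimal sketch of both ideas, the R code below splits the built-in mtcars data into training and test sets. It fits a linear regression (mpg from wt) and a logistic regression (am from wt) on the training rows and scores each one on the held-out test rows. The dataset, the predictors, the 70/30 split and the 0.5 classification threshold are illustrative choices, not the lesson's own.

```r
# Illustrative train/test evaluation on the built-in mtcars data (an assumed example dataset).
set.seed(1)
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.7 * n))  # about 70% of rows for training
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]                  # remaining rows are held out

# Regression: predict fuel economy (mpg) from weight (wt)
lin_fit  <- lm(mpg ~ wt, data = train)
lin_pred <- predict(lin_fit, newdata = test)
rmse <- sqrt(mean((test$mpg - lin_pred)^2))    # root mean squared error on the test set

# Classification: predict transmission type (am: 0 = automatic, 1 = manual)
log_fit    <- glm(am ~ wt, data = train, family = binomial)
prob       <- predict(log_fit, newdata = test, type = "response")
pred_class <- ifelse(prob > 0.5, 1, 0)         # assumed 0.5 threshold
accuracy   <- mean(pred_class == test$am)      # fraction of test rows classified correctly

cat("Test RMSE:", round(rmse, 2), "\n")
cat("Test accuracy:", round(accuracy, 2), "\n")
```

Because neither model saw the test rows during fitting, these two numbers estimate performance on new data rather than how closely the models memorized the training data.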

Decision Trees
  • A decision tree model can be fit to training data.

  • Decision trees can be used for supervised learning, but they are not very robust: small changes in the training data can produce a very different tree.
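
One way to see this lack of robustness is to fit trees to several bootstrap resamples of the same data and compare the variable each tree uses for its first split. The sketch below assumes the rpart package, which ships as a recommended package with standard R installations. It uses mtcars as an illustrative dataset, and the exact splits depend on the random seed.

```r
# Sketch: compare root splits of decision trees fit to bootstrap resamples (illustrative).
library(rpart)

set.seed(3)
root_vars <- replicate(5, {
  # Resample rows with replacement to mimic a slightly different training set
  boot <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  tree <- rpart(mpg ~ ., data = boot)
  # The first row of tree$frame describes the root node; "<leaf>" would mean no split
  as.character(tree$frame$var[1])
})
print(root_vars)
```

If the printed variables differ across resamples, the fitted trees disagree about which predictor matters most, even though the underlying data are the same.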

Random Forests
  • Random forests can predict either a categorical variable (classification) or a quantitative variable (regression).

  • Random forests, with their default settings, work reasonably well.

Gradient Boosted Trees
  • Gradient boosted trees can be used for the same types of problems that random forests can solve.

  • The learning rate can affect the performance of a machine learning algorithm.
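
To make the role of the learning rate concrete, here is a toy gradient boosting loop for squared-error regression written in base R. It is a hand-rolled sketch, not XGBoost. Each round fits a one-split stump (`fit_stump` is a hypothetical helper) to the current residuals and adds that stump's predictions scaled by the learning rate `eta`, which is also the name the xgboost package uses for this parameter. With the number of rounds held fixed, a very small `eta` leaves the model underfit.

```r
# Toy gradient boosting for squared-error loss (illustrative; not the xgboost implementation).

# Hypothetical helper: best single-split stump on predictor x for residuals r
fit_stump <- function(x, r) {
  best <- list(sse = Inf)
  for (s in sort(unique(x))) {
    left <- x <= s
    if (all(left)) next                      # skip a split with an empty right side
    pred_left  <- mean(r[left])
    pred_right <- mean(r[!left])
    sse <- sum((r[left] - pred_left)^2) + sum((r[!left] - pred_right)^2)
    if (sse < best$sse) {
      best <- list(sse = sse, split = s, left = pred_left, right = pred_right)
    }
  }
  best
}

predict_stump <- function(stump, x) {
  ifelse(x <= stump$split, stump$left, stump$right)
}

boost <- function(x, y, eta, n_rounds) {
  pred <- rep(mean(y), length(y))            # start from the mean of the response
  for (m in seq_len(n_rounds)) {
    r <- y - pred                            # residuals: negative gradient of 0.5 * (y - pred)^2
    stump <- fit_stump(x, r)
    pred <- pred + eta * predict_stump(stump, x)  # take a step shrunk by the learning rate
  }
  pred
}

x <- mtcars$wt
y <- mtcars$mpg
etas <- c(0.01, 0.1, 0.5)
train_rmse <- sapply(etas, function(eta) {
  pred <- boost(x, y, eta = eta, n_rounds = 50)
  sqrt(mean((y - pred)^2))                   # training RMSE after 50 rounds
})
names(train_rmse) <- paste0("eta=", etas)
print(round(train_rmse, 3))
```

In practice a smaller learning rate usually needs more rounds to reach a good fit, which is why the two are often tuned together. A larger rate fits the training data faster but is more prone to overfitting.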

Cross Validation and Tuning
  • Parameter tuning can improve the fit of an XGBoost model.

  • Cross validation allows us to tune parameters using the training set only, saving the testing set for final model evaluation.
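
The sketch below illustrates this workflow with k-fold cross-validation in base R. To keep the example self-contained, it tunes the degree of a polynomial linear regression rather than XGBoost parameters. The dataset (mtcars), the 8-row test set, k = 4 folds and the candidate degrees are all illustrative assumptions. Only the training rows are used to pick the degree, and the test set is used once, at the end.

```r
# Sketch of k-fold cross-validation for parameter tuning (illustrative; lm stands in for XGBoost).
set.seed(2)
n <- nrow(mtcars)
test_idx <- sample(n, size = 8)
train <- mtcars[-test_idx, ]                 # 24 rows used for tuning and fitting
test  <- mtcars[test_idx, ]                  # held out until the very end

k <- 4
folds <- sample(rep(1:k, length.out = nrow(train)))  # random fold label for each training row

# Average validation RMSE across folds for a given polynomial degree
cv_rmse <- function(degree) {
  errs <- numeric(k)
  for (f in 1:k) {
    fit  <- lm(mpg ~ poly(wt, degree), data = train[folds != f, ])
    pred <- predict(fit, newdata = train[folds == f, ])
    errs[f] <- sqrt(mean((train$mpg[folds == f] - pred)^2))
  }
  mean(errs)
}

degrees <- 1:3
cv_scores <- sapply(degrees, cv_rmse)
best_degree <- degrees[which.min(cv_scores)]

# Refit on all training rows with the chosen degree, then evaluate once on the test set
final_fit <- lm(mpg ~ poly(wt, best_degree), data = train)
test_rmse <- sqrt(mean((test$mpg - predict(final_fit, newdata = test))^2))
cat("CV RMSE by degree:", round(cv_scores, 2), "\n")
cat("Chosen degree:", best_degree, " Test RMSE:", round(test_rmse, 2), "\n")
```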

Glossary

FIXME