Basic Machine Learning


  • Machine learning predicts outcomes from data.
  • For example, machine learning can be used to discover new kinds of cancer or to predict drug response in biomedical studies.

Clustering


  • We can use Euclidean distance to define the dissimilarity between samples.
  • We can also use other distance metrics, chosen based on prior knowledge about the data.
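
As a rough illustration of how the choice of metric changes which samples look similar, the sketch below compares Euclidean distance with a correlation-based distance (1 minus the Pearson correlation). The function names and the three toy samples are made up for this example and are not taken from the lesson.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two samples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation: 0 for identical profile shapes, 2 for opposite."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Hypothetical samples (for instance, expression of four genes).
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [10.0, 20.0, 30.0, 40.0]  # same shape as sample_a, larger scale
sample_c = [4.0, 3.0, 2.0, 1.0]      # similar scale, opposite trend

print(euclidean_distance(sample_a, sample_b), euclidean_distance(sample_a, sample_c))
print(correlation_distance(sample_a, sample_b), correlation_distance(sample_a, sample_c))
```

Under Euclidean distance sample_a is closer to sample_c, whereas under the correlation-based distance it is closer to sample_b. Which of the two is appropriate depends on whether magnitude or profile shape matters for the question being asked.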

Conditional Probabilities and Expectations


  • For categorical/discrete variables we have used strict conditions (i.e. X=x); however, conditioning can also be applied to continuous variables by using ranges instead (e.g. X>=x, X<=x, or a<X<b).
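
Conditioning on a range can be sketched as follows: the conditional expectation E[Y | a<X<b] is estimated by averaging Y over the observations whose X falls inside the interval. The helper name conditional_mean and the data are assumptions made for illustration only.

```python
def conditional_mean(xs, ys, lower, upper):
    """Estimate E[Y | lower < X < upper] as the mean of the selected ys."""
    selected = [y for x, y in zip(xs, ys) if lower < x < upper]
    if not selected:
        raise ValueError("no observations fall inside the requested range")
    return sum(selected) / len(selected)


# Hypothetical paired observations of a continuous predictor X and outcome Y.
xs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# Only the observations with X = 2.0, 2.5 and 3.0 satisfy 1.75 < X < 3.25.
estimate = conditional_mean(xs, ys, 1.75, 3.25)
print(estimate)
```

A narrower interval gives a more local estimate but leaves fewer observations to average over, so the estimate becomes noisier.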

Smoothing


  • Smoothing methods work well inside the range of predictor values seen in the training set; however, they are not suitable for extrapolating predictions outside that range.
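
The extrapolation problem can be seen with a very simple local-average (box kernel) smoother. This is a minimal sketch and not necessarily the smoother used in the lesson; box_smoother, the window half-width, and the quadratic toy data are all assumptions.

```python
def box_smoother(train_x, train_y, x0, half_width):
    """Average the training responses whose predictor lies within
    half_width of x0; return None when the window contains no points."""
    window = [y for x, y in zip(train_x, train_y) if abs(x - x0) <= half_width]
    if not window:
        return None
    return sum(window) / len(window)


# Hypothetical training set: predictor values 0..10 with response y = x^2.
train_x = [float(i) for i in range(11)]
train_y = [x ** 2 for x in train_x]

inside = box_smoother(train_x, train_y, 5.0, 1.0)         # true value is 25
just_outside = box_smoother(train_x, train_y, 11.0, 1.0)  # true value is 121
far_outside = box_smoother(train_x, train_y, 15.0, 1.0)   # no nearby data

print(inside, just_outside, far_outside)
```

Inside the training range the local average stays close to the true curve. Just beyond the range it can only repeat the last observed value, and further out there are no neighbouring points to average at all.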

Class Prediction


  • Data quality matters. Garbage in, garbage out!

Cross-validation


  • The mean validation error obtained from cross-validation is a better approximation of the test error (the error on real-world data) than the training error.
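
The point above can be illustrated with a small k-fold cross-validation sketch. It uses a 1-nearest-neighbour classifier on made-up one-dimensional data with two deliberately flipped labels; the function names, the interleaved fold assignment, and the data are assumptions chosen for illustration.

```python
def predict_1nn(train_x, train_y, x0):
    """Return the label of the training point closest to x0
    (ties go to the earlier training point)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
    return train_y[nearest]


def error_rate(train_x, train_y, test_x, test_y):
    """Fraction of test points that 1-NN misclassifies."""
    wrong = sum(predict_1nn(train_x, train_y, x) != y
                for x, y in zip(test_x, test_y))
    return wrong / len(test_x)


def cross_validation_error(xs, ys, k):
    """Mean validation error over k folds; point i is held out in fold i % k."""
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        fold_errors.append(error_rate(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx],
            [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(fold_errors) / k


# Hypothetical data: class 0 for x < 6 and class 1 otherwise,
# with the labels at x = 2 and x = 8 flipped to mimic noise.
xs = [float(i) for i in range(12)]
ys = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]

train_error = error_rate(xs, ys, xs, ys)
cv_error = cross_validation_error(xs, ys, k=4)
print(train_error, cv_error)
```

Because every training point is its own nearest neighbour, the training error is zero even though the labels are noisy. The cross-validated error is larger, since each point is predicted by a model that never saw it, which is closer to what would happen on new data.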