Introduction


  • Machine learning is a set of tools and techniques that use data to make predictions.
  • Artificial intelligence is a broader term that refers to making computers show human-like intelligence.
  • Deep learning is a subset of machine learning.
  • All machine learning systems have limitations that their users should be aware of.

Supervised methods - Regression


  • Scikit-Learn is a Python library with lots of useful machine learning functions.
  • Scikit-Learn includes a linear regression function.
  • Scikit-Learn can perform polynomial regressions to model non-linear data.
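
The regression points above can be sketched as follows. This is a minimal illustration rather than the lesson's own code: the data is made up (a noiseless curve y = 2x^2 + 1), and the variable names are assumptions. It fits a straight line first, then uses PolynomialFeatures so the same linear model can follow the curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data for illustration only: y = 2x^2 + 1 with no noise.
x = np.arange(-5, 6, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() ** 2 + 1.0

# A straight line cannot follow the curve, so its R^2 score is poor.
linear_model = LinearRegression().fit(x, y)
linear_r2 = linear_model.score(x, y)

# Expanding x into the features [x, x^2] lets a linear model fit the curve.
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)
poly_model = LinearRegression().fit(x_poly, y)
poly_r2 = poly_model.score(x_poly, y)

print(f"linear R^2: {linear_r2:.3f}, polynomial R^2: {poly_r2:.3f}")
```

Polynomial regression here is still a linear regression; only the input features have been expanded.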

Supervised methods - Classification


  • Classification is a supervised method: it requires labelled data.
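
As a rough sketch of what "labelled data" means in practice, the example below trains a classifier on the iris dataset bundled with Scikit-Learn, where every sample comes with a known species label. The choice of a decision tree, the dataset, and the split sizes are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X holds the measurements, y holds the known label for each sample.
X, y = load_iris(return_X_y=True)

# Hold back 30% of the labelled samples to check the classifier afterwards.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The labels y_train are what make this a supervised method.
classifier = DecisionTreeClassifier(max_depth=3, random_state=0)
classifier.fit(X_train, y_train)

accuracy = classifier.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```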

Ensemble methods


  • Ensemble methods combine several models and can be used to reduce underfitting or overfitting of the training data.
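
One common ensemble is a random forest, which averages many decision trees. The sketch below compares a single unrestricted tree with a forest on synthetic "two moons" data. The dataset, the noise level, and the choice of a random forest are assumptions for illustration; results on other data may differ.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy two-class data (illustrative only).
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A single unrestricted tree tends to memorise the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_train_accuracy = tree.score(X_train, y_train)
tree_test_accuracy = tree.score(X_test, y_test)

# A forest averages 100 trees, each trained on a random resample of the data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_test_accuracy = forest.score(X_test, y_test)

print(f"single tree: train {tree_train_accuracy:.2f}, test {tree_test_accuracy:.2f}")
print(f"random forest: test {forest_test_accuracy:.2f}")
```

A large gap between the single tree's training and test accuracy is a typical sign of overfitting, which averaging over many trees often reduces.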

Unsupervised methods - Clustering


  • Clustering is a form of unsupervised learning.
  • Unsupervised learning algorithms don’t need labelled training data.
  • K-means is a popular clustering algorithm.
  • K-means is less useful when one cluster exists within another, such as concentric circles.
  • Spectral clustering can overcome some of the limitations of K-means.
  • Spectral clustering is much slower than K-means.
  • Scikit-Learn has functions to create example data.
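
The concentric-circles case can be reproduced with Scikit-Learn's make_circles helper. In this sketch the sample count, noise level, and the nearest-neighbours affinity for spectral clustering are assumptions. The true ring labels are used only to score the result, never to fit the clusters.

```python
import warnings

from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Example data: a small ring inside a larger ring.
X, true_rings = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=0)

# K-means splits the plane into two halves, so it cuts across both rings.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# Spectral clustering groups points that are connected through near neighbours.
spectral = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", random_state=0
)
with warnings.catch_warnings():
    # The neighbour graph of two separate rings may trigger a connectivity warning.
    warnings.simplefilter("ignore")
    spectral_labels = spectral.fit_predict(X)

# Adjusted Rand index: 1.0 means the clusters match the rings, about 0 means chance.
kmeans_ari = adjusted_rand_score(true_rings, kmeans_labels)
spectral_ari = adjusted_rand_score(true_rings, spectral_labels)
print(f"K-means ARI: {kmeans_ari:.2f}, spectral ARI: {spectral_ari:.2f}")
```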

Unsupervised methods - Dimensionality reduction


  • PCA is a linear dimensionality reduction technique for tabular data.
  • t-SNE is another dimensionality reduction technique for tabular data. Unlike PCA it is non-linear, so it can capture more general structure.
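
Both techniques are available in Scikit-Learn. The sketch below reduces a 64-column table (the bundled handwritten digits dataset) to two columns with each method. The dataset, the 300-row subset (chosen only to keep t-SNE quick), and the parameter values are assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Each row is an 8x8 image flattened into 64 pixel columns.
X, _ = load_digits(return_X_y=True)
X = X[:300]  # small subset so the t-SNE step stays fast

# PCA: a linear projection onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
variance_kept = pca.explained_variance_ratio_.sum()

# t-SNE: a non-linear embedding that tries to keep similar rows close together.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X)

print(f"PCA output shape: {X_pca.shape}, variance kept: {variance_kept:.2f}")
print(f"t-SNE output shape: {X_tsne.shape}")
```

Unlike PCA, a fitted t-SNE object has no transform method for new rows, so it is mainly used for visualising a fixed dataset.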

Neural Networks


  • Perceptrons are artificial neurons, the building blocks of neural networks.
  • A perceptron takes multiple inputs, multiplies each by a weight value and sums the weighted inputs. It then applies an activation function to the sum.
  • A single perceptron can solve simple problems that are linearly separable.
  • Multiple perceptrons can be combined to form a neural network which can solve problems that aren’t linearly separable.
  • We can train a whole neural network with the backpropagation algorithm. Scikit-Learn includes an implementation of this algorithm.
  • Training a neural network requires some training data to show the network examples of what to learn.
  • To validate our training we split the available data into a training set and a test set.
  • To ensure the whole dataset can be used in both training and testing, we can train multiple times with different subsets of the data acting as training and testing data. This is called cross-validation.
  • Deep learning neural networks are a very powerful modern machine learning technique. Scikit-Learn only provides basic multi-layer perceptrons and is not designed for deep learning, but other libraries such as TensorFlow are.
  • Several companies now offer cloud APIs where we can train neural networks on powerful computers.
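
The points above can be sketched in two parts. First, a hand-written perceptron with a step activation; its weights and threshold are hand-picked (hypothetical) values that make it behave like a logical AND, which is linearly separable. Second, a small network trained with Scikit-Learn's MLPClassifier, validated with a train/test split and with cross-validation. The digits dataset, layer size, and iteration limit are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier


def perceptron(inputs, weights, threshold):
    """Sum the weighted inputs, then apply a step activation function."""
    weighted_sum = np.dot(inputs, weights)
    return 1 if weighted_sum > threshold else 0


# Hypothetical weights and threshold chosen so the perceptron computes AND.
and_weights = np.array([1.0, 1.0])
and_outputs = [
    perceptron(np.array(pair), and_weights, threshold=1.5)
    for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
print(and_outputs)  # [0, 0, 0, 1]

# A multi-layer network on labelled examples (8x8 images of digits).
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values range from 0 to 16; rescale them to 0..1

# Split the available data into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
network = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
network.fit(X_train, y_train)  # weights are learned via backpropagation
test_accuracy = network.score(X_test, y_test)

# Cross-validation: five rounds, each holding out a different fifth for testing.
cv_scores = cross_val_score(
    MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0),
    X, y, cv=5,
)
print(f"test accuracy: {test_accuracy:.2f}")
print(f"cross-validation mean accuracy: {cv_scores.mean():.2f}")
```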

Ethics and the Implications of Machine Learning


  • The results of machine learning reflect biases in the training and input data.
  • Many machine learning algorithms can’t explain how they arrived at a decision.
  • Machine learning can be used for unethical purposes.
  • Consider the implications of false positives and false negatives.
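
False positives and false negatives can be counted with a confusion matrix. The labels and predictions below are invented purely for illustration (1 = positive, 0 = negative); they do not come from a real model.

```python
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions for ten cases.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"false positives: {fp}")  # negatives wrongly flagged as positive
print(f"false negatives: {fn}")  # positives that were missed
```

Which kind of error matters more depends on the application: a missed case in a medical screening test carries a different cost from a false alarm.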

Find out more


  • This course has only touched on a few areas of machine learning and is designed to teach you just enough to do something useful.
  • Machine learning is a rapidly evolving field and new tools and techniques are constantly appearing.