Introduction
- Machine learning is a set of tools and techniques that use data to make predictions.
- Artificial intelligence is a broader term that refers to making computers show human-like intelligence.
- Deep learning is a subset of machine learning.
- All machine learning systems have limitations to be aware of.
Supervised methods - Regression
- Scikit-Learn is a Python library with lots of useful machine learning functions.
- Scikit-Learn includes a linear regression function.
- Scikit-Learn can perform polynomial regression to model non-linear data.
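
As a rough sketch of the regression points above, the following fits a straight line and a degree-2 polynomial to the same curved data. The data (y = 2x^2 + 1) and all variable names are made up for illustration; `LinearRegression` and `PolynomialFeatures` are the Scikit-Learn classes assumed here, and other approaches are possible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data following y = 2x^2 + 1 (illustration only).
x = np.arange(-5, 6, dtype=float).reshape(-1, 1)  # 11 samples, 1 feature
y = 2 * x.ravel() ** 2 + 1

# Straight-line fit: a line cannot follow the curve.
linear = LinearRegression().fit(x, y)
linear_r2 = linear.score(x, y)

# Polynomial fit: expand x into [x, x^2], then fit an ordinary linear model
# on the expanded features.
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)
quadratic = LinearRegression().fit(x_poly, y)
quadratic_r2 = quadratic.score(x_poly, y)

print(f"linear R^2:    {linear_r2:.3f}")
print(f"quadratic R^2: {quadratic_r2:.3f}")
```

Polynomial regression here is still a linear model; only the input features are non-linear in x. On real, noisy data a high polynomial degree can overfit, so the degree is a choice to validate rather than a fixed rule.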
Supervised methods - Classification
- Classification requires labelled data, so it is a supervised method.
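
To illustrate why labels are needed, here is a minimal sketch that trains a classifier on labelled examples and then checks it against held-out labels. The iris dataset bundled with Scikit-Learn and the decision tree are assumptions chosen for brevity, not necessarily the data or classifier used in the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X holds the measurements, y holds the species label for every row.
X, y = load_iris(return_X_y=True)

# Keep a quarter of the labelled rows aside to check the classifier later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The classifier learns the mapping from measurements to labels.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Accuracy: fraction of held-out rows whose predicted label matches the true one.
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```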
Ensemble methods
- Ensemble methods combine the predictions of several models and can be used to reduce underfitting or overfitting of the training data.
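
One way to see the effect of an ensemble is to compare a single decision tree with a random forest, which averages many trees. This is a sketch under assumptions: the synthetic `make_moons` data, the noise level, and the choice of a random forest are all illustrative, and the size of any improvement depends on the data.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical noisy two-class data (illustration only).
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A single unconstrained tree can memorise the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A random forest trains many trees on resampled data and combines their votes.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_train, tree_test = tree.score(X_train, y_train), tree.score(X_test, y_test)
forest_train, forest_test = forest.score(X_train, y_train), forest.score(X_test, y_test)

print(f"single tree   train={tree_train:.2f} test={tree_test:.2f}")
print(f"random forest train={forest_train:.2f} test={forest_test:.2f}")
```

A large gap between training and test accuracy is one sign of overfitting. Whether the forest narrows that gap depends on the data and settings, so it is worth checking rather than assuming.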
Unsupervised methods - Clustering
- Clustering is a form of unsupervised learning.
- Unsupervised learning algorithms don’t need labelled training data.
- K-means is a popular clustering algorithm.
- K-means is less useful when one cluster exists within another, such as concentric circles.
- Spectral clustering can overcome some of the limitations of K-means.
- Spectral clustering is much slower than K-means.
- Scikit-Learn has functions to create example data.
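
The concentric-circles case above can be sketched with Scikit-Learn's example-data function `make_circles`. The sample size, noise level, and the `nearest_neighbors` affinity are assumptions picked for illustration. The true labels are used only to score the result afterwards, not to fit either algorithm.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Example data: a small circle inside a larger one.
X, true_labels = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)

# K-means groups points around two centres, so it tends to cut the rings in half.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering groups points by connectivity to their neighbours,
# which lets it follow each ring.
spectral = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", random_state=0
)
spectral_labels = spectral.fit_predict(X)

# Adjusted Rand index: 1.0 means the clusters match the true rings exactly,
# values near 0 mean no better than random assignment.
kmeans_ari = adjusted_rand_score(true_labels, kmeans_labels)
spectral_ari = adjusted_rand_score(true_labels, spectral_labels)

print(f"K-means ARI:  {kmeans_ari:.2f}")
print(f"Spectral ARI: {spectral_ari:.2f}")
```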
Unsupervised methods - Dimensionality reduction
- PCA is a linear dimensionality reduction technique for tabular data.
- t-SNE is another dimensionality reduction technique for tabular data; it is non-linear, so it can capture structure that PCA cannot.
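
Both techniques can be tried on the same table of features. The sketch below assumes the handwritten-digits dataset bundled with Scikit-Learn (64 pixel columns per row) and takes a 500-row subset only to keep t-SNE quick; the perplexity value is likewise an illustrative choice.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Each row is one 8x8 image flattened into 64 feature columns.
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # subset to keep the example fast

# PCA: project onto the two directions of greatest variance (a linear map).
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

# t-SNE: non-linear embedding that tries to keep neighbouring points close.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_tsne = tsne.fit_transform(X)

print(f"PCA output shape:   {X_pca.shape}")
print(f"t-SNE output shape: {X_tsne.shape}")
print(f"variance kept by 2 PCA components: {explained:.2f}")
```

Plotting `X_pca` and `X_tsne` coloured by `y` is a common way to compare the two embeddings. Note that a fitted PCA can transform new rows, whereas Scikit-Learn's `TSNE` only offers `fit_transform`.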
Neural Networks
- Perceptrons are artificial neurons, the building blocks of neural networks.
- A perceptron takes multiple inputs, multiplies each by a weight value and sums the weighted inputs. It then applies an activation function to the sum.
- A single perceptron can only learn functions which are linearly separable, such as logical AND or OR.
- Multiple perceptrons can be combined to form a neural network which can learn functions that aren’t linearly separable, such as XOR.
- We can train a whole neural network with the backpropagation algorithm. Scikit-Learn includes an implementation of this algorithm.
- Training a neural network requires some training data to show the network examples of what to learn.
- To validate our training we split the available data into a training set and a test set.
- To ensure the whole dataset can be used in training and testing we can train multiple times with different subsets of the data acting as training/testing data. This is called cross validation.
- Deep learning neural networks are a very powerful modern machine learning technique. Scikit-Learn is not designed for deep learning, but other libraries such as TensorFlow are.
- Several companies now offer cloud APIs where we can train neural networks on powerful computers.
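
The perceptron, train/test split, and cross validation points above can be sketched together. Several things here are assumptions for illustration: the hand-picked weights, the bias term (not mentioned in the points above), the step activation, the bundled digits dataset, and the network size. `MLPClassifier` is Scikit-Learn's multi-layer perceptron, which is trained with backpropagation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier


def perceptron(inputs, weights, bias):
    """Sum the weighted inputs, add a bias, then apply a step activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 if weighted_sum > 0 else 0


# Hand-picked (hypothetical) weights that make one perceptron act as logical AND.
and_weights, and_bias = np.array([1.0, 1.0]), -1.5
and_outputs = [
    perceptron(np.array(pair), and_weights, and_bias)
    for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
print(and_outputs)  # [0, 0, 0, 1]

# A small network of perceptron-like units trained on labelled digit images.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# Train/test split: hold back 20% of the data to check the trained network.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
test_accuracy = mlp.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")

# Cross validation: train 5 times, each time testing on a different fifth.
cv_model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
cv_scores = cross_val_score(cv_model, X, y, cv=5)
print(f"cross validation mean accuracy: {cv_scores.mean():.2f}")
```

Cross validation costs more computation because the network is trained once per fold, but every row ends up being used for both training and testing.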
Ethics and the Implications of Machine Learning
- The results of machine learning reflect biases in the training and input data.
- Many machine learning algorithms can’t explain how they arrived at a decision.
- Machine learning can be used for unethical purposes.
- Consider the implications of false positives and false negatives.
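
False positives and false negatives can be counted with a confusion matrix, which often says more than a single accuracy figure. The labels and predictions below are invented for illustration (1 = positive, 0 = negative) and do not come from any real system.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions for ten cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() yields the counts in this order:
# true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / len(y_true)
false_negative_rate = fn / (fn + tp)  # share of real positives that were missed
false_positive_rate = fp / (fp + tn)  # share of real negatives that were flagged

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=5 FP=1 FN=2 TP=2
print(f"accuracy={accuracy:.2f}")
print(f"false negative rate={false_negative_rate:.2f}")
print(f"false positive rate={false_positive_rate:.2f}")
```

In this made-up example the model is right 70% of the time yet misses half of the real positives. Which kind of error matters more depends on the application, for instance a missed diagnosis versus an unnecessary follow-up test.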
Find out more
- This course has only touched on a few areas of machine learning and is designed to teach you just enough to do something useful.
- Machine learning is a rapidly evolving field and new tools and techniques are constantly appearing.