Overview


Preparing to train a model


  • Some tasks are not appropriate for machine learning due to ethical concerns.
  • Machine learning tasks should have a valid prediction target that maps clearly to the real-world goal.
  • Training data can be biased due to societal inequities, errors in the data collection process, and lack of attention to careful sampling practices.
  • “Bias” also has a statistical meaning: learning algorithms have inductive biases that favor some kinds of solutions over others.

Scientific validity in the modeling process


  • Overfitting is characterized by much better performance on the training set than on the test set. It can be mitigated by switching to a simpler model architecture, adding regularization, or collecting more training data (see the first sketch after this list).
  • Underfitting is characterized by poor performance on both the training and test sets. It can be mitigated by switching to a more complex model architecture or improving feature quality.
  • Data leakage occurs when information from the test set, or information that would not be available at prediction time, reaches the model during training; it results in overly optimistic estimates of the model’s performance (see the second sketch below).
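
A minimal sketch of the overfitting diagnosis described above, assuming scikit-learn (the toy 1-D regression data and the degree-15 polynomial are illustrative choices, not from the lesson): an over-flexible model shows a large train/test gap, and ridge regularization shrinks it.

```python
# Diagnosing overfitting from the train/test gap, and reducing it with
# regularization. Toy data; the specific degree and alpha are arbitrary.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, reg in [("unregularized", LinearRegression()),
                  ("ridge", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(degree=15), reg).fit(X_tr, y_tr)
    # A large train-minus-test gap is the signature of overfitting.
    print(f"{name}: train={model.score(X_tr, y_tr):.2f} "
          f"test={model.score(X_te, y_te):.2f}")
```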
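And a minimal sketch of the leakage pattern, again assuming scikit-learn: fitting preprocessing on the full dataset before splitting lets test-set statistics leak into training, whereas a pipeline fit on the training split only avoids it.

```python
# A common leakage bug: the scaler sees the test rows before the split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)

# Leaky: preprocessing is fit on ALL data, including future test rows.
X_scaled = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, random_state=0)
leaky = LogisticRegression().fit(X_tr, y_tr)

# Correct: all preprocessing is fit inside a pipeline on the training split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
safe = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)

# With simple scaling the score gap is small, but the leaky pattern can
# badly inflate scores with stronger preprocessing or feature selection.
print(leaky.score(X_te, y_te), safe.score(X_te, y_te))
```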

Model evaluation and fairness


  • It’s important to consider many dimensions of model performance: a single accuracy score is not sufficient.
  • There is no single definition of “fair machine learning”: different notions of fairness are appropriate in different contexts.
  • Representational harms and stereotypes can be perpetuated by generative AI.
  • The fairness of a model can be improved by using techniques like data reweighting and model postprocessing (a reweighting sketch follows this list).
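
A minimal sketch of one such technique, data reweighting in the spirit of Kamiran and Calders’ reweighing (the `group` attribute and the toy data are hypothetical): each (group, label) combination is weighted by P(group) · P(label) / P(group, label), so that group membership and outcome become statistically independent in the weighted data.

```python
# Reweighing-style sample weights; pass `weights` as sample_weight when
# fitting a downstream model. Toy synthetic data with a biased outcome.
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)                        # protected attribute
label = (rng.random(1000) < 0.3 + 0.2 * group).astype(int)   # biased outcome

weights = np.empty(len(label), dtype=float)
for g in (0, 1):
    for y in (0, 1):
        mask = (group == g) & (label == y)
        expected = (group == g).mean() * (label == y).mean()  # P(g) * P(y)
        observed = mask.mean()                                # P(g, y)
        weights[mask] = expected / observed   # assumes every cell is non-empty

print(weights[:5])
```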

Interpretability versus explainability


  • Model Explainability vs. Model Interpretability:
    • Interpretability: Refers to the degree to which a human can understand the cause of a decision made by a model. It is essential for verifying the correctness of the model, ensuring compliance with regulations, and enabling effective debugging.
    • Explainability: Refers to the extent to which the internal mechanics of a machine learning model can be explained in human terms. It is crucial for understanding how models make decisions, ensuring transparency, and building trust with stakeholders.
  • Choosing Between Explainable and Interpretable Models:
    • When Transparency is Critical: Opt for interpretable models (e.g., linear regression, decision trees) when it is essential to have a clear understanding of how decisions are made, such as in healthcare or finance.
    • When Performance is a Priority: Choose explainable models (e.g., neural networks, gradient boosting machines), i.e., black-box models paired with post-hoc explanation methods, when predictive accuracy is the primary concern (see the sketch after this list).
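
A minimal sketch of the contrast, assuming scikit-learn and its bundled iris dataset: a shallow decision tree’s full decision logic can simply be printed, whereas a black-box model would need one of the post-hoc explanation methods covered below.

```python
# An interpretable model: the complete decision logic is human-readable,
# with no extra explanation tooling required.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

print(export_text(tree, feature_names=["sepal_len", "sepal_wid",
                                       "petal_len", "petal_wid"]))
```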

Explainability methods overview


Explainability methods: deep dive


Explainability methods: linear probe
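
A minimal sketch of the linear-probe idea, using synthetic stand-in activations (a real probe would use features extracted from a frozen layer of a trained network): if a simple linear classifier can recover a concept from the activations, the layer plausibly encodes that concept.

```python
# Linear probe: train a linear classifier on frozen hidden activations and
# check whether it can recover a concept/label of interest.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in for activations extracted from a frozen network layer.
activations = rng.normal(size=(1000, 64))
labels = (activations[:, :3].sum(axis=1) > 0).astype(int)  # toy concept

X_tr, X_te, y_tr, y_te = train_test_split(activations, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High accuracy suggests the layer linearly encodes the concept.
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```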


Explainability methods: GradCAM
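
A hedged Grad-CAM sketch, assuming PyTorch and torchvision’s resnet18 (downloading pretrained weights needs network access; pass weights=None to run offline with random weights): gradients of the class score are pooled into per-channel weights, which combine the last convolutional feature maps into a coarse heatmap.

```python
# Grad-CAM: hook the last conv block to capture activations and gradients,
# then form a ReLU'd, gradient-weighted sum of the feature maps.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
target_layer = model.layer4[-1]  # last convolutional block

feats, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image
score = model(x)[0].max()         # score of the top predicted class
score.backward()

# Channel weights = global-average-pooled gradients; CAM = weighted sum.
channel_w = grads["a"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((channel_w * feats["a"]).sum(dim=1)).squeeze()
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
print(cam.shape)  # coarse (e.g. 7x7) heatmap to upsample onto the input
```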


Estimating model uncertainty


  • TODO

OOD detection: overview, output-based methods


  • Introduction to Out-of-Distribution (OOD) Data
  • Detecting and Handling OOD Data
  • Example 1: Softmax scores
  • Example 2: Energy-Based OOD Detection
  • Limitations of our approach thus far
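
A minimal sketch of the two output-based scores named above, assuming PyTorch: the maximum softmax probability (MSP) and the energy score E(x) = -T · logsumexp(logits / T). The detection thresholds themselves would be chosen on held-out in-distribution validation data.

```python
# Output-based OOD scores computed from a classifier's logits.
# Lower MSP / higher energy -> more OOD-like.
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    # Maximum softmax probability per input.
    return F.softmax(logits, dim=-1).max(dim=-1).values

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    # Energy score E(x) = -T * logsumexp(logits / T).
    return -T * torch.logsumexp(logits / T, dim=-1)

logits = torch.randn(4, 10)  # stand-in logits: 4 inputs, 10 classes
print(msp_score(logits))     # flag inputs below a chosen threshold
print(energy_score(logits))  # flag inputs above a chosen threshold
```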


OOD detection: distance-based and contrastive learning


  • Example 3: Distance-Based Methods
  • Limitations of Threshold-Based OOD Detection Methods
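
A minimal sketch of a distance-based score in the spirit of the Mahalanobis-distance approach, using synthetic stand-in features: fit class-conditional Gaussians with a shared covariance on in-distribution features, then score a test point by its distance to the nearest class mean.

```python
# Mahalanobis-style OOD score on (stand-in) penultimate-layer features.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 16))        # in-distribution training features
labels = rng.integers(0, 3, size=500)     # 3 in-distribution classes

means = np.stack([feats[labels == c].mean(axis=0) for c in range(3)])
cov = np.cov(feats - means[labels], rowvar=False)  # shared covariance
prec = np.linalg.pinv(cov)

def mahalanobis_ood_score(x: np.ndarray) -> float:
    # Squared distance to the nearest class mean; larger -> more OOD-like.
    d = x - means
    return float(np.min(np.einsum("ci,ij,cj->c", d, prec, d)))

print(mahalanobis_ood_score(rng.normal(size=16)))
```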


OOD detection: training-time regularization


  • Training-time regularization for OOD detection
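
A minimal sketch of one training-time regularizer in the style of outlier exposure (the loss weight `lambda_oe` and the model producing these logits are hypothetical): in-distribution batches get ordinary cross-entropy, while auxiliary outlier batches are pushed toward a uniform softmax.

```python
# Outlier-exposure-style loss: standard CE on in-distribution data plus a
# term pulling outlier predictions toward the uniform distribution.
import torch
import torch.nn.functional as F

def outlier_exposure_loss(logits_in, targets_in, logits_out, lambda_oe=0.5):
    ce = F.cross_entropy(logits_in, targets_in)
    # Cross-entropy to the uniform distribution = mean of -log p over classes.
    uniform_ce = -F.log_softmax(logits_out, dim=-1).mean(dim=-1).mean()
    return ce + lambda_oe * uniform_ce

logits_in = torch.randn(8, 10)            # stand-in in-distribution logits
targets_in = torch.randint(0, 10, (8,))
logits_out = torch.randn(8, 10)           # stand-in auxiliary-outlier logits
print(outlier_exposure_loss(logits_in, targets_in, logits_out))
```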


Documenting and releasing a model


  • Model cards are the standard technique for communicating information about how machine learning systems were trained and how they should and should not be used.
  • Models can be shared and reused via the Hugging Face platform (see the sketch below).
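
A minimal sketch of releasing a model with a model card via the huggingface_hub library (the repo id `your-username/demo-model` and the local file `model.pkl` are placeholders; this assumes you have already authenticated, e.g. with `huggingface-cli login`):

```python
# Create a repo, push a minimal model card, and upload a model file.
from huggingface_hub import HfApi, ModelCard

api = HfApi()
api.create_repo(repo_id="your-username/demo-model", exist_ok=True)

card = ModelCard("""
---
license: mit
---
# Demo model
Intended use, training data, evaluation metrics, and known limitations go here.
""")
card.push_to_hub("your-username/demo-model")

api.upload_file(path_or_fileobj="model.pkl",   # assumes model.pkl exists locally
                path_in_repo="model.pkl",
                repo_id="your-username/demo-model")
```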