This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Intro to AI for GLAM: Glossary

Key Points

Welcome
  • Intro to AI for GLAM is for staff working in the GLAM (Galleries, Libraries, Archives, and Museums) sector.

  • The lesson is a high-level conceptual introduction to AI and machine learning that will empower GLAM staff to apply those technologies within their own institutions and collections.

  • This lesson will not cover coding, statistics or maths.

Artificial Intelligence (AI) and Machine Learning (ML) in a nutshell
  • Machine Learning is a subfield of AI which identifies patterns in data

  • Supervised learning algorithms learn by example

  • Unsupervised learning algorithms put data into groups of similar objects or records

What is Machine Learning good at?
  • First key point. Brief Answer to questions. (FIXME)

Understanding and managing bias
  • Bias occurs when a dataset is not representative of the population, for example because it is incomplete or skewed.

  • The presence of bias in the classifications and predictions of machine learning may have far-reaching consequences for society, amplifying inequality and unfairness.

  • There are abundant opportunities for bias to enter ML systems at all stages of the pipeline, including when datasets are constructed, when a model's learning is refined and reinforced, and when predictions made by a model are interpreted by humans and applied to real-world scenarios.

  • A range of strategies is available today to help mitigate bias.
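One simple way to look for the kind of skew described above is to compare how often each group appears in a dataset with its share of the population the dataset is meant to represent. The Python sketch below does exactly that and nothing more; the function name representation_gap, the language labels and the 50/50 reference split are all hypothetical, and a check like this detects only one kind of bias, not the others that can enter later in the pipeline.

```python
from collections import Counter

def representation_gap(dataset_groups, reference_shares):
    """For each group, return (share in dataset) minus (share in reference population).

    Positive values mean the group is over-represented in the dataset,
    negative values mean it is under-represented.
    """
    counts = Counter(dataset_groups)  # missing groups count as 0
    total = len(dataset_groups)
    return {
        group: counts[group] / total - expected_share
        for group, expected_share in reference_shares.items()
    }

# Hypothetical: languages of 10 items in a training set, compared with an
# assumed 50/50 split across the whole collection.
sample = ["English"] * 8 + ["Welsh"] * 2
gaps = representation_gap(sample, {"English": 0.5, "Welsh": 0.5})
print({group: round(gap, 2) for group, gap in gaps.items()})
```

Here English items make up 80% of the sample against an assumed 50% of the collection, so a model trained on this sample would see far fewer Welsh examples than the collection contains.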

Applying Machine Learning
  • Machine learning projects involve many considerations beyond training a model.

  • The predictions made by the same machine learning model can be ‘translated’ into actions in different ways. The extent to which you ‘automate’ decisions versus keeping a ‘human-in-the-loop’ will depend on the problem you are tackling, your organization and your model’s performance.

  • The use of machine learning by GLAMs is relatively new. Sharing results and lessons learned will likely help GLAMs realize the potential benefits of machine learning.
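The choice between automating decisions and keeping a human in the loop is often implemented as a confidence threshold: predictions the model is confident about are applied directly, and the rest are passed to a person. The Python sketch below assumes a model that returns a label with a confidence score between 0 and 1; the function name route_prediction, the 0.9 threshold and the example labels are hypothetical. Choosing the threshold is itself one of the decisions that depends on the problem, the organization and the model's performance.

```python
def route_prediction(label, confidence, threshold=0.9):
    """Turn one model prediction into an action.

    At or above the (hypothetical) threshold the label is applied automatically;
    below it the prediction is sent to a person to check.
    """
    if confidence >= threshold:
        return ("apply automatically", label)
    return ("send for human review", label)

# Hypothetical predictions from an image classifier: (label, confidence between 0 and 1).
predictions = [("portrait", 0.97), ("landscape", 0.62), ("map", 0.91)]
for label, confidence in predictions:
    print(route_prediction(label, confidence))
```

Raising the threshold sends more items to staff and fewer errors into the catalogue; lowering it automates more at the cost of more unchecked mistakes.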

The Machine Learning ecosystem
  • FIXME

Glossary

artificial intelligence :

bias :

machine learning
The study or use of algorithms whose performance improves as they are given more data. Machine learning algorithms often use training data to build a model. Their performance is then measured by how well they predict the properties of test data. It is a set of technologies and methods for finding rules that are too complex to write down by hand: systems which find rules, learn, and make predictions from data without being explicitly programmed to do so. https://glosario.carpentries.org/en/#machine_learning
model
A specification of the mathematical relationship between different variables. https://glosario.carpentries.org/en/#model
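As an illustration of a model in this sense, the Python sketch below uses one of the simplest possible relationships between two variables, a straight line y = slope × x + intercept, and estimates its two parameters from example data by ordinary least squares. The function name fit_line and the toy numbers are hypothetical; in practice an established statistics or machine learning library would normally be used.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the straight line that best fits the points,
    in the least-squares sense (smallest total squared vertical error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by variance of x gives the slope.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)          # 2.0 1.0
print(slope * 5 + intercept)     # prediction for x = 5: 11.0
```

Once the two numbers are learned, the model can make a prediction for any new x, which is what "using a model" means in the definitions that follow.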

regression analysis :

reinforcement learning
Any machine learning algorithm which is not given specific goals to meet, but instead is given feedback on whether or not it is making progress. https://glosario.carpentries.org/en/#reinforcement_learning

semi-supervised learning :

supervised learning
A machine learning algorithm in which a system is taught to classify values given training data containing previously-classified values. https://glosario.carpentries.org/en/#supervised_learning
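A very small example of learning from previously classified values is the nearest-neighbour method: a new item receives the label of the most similar labelled example. The Python sketch below is an assumption-laden toy, not a recommended classifier; the measurements, the labels "document" and "photograph", and the function name classify are hypothetical.

```python
import math

# Hypothetical labelled examples: (width_cm, height_cm) of scanned items,
# each with a label a person has already assigned. Values are invented.
training_data = [
    ((21.0, 29.7), "document"),
    ((20.0, 30.0), "document"),
    ((10.0, 15.0), "photograph"),
    ((9.0, 13.0), "photograph"),
]

def classify(item, examples):
    """Give a new item the label of its closest labelled example (1-nearest neighbour)."""
    _, label = min(examples, key=lambda example: math.dist(item, example[0]))
    return label

print(classify((10.5, 14.0), training_data))  # photograph
print(classify((21.5, 28.0), training_data))  # document
```

The "learning" here is simply storing the labelled examples; more sophisticated supervised algorithms summarise those examples into a model instead.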
test data
Test data is a portion of a dataset used to evaluate the correctness of a machine learning algorithm after it has been trained. It should always be separated from the training data to ensure that the model is properly tested with unseen data. https://glosario.carpentries.org/en/#test_data
training data
Training data is a portion of a dataset used to train a machine learning algorithm to recognise similar data. It should always be kept separate from the test data to ensure that the model is properly tested with data it has never seen before. https://glosario.carpentries.org/en/#training_data
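The separation of training and test data can be sketched in a few lines of Python: shuffle the records, hold back a fixed fraction for testing, and afterwards score the model only on the held-back part. The helper names train_test_split and accuracy, the 25% test fraction and the example labels are hypothetical choices for illustration (libraries such as scikit-learn provide their own, more complete versions).

```python
import random

def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of records and split it into separate training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)

def accuracy(predicted, actual):
    """Fraction of test records whose predicted label matches the known label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

records = list(range(20))  # stand-ins for 20 labelled records
train, test = train_test_split(records)
print(len(train), len(test))  # 15 5
print(accuracy(["map", "letter", "map"], ["map", "letter", "photo"]))  # 2 of 3 correct
```

Because no record appears in both sets, the accuracy score reflects how the model handles data it has never seen, which is the point of keeping them apart.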
unsupervised learning
Algorithms that cluster data without knowing in advance what the groups will be. https://glosario.carpentries.org/en/#unsupervised_learning
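k-means is one widely used unsupervised algorithm: it is told only how many groups to form, then repeatedly assigns each item to the nearest group centre and moves each centre to the average of its group. The Python sketch below applies it to single numbers; the publication years are hypothetical, and starting from the smallest values is a simplification (common implementations use random or "k-means++" starting points).

```python
def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters without using any labels (a basic k-means).

    Assumes values contains at least k distinct numbers.
    """
    # Simplifying assumption: start from the k smallest distinct values as centres.
    centres = sorted(set(values))[:k]
    clusters = []
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters

# Hypothetical publication years of six catalogue records; no labels are supplied.
years = [1850, 1852, 1855, 1990, 1995, 2001]
print(k_means_1d(years))  # [[1850, 1852, 1855], [1990, 1995, 2001]]
```

The algorithm finds the two groups on its own; deciding what they mean (here, perhaps "nineteenth-century" and "recent" items) is left to a person.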
