Instructor Notes

Install the required workshop packages


Please use the instructions in the Setup document to perform installs. If you encounter setup issues, please file an issue with the tags ‘High-priority’.

Checking installations.


In the episodes/files/scripts/check_env.py directory, you will find a script called check_env.py This checks the functionality of the Anaconda install.

By default, Data Carpentry does not have people pull the whole repository with all the scripts and addenda. Therefore, you, as the instructor, get to decide how you’d like to provide this script to learners, if at all. To use this, students can navigate into _includes/scripts in the terminal, and execute the following:

BASH

python check_env.py

If learners receive an AssertionError, it will inform you how to help them correct this installation. Otherwise, it will tell you that the system is good to go and ready for Data Carpentry!

07-visualization-ggplot-python


iPython notebooks for plotting can be viewed in the learners folder.

08-putting-it-all-together


Answers are embedded with challenges in this lesson, other than random distribtuion which is left to the learner to choose, and final plot, for which the learner should investigate the matplotlib gallery.

Scientists often operate on mathematical equations. Being able to use them in their graphics has a lot of added value Luckily, Matplotlib provides powerful tools for text control. One of them is the ability to use LaTeX mathematical notation, whenever text is used (you can learn more about LaTeX math notation here: https://en.wikibooks.org/wiki/LaTeX/Mathematics). To use mathematical notation, surround your text using the dollar sign (“$”).

LaTeX uses the backslash character (“\”) a lot. Since backslash has a special meaning in the Python strings, you should replace all the LaTeX-related backslashes with two backslashes.

PYTHON

plt.plot(t, t, 'r--', label='$y=x$')
plt.plot(t, t**2 , 'bs-', label='$y=x^2$')
plt.plot(t, (t - 5)**2 + 5 * t - 0.5, 'g^:', label='$y=(x - 5)^2 + 5  x - \\frac{1}{2}$') # note the double backslash

plt.legend(loc='upper left', shadow=True, fontsize='x-large')

# Note the double backslashes in the line below.
plt.xlabel('This is the x axis. It can also contain math such as $\\bar{x}=\\frac{\\sum_{i=1}^{n} {x}} {N}$')
plt.ylabel('This is the y axis')
plt.title('This is the figure title')

plt.show()

This page contains more information.

Data visualization with Pandas and Matplotlib


Instructor Note

Warning, the line of code pd.read_csv("../data/raw/surveys_complete_77_89.csv") is prone to raise relative path errors for learners, for two main reasons:

  • They may have not set the directory structure as described in the previous episode: For example, if they used capital letters, added whitespaces, or just typed it differently. This would change their relative path to the file.

  • They didn’t create the notebook inside their scripts directory: JupyterLab seems to set the working directory to the place where the notebook was created. If a learner created the notebook at the root of their project folder then the relative path would change.

Additional to errors they may get, you would also need to spend some time talking about relative paths and why we need to use '../'. Prepare to spend 10 to 15 minutes successfully reading the csv file with your group.



Exploring and understanding data


Choose how to teach this section

The section on generative AI is intended to be concise but Instructors may choose to devote more time to the topic in a workshop. Depending on your own level of experience and comfort with talking about and using these tools, you could choose to do any of the following:

  • Explain how large language models work and are trained, and/or the difference between generative AI, other forms of AI that currently exist, and the limits of what LLMs can do (e.g., they can’t “reason”).
  • Demonstrate how you recommend that learners use generative AI.
  • Discuss the ethical concerns listed below, as well as others that you are aware of, to help learners make an informed choice about whether or not to use generative AI tools.

This is a fast-moving technology. If you are preparing to teach this section and you feel it has become outdated, please open an issue on the lesson repository to let the Maintainers know and/or a pull request to suggest updates and improvements.