These profiles describe the learners we anticipate for this lesson. You can use them to decide whether this material is right for you or your students. If you plan to contribute material to this lesson, they will help you understand the target audience, so that the lesson can be developed collaboratively and still remain cohesive.
Tyra is an environmental biologist who uses DNA signatures obtained from soils to study species diversity in the environment. She needs to compare DNA sequences against large databases. So far, she has been able to use web-based tools for her limited datasets.
Recently, Tyra has started working with much larger datasets and has discovered that the online tool she uses has a limit of 50 entries. She has heard that it should be possible to run the same tool from the command line, and she has managed to install it on her laptop. Now, however, each analysis takes several days to finish.
The workshop will teach Tyra to move her data to and from the university’s computer cluster and to submit jobs using software pre-installed on the cluster. Afterwards, Tyra will be able to analyze her own data with the pre-installed command-line version of the tool, spreading the analysis over several dozen cores so that it finishes in a few hours.
As a new PhD student, Maria has been given the task of selecting parameters for their simulation. They need to run a set of calculations on several thousand combinations of parameters, and each calculation takes several minutes. They set up the problem on their laptop but quickly realise that it would take more than a month to complete the task. They have been told to use the local HPC system, but they are not sure how it would help them.
Dana wants to cross-validate a model for a statistics class project. This involves running the model 1000 times, and each run takes an hour. Running all of them one after another on a laptop would take 1000 hours, which is over a month!
Rina, a genomics researcher, has been using small datasets of sequence data, but will soon be receiving a new type of sequencing data that is 10 times as large. It is already challenging to open the current datasets on a computer, and analyzing the larger ones will probably crash it.
Lucy is using a fluid dynamics package that has an option to run in parallel. So far, she has not used this option on her desktop. In going from 2D to 3D simulations, the simulation time has more than tripled. Taking advantage of the parallel option might speed things up.