Running commands with Maestro
Last updated on 2024-08-22
Estimated time: 60 minutes
Overview
Questions
- How do I run a simple command with Maestro?
Objectives
- Create a Maestro YAML file
What is the workflow I’m interested in?
In this lesson we will build an experiment that takes an application which runs in parallel and investigates its scalability. To do that we will need to gather data; in this case, that means running the application multiple times with different numbers of CPU cores and recording the execution time. Once we’ve done that, we need to create a visualization of the data to see how it compares against the ideal case.
From the visualization we can then decide at what scale it makes most sense to run the application in production to maximize the use of our CPU allocation on the system.
We could do all of this manually, but there are useful tools to help us manage data analysis pipelines like we have in our experiment. Today we’ll learn about one of those: Maestro.
In order to get started with Maestro, let’s begin by taking a simple command and see how we can run that via Maestro. Let’s choose the command hostname, which prints out the name of the host where the command is executed:
OUTPUT
pascal83
That prints out the result but Maestro relies on files to know the status of your workflow, so let’s redirect the output to a file:
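As a minimal sketch of that redirection (assuming a POSIX shell; hostname_login.txt is the same file name the YAML file below uses):

```shell
# Send the output of hostname to a file instead of the terminal.
hostname > hostname_login.txt

# Print the file to confirm it now holds the hostname.
cat hostname_login.txt
```

The > operator overwrites hostname_login.txt if it already exists.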
Writing a Maestro YAML
Edit a new text file named hostname.yaml. The file extension comes from YAML, a recursive initialism for “YAML Ain’t Markup Language”; YAML is a popular format for configuration files and key-value data serialization. For more, see the Wikipedia page, esp. YAML Syntax.
Contents of hostname.yaml (spaces matter!):
YML
description:
    name: Hostnames
    description: Report a node's hostname.
study:
    - name: hostname-login
      description: Write the login node's hostname to a file.
      run:
          cmd: |
              hostname > hostname_login.txt
Key points about this file
- The name of hostname.yaml is not very important; it gives us information about file contents and type, but maestro will behave the same if you rename it to hostname or foo.txt.
- The file specifies fields in a hierarchy. For example, name, description, and run are all passed to study and are at the same level in the hierarchy. description and study are both at the top level in the hierarchy.
- Indentation indicates the hierarchy and should be consistent. For example, all the fields passed directly to study are indented relative to study and their indentation is all the same.
- The commands executed during the study are given under cmd. Starting this entry with | and a newline character allows us to specify multiple commands.
- The example YAML file above is pretty minimal; all fields shown are required.
- The names given to study can include letters, numbers, and special characters.
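To see the hierarchy for ourselves, here is a small sketch that writes the file from the shell and lists its top-level fields. The heredoc and the grep pattern are illustrative choices rather than part of Maestro, and the four-space indentation is an assumption; any consistent indentation should work.

```shell
# Write hostname.yaml; the quoted 'EOF' keeps the shell from expanding anything inside.
cat > hostname.yaml << 'EOF'
description:
    name: Hostnames
    description: Report a node's hostname.

study:
    - name: hostname-login
      description: Write the login node's hostname to a file.
      run:
          cmd: |
              hostname > hostname_login.txt
EOF

# Top-level fields are the only keys that start in the first column.
grep -E '^[a-z]+:' hostname.yaml
```

This should print description: and study:, the two fields at the top of the hierarchy.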
Back in the shell we’ll run our new study. At this point, we may see an error if a required field is missing or if our indentation is inconsistent.
bash: maestro: command not found...
If your shell tells you that it cannot find the command maestro, then we need to make the software available somehow. In our case, this means activating the Python virtual environment where maestro is installed.
You can tell the environment has already been activated when (maestro_venv) appears before your command prompt:
BASH
janeh@pascal83:~$ source /usr/global/docs/training/janeh/maestro_venv/bin/activate
(maestro_venv) janeh@pascal83:~$
Now that the maestro_venv virtual environment has been activated, the maestro command should be available, but let’s double check:
OUTPUT
/usr/global/docs/training/janeh/maestro_venv/bin/maestro
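One way to make this check is sketched below. It assumes a POSIX shell and uses the built-in command -v; the function name check_maestro is just an illustrative helper, not something Maestro provides.

```shell
# Report where the maestro executable lives, or hint at the fix if it is missing.
check_maestro() {
    if command -v maestro > /dev/null 2>&1; then
        echo "maestro found at: $(command -v maestro)"
    else
        echo "maestro not found; activate the virtual environment first"
    fi
}

check_maestro
```

If the virtual environment is active, the reported path should point inside it.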
Running maestro
Once you have maestro available to you, run maestro run hostname.yaml and enter y when prompted:
OUTPUT
[2024-03-20 15:39:34: INFO] INFO Logging Level -- Enabled
[2024-03-20 15:39:34: WARNING] WARNING Logging Level -- Enabled
[2024-03-20 15:39:34: CRITICAL] CRITICAL Logging Level -- Enabled
[2024-03-20 15:39:34: INFO] Loading specification -- path = hostname.yaml
[2024-03-20 15:39:34: INFO] Directory does not exist. Creating directories to ~/Hostnames_20240320-153934/logs
[2024-03-20 15:39:34: INFO] Adding step 'hostname-login' to study 'Hostnames'...
[2024-03-20 15:39:34: INFO]
------------------------------------------
Submission attempts = 1
Submission restart limit = 1
Submission throttle limit = 0
Use temporary directory = False
Hash workspaces = False
Dry run enabled = False
Output path = ~/Hostnames_20240320-153934
------------------------------------------
Would you like to launch the study? [yn] y
Study launched successfully.
Then look at the outputs. You should have a new directory whose name starts with the name given under description at the top of hostname.yaml and ends with a date and timestamp.
In that directory will be a subdirectory for every study run from hostname.yaml. The subdirectories for each study include all output files for that study.
BASH
(maestro_venv) janeh@pascal83:~$ cd Hostnames_20240320-153934/
(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934$ ls
OUTPUT
batch.info Hostnames.pkl Hostnames.txt logs status.csv
hostname-login Hostnames.study.pkl hostname.yaml meta
BASH
(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934$ cd hostname-login/
(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934/hostname-login$ ls
OUTPUT
hostname-login.2284862.err hostname-login.2284862.out hostname-login.sh hostname_login.txt
Challenge
To which file will the login node’s hostname, pascal83, be written?
hostname-login.2284862.err
hostname-login.2284862.out
hostname-login.sh
hostname_login.txt
- hostname_login.txt
In the original hostname.yaml file that we ran, we specified that hostname would be written to hostname_login.txt, and this is where we’ll see that output, if the run worked!
Challenge
This one is tricky! In the example above, pascal83 was written to ~/Hostnames_{date}_{time}/hostname-login/hostname_login.txt. Where would hello be written for the following YAML?
YML
description:
    name: MyHello
    description: Report a node's hostname.
study:
    - name: give-salutation
      description: Write the login node's hostname to a file
      run:
          cmd: |
              echo "hello" > greeting.txt
~/give-salutation_{date}_{time}/greeting/greeting.txt
~/greeting_{date}_{time}/give_salutation/greeting.txt
~/MyHello_{date}_{time}/give-salutation/greeting.txt
~/MyHello_{date}_{time}/greeting/greeting.txt
.../MyHello_{date}_{time}/give-salutation/greeting.txt
The top-level folder created starts with the name field under description; here, that’s MyHello. Its subdirectory is named after the study; here, that’s give-salutation. The file created is greeting.txt and this stores the output of echo "hello".
Callout
After running a workflow with Maestro, you can check the status via maestro status --disable-theme <directory name>. For example, for the directory Hostnames_20240821-165341 created via maestro run hostname.yaml:
OUTPUT
Study: /usr/WS1/janeh/maestro-tut/Hostnames_20240821-165341
┏━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ ┃ ┃ ┃ ┃ ┃ Elapsed ┃ ┃ Submit ┃ ┃ Number ┃
┃ Step Name ┃ Job ID ┃ Workspace ┃ State ┃ Run Time ┃ Time ┃ Start Time ┃ Time ┃ End Time ┃ Restarts ┃
┡━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ hostname-l │ 2593210 │ hostname- │ FINISHED │ 0d:00h:00m │ 0d:00h:00 │ 2024-08-21 │ 2024-08-2 │ 2024-08-21 │ 0 │
│ ogin │ │ login │ │ :01s │ m:01s │ 16:53:44 │ 1 │ 16:53:45 │ │
│ │ │ │ │ │ │ │ 16:53:44 │ │ │
└────────────┴─────────┴───────────┴──────────┴────────────┴───────────┴────────────┴───────────┴────────────┴───────────┘
(END)
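Typing the timestamped directory name by hand gets tedious. Here is a small sketch for finding the newest one, assuming a POSIX shell and that we are in the directory where maestro run was launched; latest_study_dir is a hypothetical helper, not a Maestro command.

```shell
# Print the most recent study directory whose name starts with the given prefix.
# The timestamp suffix (YYYYMMDD-HHMMSS) sorts chronologically as plain text,
# so the last entry after sorting is the newest run.
latest_study_dir() {
    prefix="$1"
    ls -d "${prefix}"_*/ 2> /dev/null | sort | tail -n 1
}

# Usage:
#   maestro status --disable-theme "$(latest_study_dir Hostnames)"
```

The helper prints nothing if no matching directory exists, so it is worth checking its output before passing it to maestro status.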
Key Points
- You execute maestro run with a YAML file including information about your run.
- Your run includes a description and at least one study (a step in your run).
- Your maestro run creates a directory with subdirectories and outputs for each study.
- Check the status of a run via maestro status --disable-theme <directory>