Running commands with Maestro

Last updated on 2024-08-22

Overview

Questions

  • “How do I run a simple command with Maestro?”

Objectives

  • “Create a Maestro YAML file”

What is the workflow I’m interested in?


In this lesson we will run an experiment that takes an application which runs in parallel and investigates its scalability. To do that we will need to gather data; in this case, that means running the application multiple times with different numbers of CPU cores and recording the execution time. Once we've done that, we need to create a visualization of the data to see how it compares against the ideal case.

From the visualization we can then decide at what scale it makes the most sense to run the application in production, so that we make the best use of our CPU allocation on the system.

We could do all of this manually, but there are useful tools to help us manage data analysis pipelines like the one in our experiment. Today we'll learn about one of those: Maestro.

To get started with Maestro, let's take a simple command and see how we can run it via Maestro. We'll use the command hostname, which prints the name of the host where the command is executed:

BASH

hostname

OUTPUT

pascal83

That prints the result to the terminal, but Maestro relies on files to know the status of your workflow, so let's redirect the output to a file:

BASH

janeh@pascal83:~$ hostname > hostname_login.txt
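To confirm that the redirect worked, print the file back. The short sketch below also compares the file with a fresh call to hostname; on the login node used in this lesson both would read pascal83, but your host name will differ.

BASH

```bash
# Write this machine's hostname to a file instead of the terminal
hostname > hostname_login.txt

# Print the file to confirm the redirect captured the output
cat hostname_login.txt

# Optional check: the file should match a fresh call to hostname
if [ "$(cat hostname_login.txt)" = "$(hostname)" ]; then
    echo "hostname_login.txt is up to date"
fi
```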

Writing a Maestro YAML file


Create a new text file named hostname.yaml. The file extension refers to YAML, a recursive acronym for “YAML Ain’t Markup Language” and a popular format for configuration files and key-value data serialization. For more, see the Wikipedia page on YAML, especially its section on syntax.

Contents of hostname.yaml (spaces matter!):

YML

description:
  name: Hostnames
  description: Report a node's hostname.

study:
  - name: hostname-login
    description: Write the login node's hostname to a file.
    run:
      cmd: |
        hostname > hostname_login.txt

Key points about this file

  1. The name hostname.yaml is not very important; it tells us about the file's contents and type, but maestro will behave the same if you rename it to hostname or foo.txt.
  2. The file specifies fields in a hierarchy. For example, name, description, and run all belong to the step listed under study and sit at the same level of the hierarchy, while description and study are both at the top level.
  3. Indentation indicates the hierarchy and must be consistent. For example, all the fields of the step under study are indented relative to study, and they all share the same indentation.
  4. The commands executed during the study are given under cmd. Starting this entry with | followed by a newline (a YAML literal block) allows us to specify multiple commands, one per line.
  5. The example YAML file above is pretty minimal; all fields shown are required.
  6. The names given to steps under study can include letters, numbers, and special characters.
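Point 4 is easiest to see with a second command. The sketch below, an illustrative variant rather than one of the lesson's files, uses a shell here-document to write a hypothetical hostname_multi.yaml whose single step runs two commands; the second appends the output of date to the same file with >>. Because cmd is a YAML literal block, its value is one string containing both lines, newline included.

BASH

```bash
# Write a hypothetical variant of hostname.yaml whose one step runs two commands.
# The quoted 'EOF' stops the shell from expanding anything inside the YAML.
cat > hostname_multi.yaml << 'EOF'
description:
  name: HostnamesMulti
  description: Report a node's hostname and the current date.

study:
  - name: hostname-login
    description: Write the login node's hostname and the date to a file.
    run:
      cmd: |
        hostname > hostname_login.txt
        date >> hostname_login.txt
EOF

# Show the file we just wrote
cat hostname_multi.yaml
```

Running maestro run hostname_multi.yaml should then produce a hostname_login.txt with two lines: the hostname, then the date.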

Back in the shell, we'll run our new workflow. At this point, we may see an error if a required field is missing or if our indentation is inconsistent.

BASH

janeh@pascal83:~$ maestro run hostname.yaml

bash: maestro: command not found...

If your shell tells you that it cannot find the command maestro, then we need to make the software available. In our case, this means activating the Python virtual environment where maestro is installed.

BASH

source /usr/global/docs/training/janeh/maestro_venv/bin/activate

You can tell this command has already been run when (maestro_venv) appears before your command prompt:

BASH

janeh@pascal83:~$ source /usr/global/docs/training/janeh/maestro_venv/bin/activate
(maestro_venv) janeh@pascal83:~$

Now that the maestro_venv virtual environment has been activated, the maestro command should be available, but let's double-check:

BASH

(maestro_venv) janeh@pascal83:~$ which maestro

OUTPUT

/usr/global/docs/training/janeh/maestro_venv/bin/maestro
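If you open new terminals often, a small shell function can combine the activate-and-check steps above. This is a hypothetical helper, not something Maestro provides, and the virtual environment path is the one used in this lesson; it will differ on other systems.

BASH

```bash
# Hypothetical helper: make the maestro command available in the current shell.
# The venv path below is the one used in this lesson; change it for your system.
ensure_maestro() {
    local venv=/usr/global/docs/training/janeh/maestro_venv

    # Nothing to do if maestro is already on PATH
    if command -v maestro > /dev/null 2>&1; then
        return 0
    fi

    # Activating the venv puts its bin/ directory at the front of PATH
    if [ -f "$venv/bin/activate" ]; then
        source "$venv/bin/activate"
    fi

    # Succeed only if maestro can now be found; otherwise explain why not
    if ! command -v maestro > /dev/null 2>&1; then
        echo "maestro not found; is $venv the right virtual environment?" >&2
        return 1
    fi
}
```

Because a shell function runs in the current shell, the activation persists after ensure_maestro returns; the same lines saved as a separate script and executed would not change your prompt's environment. Typical use: ensure_maestro && which maestro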

Running maestro


Once you have maestro available to you, run maestro run hostname.yaml and enter y when prompted:

BASH

(maestro_venv) janeh@pascal83:~$ maestro run hostname.yaml

OUTPUT

[2024-03-20 15:39:34: INFO] INFO Logging Level -- Enabled
[2024-03-20 15:39:34: WARNING] WARNING Logging Level -- Enabled
[2024-03-20 15:39:34: CRITICAL] CRITICAL Logging Level -- Enabled
[2024-03-20 15:39:34: INFO] Loading specification -- path = hostname.yaml
[2024-03-20 15:39:34: INFO] Directory does not exist. Creating directories to ~/Hostnames_20240320-153934/logs
[2024-03-20 15:39:34: INFO] Adding step 'hostname-login' to study 'Hostnames'...
[2024-03-20 15:39:34: INFO]
------------------------------------------
Submission attempts =       1
Submission restart limit =  1
Submission throttle limit = 0
Use temporary directory =   False
Hash workspaces =           False
Dry run enabled =           False
Output path =               ~/Hostnames_20240320-153934
------------------------------------------
Would you like to launch the study? [yn] y
Study launched successfully.

Then look at the outputs. You should have a new directory whose name includes a date and timestamp and that starts with the name given under description at the top of hostname.yaml.

In that directory is a subdirectory for every step listed under study in hostname.yaml. Each step's subdirectory holds all the output files for that step.
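Each maestro run creates a fresh timestamped directory, so after a few runs you may have several Hostnames_* directories. The hypothetical shell function below picks the newest one. It assumes the <name>_YYYYMMDD-HHMMSS naming seen in the run output above, which sorts in time order as plain text.

BASH

```bash
# Hypothetical helper: print the newest <name>_YYYYMMDD-HHMMSS study directory.
# Usage: latest_study <name> [base directory, default $HOME]
latest_study() {
    local name=$1
    local base=${2:-$HOME}

    # Bash expands globs in sorted order, and zero-padded timestamps sort by time
    local dirs=( "$base/$name"_[0-9]*-[0-9]*/ )

    # With no match, bash leaves the pattern as-is, so check it is a real directory
    if [ ! -d "${dirs[0]}" ]; then
        echo "no $name study directories in $base" >&2
        return 1
    fi

    # The last element is the newest; drop the trailing slash the glob added
    local newest=${dirs[${#dirs[@]}-1]}
    printf '%s\n' "${newest%/}"
}
```

For example, ls "$(latest_study Hostnames)" lists the contents of the most recent Hostnames run.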

BASH

(maestro_venv) janeh@pascal83:~$ cd Hostnames_20240320-153934/
(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934$ ls

OUTPUT

batch.info      Hostnames.pkl        Hostnames.txt  logs  status.csv
hostname-login  Hostnames.study.pkl  hostname.yaml  meta

BASH

(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934$ cd hostname-login/
(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934/hostname-login$ ls

OUTPUT

hostname-login.2284862.err  hostname-login.2284862.out  hostname-login.sh  hostname_login.txt

Challenge

To which file will the login node’s hostname, pascal83, be written?

  1. hostname-login.2284862.err
  2. hostname-login.2284862.out
  3. hostname-login.sh
  4. hostname_login.txt

Solution

Answer 4: hostname_login.txt

In the original hostname.yaml file that we ran, we specified that the output of hostname would be written to hostname_login.txt, so this is where we'll see that output if the run worked!
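Beyond reading the output file, a quick heuristic check of a step's workspace is that the expected output file exists and is non-empty, and that the step's .err file (which captures what the step printed to standard error) is empty. The function below is a hypothetical sketch of that check, not a Maestro feature; an empty .err file suggests, but does not prove, that the commands succeeded.

BASH

```bash
# Hypothetical helper: sanity-check one step's workspace after a run.
# Usage: check_step <step directory> <expected output file>
check_step() {
    local dir=$1
    local expected=$2
    local err

    # -s is true only for a file that exists and is non-empty
    if [ ! -s "$dir/$expected" ]; then
        echo "missing or empty: $expected" >&2
        return 1
    fi

    # Any non-empty *.err file means the step wrote something to stderr
    # (an unmatched glob stays literal and fails -s, so it is skipped)
    for err in "$dir"/*.err; do
        if [ -s "$err" ]; then
            echo "stderr not empty: ${err##*/}" >&2
            return 1
        fi
    done

    echo "ok: $(cat "$dir/$expected")"
}
```

For example: check_step ~/Hostnames_20240320-153934/hostname-login hostname_login.txt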

Challenge

This one is tricky! In the example above, pascal83 was written to ~/Hostnames_{date}_{time}/hostname-login/hostname_login.txt.

Where would hello be written for the following YAML?

YML

description:
    name: MyHello
    description: Write a greeting to a file.

study:
    - name: give-salutation
      description: Write a greeting to a file.
      run:
          cmd: |
            echo "hello" > greeting.txt

  1. ~/give-salutation_{date}_{time}/greeting/greeting.txt
  2. ~/greeting_{date}_{time}/give_salutation/greeting.txt
  3. ~/MyHello_{date}_{time}/give-salutation/greeting.txt
  4. ~/MyHello_{date}_{time}/greeting/greeting.txt

Solution

Answer 3: .../MyHello_{date}_{time}/give-salutation/greeting.txt

The top-level folder created starts with the name field under description; here, that's MyHello. Its subdirectory is named after the step under study; here, that's give-salutation. The file created is greeting.txt, which stores the output of echo "hello".
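The layout described above, <description name>_<timestamp>/<step name>/<file>, is regular enough to search for. This hypothetical function uses find to list every copy of a step's output file across all runs of a study; the function name and argument order are our own choices.

BASH

```bash
# Hypothetical helper: list a step's output file in every run of a study.
# Usage: find_step_output <base dir> <description name> <step name> <file>
find_step_output() {
    local base=$1 study=$2 step=$3 file=$4

    # Layout is <base>/<study>_<timestamp>/<step>/<file>, i.e. 3 levels deep.
    # In find's -path patterns, * also matches across / characters.
    find "$base" -maxdepth 3 -type f -path "*/${study}_*/${step}/${file}" | sort
}
```

For example, find_step_output ~ MyHello give-salutation greeting.txt lists each greeting.txt from the challenge's YAML, while asking for a step named greeting (option 4 above) finds nothing.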

Callout

After running a workflow with Maestro, you can check its status via maestro status --disable-theme <directory name>. For example, for the directory Hostnames_20240821-165341 created via maestro run hostname.yaml:

BASH

maestro status --disable-theme Hostnames_20240821-165341

OUTPUT

                               Study: /usr/WS1/janeh/maestro-tut/Hostnames_20240821-165341
┏━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃            ┃         ┃           ┃          ┃            ┃ Elapsed   ┃            ┃ Submit    ┃            ┃ Number    ┃
┃ Step Name  ┃ Job ID  ┃ Workspace ┃ State    ┃ Run Time   ┃ Time      ┃ Start Time ┃ Time      ┃ End Time   ┃ Restarts  ┃
┡━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ hostname-l │ 2593210 │ hostname- │ FINISHED │ 0d:00h:00m │ 0d:00h:00 │ 2024-08-21 │ 2024-08-2 │ 2024-08-21 │ 0         │
│ ogin       │         │ login     │          │ :01s       │ m:01s     │ 16:53:44   │ 1         │ 16:53:45   │           │
│            │         │           │          │            │           │            │ 16:53:44  │            │           │
└────────────┴─────────┴───────────┴──────────┴────────────┴───────────┴────────────┴───────────┴────────────┴───────────┘
(END)
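maestro status is the supported way to check a run. For scripting, the status.csv file in the study directory (it appears in the directory listing earlier) is a tempting alternative. The hypothetical function below assumes, without confirmation from this lesson, that status.csv is comma-separated text whose header row includes Step Name and State columns like the table above. It looks those columns up by name and stops with an error if they are missing, so check it against your own file before relying on it.

BASH

```bash
# Hypothetical helper: print "<step name> <state>" for each step of a study.
# ASSUMPTION: <study dir>/status.csv is comma-separated with a header row that
# contains "Step Name" and "State" columns; verify this against your own file.
step_states() {
    awk -F',' '
        NR == 1 {
            # Find the two columns by header name, ignoring surrounding spaces
            for (i = 1; i <= NF; i++) {
                h = $i
                gsub(/^ +| +$/, "", h)
                if (h == "Step Name") name_col = i
                if (h == "State") state_col = i
            }
            if (!name_col || !state_col) {
                print "status.csv header lacks Step Name/State" > "/dev/stderr"
                exit 1
            }
            next
        }
        { print $name_col, $state_col }
    ' "$1/status.csv"
}
```

For example: step_states Hostnames_20240821-165341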

Key Points

  • You execute maestro run with a YAML file including information about your run.
  • Your YAML file includes a description and a study block with at least one step.
  • Your maestro run creates a directory with a subdirectory and outputs for each step.
  • Check the status of a run via maestro status --disable-theme <directory>