Process Automation
Last updated on 2022-05-16 | Edit this page
Overview
Questions
- How can I use GitLab to automate processes?
Objectives
- Setup a process to run automatically when a commit is pushed to a repository.
- Safely use secrets in a process automatically triggered by GitLab.
We have setup our project, collaborated with our colleagues; we fill our research diary every day. Now it is time to make use of it.
We think about using part of our daily commute by train to review what happened in the past. We always have our e-reader with us, so ideally we would like to have an version of our diary in EPUB format.
Just as much based in reality as the rest of this lesson’s story, we know how to do that, but we do not want to do it by hand. Instead, we are going to let GitLab automatically build us an EPUB file, whenever our repository changes. To achieve it, we will us the feature that GitLab files under “CI/CD”, which stands for Continuous Integration/Continuous Deployment.
Continuous Integration/Continuous Deployment
The terms continuous integration (CI) and continuous deployment (CD) come from the fields of software engineering and software operation.
A software project uses CI, if it has setup a process that automatically runs some (or all) tests for a software project, whenever changes are committed or even just proposed (for example through a merge request) to the main branch of a repository. In particular when already done for proposed changes this can help prevent bugs to reach the code base, while freeing the developer of thinking of and running all the tests themselves.
CD is a process that updates the software on the production machines (deploys) as soon as a new change reaches the code base, generally after having passed the CI process.
Both processes can be implemented by running scripts, called jobs, triggered either by changes to the code base or by the successful completion of other jobs, which lead to the CI/CD feature of GitLab.
Example Configuration
To configure our automatic process, we navigate to our project page and click the menu item labeled “CI/CD” in the main menu on the left.
The page that this leads to invites us to “use a sample
.gitlab-ci.yml
template file to explore how CI/CD works.”
The GitLab CI is configured by providing a file called
.gitlab-ci.yml
in the root directory of a project’s
repository. Even though we want to learn how CI works in GitLab, we do
not follow the invitation, because the example is quite elaborate and
targets software developers specifically. Instead, we will use the
provided template for Bash.
We click on the button labeled “Use template” of the entry for Bash in the list at the bottom of the page.
This leads us to an editor for the .gitlab-ci.yml
file
that is prefilled from the selected template. The file is expected to be
written in the YAML file format, hence
the file extension .yml
. We will go through the example
line by line. Afterwards we will adapt it to our needs.
The first few lines, starting with the symbol #
, are
comments notifying us of where to find further information. The last of
these lines is the exception, which is a comment on the first functional
line:
YAML
image: busybox:latest
This line states that the scripts that are provided later on should be executed in Docker containers build from the stated Docker image. Busybox is a very small Linux distribution, reduced to the bare minimum, and it provides a shell.
Now follow to blocks starting with before_script:
and
after_script:
. We will ignore those.
The remaining four blocks, build1:
, test1:
,
test2:
, and deploy1:
, each define a so called
job. A job defines a script and represents one non-divisible unit of a
CI/CD process. Together the jobs defined in a project’s
.gitlab-ci.yml
file form a so called pipeline.
By default a pipeline has three stages, called build
,
test
, and deploy
, and their order is
important. The terminology comes from software development again. A
pipeline is executed by running all jobs of the first stage and if they
succeed to continue onto the next stage, continuing until a job fails or
all jobs of the last stage succeeded.
The four jobs in the example are named similar to the stages they are
assigned to. The assignment is done through the keyword
stage:
that can always be found in the second line of a
job’s definition.
The lines following the keyword script:
in each section,
define the script that will be executed as part of the respective job.
In this example, they are all echo
statements.
Our Own Configuration
We will now adapt the script to our requirements.
First we delete everything.
Then we define our own stages, because we do not develop software and want to have descriptive names for our use case:
YAML
stages:
- check
- create
In the first stage, we will do some testing, in the second we will create an EPUB format version of our research diary.
We call the first job check-for-mds
and in it make sure,
that at least one Markdown file exists, because otherwise someone must
have accidentally removed them all.
YAML
check-for-mds:
image: bash:latest
stage: check
script:
- compgen -G '*.md'
We use the docker image bash
, because Bash provides the
compgen
command that allows us to check for the existence
of files. We also state that the job belongs to the stage
check
. Finally, we provide the script. It tests for the
existence of any files with names ending in .md
.
Whenever the job check-for-mds
successfully completes,
we want to create the EPUB format version of our research diary. We
define a second job and call it create-epub
.
YAML
create-epub:
image:
name: pandoc/core:latest
entrypoint: ["/bin/sh", "-c"]
stage: create
script:
- pandoc *.md -o diary.epub
artifacts:
paths:
- diary.epub
After giving the name of the job, we again provide an Docker image in
which the job should run. Since we want to use pandoc to convert our
Markdown files into an EPUB file, we use the official pandoc Docker
image. The next line, with the keyword entrypoint
is
necessary, because the pandoc Docker image is configured to directly run
pandoc, when started in a container (pandoc is configured as its
entrypoint). However, GitLab CI expects to run scripts in the Docker
container, thus expects the Docker images to start a shell.
Pandoc
Pandoc converts text documents from format to another, for example from Markdown to EPUB. It supports many formats for the source documents and even more for the target document.
The project webpage provides a complete list (in text and graphical form).
Next, we specify the stage, followed by the script. It runs pandoc on
all files ending in .md
in the current directory and
instructs it to output a file diary.epub
. Pandoc deduces
from the file extension that it should be in EPUB format.
The final three lines, starting with the keyword
artifacts
, specify that GitLab CI should save the file
diary.epub
from the Docker container the job runs in and
provide it for download on its web interface.
This completes our GitLab CI configuration.There are a lot more configuration options, for example to have a process run only for commits to certain branches, that are documented in GitLab’s Handbook
We change the commit message to ”Configure CI”, leave the branch as
it is (main
), and click the button labeled “Commit
changes”.
Clicking the button does not cause a change in page. Instead, we are
informed at the top of the page about the status of the pipeline run
that was initiated by committing the .gitlab-ci.yml
to the
repository.
To get a better overview, we select “CI/CD” from the main menu on the right side. This brings us to the list of pipeline runs. We should see exactly one entry in the list, as we have just configured the CI.
A blue, red, or green box in the “Status” column informs on the left informs us about the state of the pipeline. Blue indicates a running pipeline, red a failed pipeline, and green a pipeline that successfully completed. Below the status box we have the runtime of the pipeline and below that information about when it was started.
In the ”Pipeline” column, we see the first line of the commit that
triggered the pipeline run, the number of the pipeline run, for example
#11071
, the branch the commit is part of as well as the
short identifier of the commit. Most of these serve as links to pages
for their respective entities.
The ”Triggerer” column tells us who caused the pipeline run. In our case that’s the author of the commit.
The ”Stages” column visualizes the stages and their state. In addition to the colors discussed above, there are stages colored in grey. They are waiting for their predecessor stages (or need to be manually triggered, if configured that way).
We wait until our pipeline ran through. It should complete without error.
We click on the button labeled by the three dots on the right of the
entry. In the menu that opens, we see that it provides a link to
download artifacts of the pipeline. We click on the link labeled
“create-epub:archive”, which causes our browser to download a file
called artifacts.zip
.
In that archive, we find the file diary.epub
that was
build by our second job, create-epub
.
If you have an ebook viewer, you can verify that it contains the contents of all Markdown our files. (We ignore the fact, that the order might not make sense at all. For that we would need to improve the script that builds the EPUB file.)