Jobs in Kubernetes
Last updated on 2025-01-07
Overview
Questions
- What are Jobs and how can I use them in my workflows?
Objectives
- Understand what a Job is.
- Explore the usage of Jobs to run computational workflows.
NRP Example: https://docs.nationalresearchplatform.org/userdocs/running/jobs
What are Jobs?
Kubernetes Jobs are used to run tasks to completion, such as a specific step in a workflow or an entire workflow. In the YAML file, Jobs and Pods appear very similar, but a Job has a few extra pieces. This is because a Job is a higher-level abstraction that manages Pods: it makes sure they run to completion, retrying a set number of times until they either complete successfully or run out of attempts. A Job can also run multiple copies of the same Pod at the same time. Jobs work well for one-off or scheduled tasks such as data updates or backups, and can be run on a schedule as a CronJob.
This makes Jobs very advantageous for computational scientific workflows, where each step may only need to run once, or where the same step can fan out across multiple files. Jobs also make it easier to monitor the current workflow stage, since you only need to check the Job rather than multiple Pods.
The file structure for a Job is fairly similar to a Pod's YAML file.
pi-example.yaml
YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
Example from https://kubernetes.io/docs/concepts/workloads/controllers/job/
The example above runs a Perl container that computes pi to 2000 digits. The backoffLimit prevents the Job from running continuously in a crash loop if the Pods spawned by the Job keep failing. In the example above, after 4 failed Pods, the Job itself is marked as failed and stops retrying.
The example may take up to 2 minutes to complete.
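To start it, apply the manifest with kubectl (this assumes the file above is saved as pi-example.yaml):

BASH
kubectl apply -f pi-example.yaml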
OUTPUT
job.batch/pi created
After 2 minutes, check the logs.
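One way to do this is with kubectl logs, addressing the Pod through the Job that created it:

BASH
kubectl logs jobs/pi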
OUTPUT
3.141592653589793238462643383279502...
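As noted earlier, a Job can also be run on a schedule as a CronJob. A minimal sketch of what that could look like for the same task; the name pi-nightly and the schedule here are placeholders, not part of the example above:

YAML
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pi-nightly
spec:
  schedule: "0 2 * * *"   # standard cron syntax: every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: pi
            image: perl:5.34.0
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never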
What are their benefits for research computing?
By using a Job, you are able to run tasks and workflows in a manner similar to traditional batch computing clusters. Unlike a bare Pod, a Job will retry a task until it completes, up to a set number of attempts. A Pod on its own does not have this behavior and would either continuously restart or follow whatever restart policy it was given. On a small local setup this may not be a large challenge, but it becomes critical when using a campus or regional Kubernetes cluster. Jobs also allow more flexibility in parallelization, since one Job can spawn many Pods of the same type with different names, as the sketch below shows.
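To illustrate the parallelization point, a Job spec can set the completions and parallelism fields. A minimal sketch that runs the same task 10 times with up to 5 Pods at once; the name pi-parallel is a placeholder:

YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-parallel
spec:
  completions: 10   # run the task 10 times in total
  parallelism: 5    # keep up to 5 Pods running at once
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

Each Pod the Job spawns gets a unique generated name (for example, pi-parallel- followed by a random suffix), which is how a single Job manages many Pods of the same type.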