Content from What is Kubernetes?
Last updated on 2025-01-07
Estimated time: 12 minutes
Overview
Questions
- What is Kubernetes and why might I want to use it?
Objectives
- Understand what Kubernetes is and when you might want to utilize it.
- Utilize Kubernetes commands to interact with a Kubernetes environment.
Kubernetes, or “k8s”, is an open-source container orchestration platform that automates deploying, scaling, and managing containers within an environment. Traditionally, Kubernetes is aimed at commercial environments and cloud infrastructure. It was originally developed by Google, released as open source, and is now managed by the Cloud Native Computing Foundation (CNCF).
Discuss with your neighbor
What are some challenges you may have when you move from one computational environment to another for your research? For example, moving from a local compute cluster to a different compute cluster.
Some examples you may have discussed:
- Different methods of interacting with the environment
- Keywords could be different or processed differently
- Software may not be available readily
Kubernetes provides a uniform platform and method across many different local and commercial cloud infrastructures. This ensures that your application operates the same way on your local development system, a production Kubernetes environment, or a commercial cloud environment. This also helps ensure that a research workflow using Kubernetes can be reproduced on any infrastructure running Kubernetes.
Kubernetes deployments and environments are managed using YAML files. These YAML files let you describe the desired state of a specific environment to the Kubernetes cluster, which then attempts to realize that defined configuration. This also makes it easier to change things in a recorded manner using version control tools like Git.
Make sure everything is ready
Before proceeding, double check that you have kubectl connected to a Kubernetes instance. Running kubectl get nodes will retrieve all of the computers, or nodes, running in the Kubernetes environment.
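For reference, the command looks like this:
BASH
kubectl get nodes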
You should see output similar to this:
OUTPUT
NAME STATUS ROLES AGE VERSION
minikube Ready control-plane 90d v1.30.0
The kubectl command does not directly control the state of a Kubernetes cluster. Instead, it communicates with the Kubernetes controller through an API server. When you run kubectl apply or kubectl delete, the command execution doesn’t directly change the environment. Instead, it tells the controller through the API server the expected state of the environment, and the Kubernetes controller will attempt to make the declared state the current state and keep the cluster in the desired state. This is called “declarative management”, which simplifies the management of more complex systems.
Talk about it
With Docker or other container platforms, what challenges might come up as you start scaling up and running more workflows?
Some challenges you may have thought of:
- Needing to micromanage everything running as projects scale
- Many working components needing to be configured
Content from Running Pods in Kubernetes
Last updated on 2024-12-03
Estimated time: 12 minutes
Overview
Questions
- What is Kubernetes and why might I want to use it?
Objectives
- Understand what Kubernetes is and when you might want to utilize it.
Pods are the fundamental building block of a Kubernetes setup and are designed to run a single portion of an application or process.
Each pod has its own identifiers, specifications, and containers.
With Kubernetes, the individual container or containers in a pod are not managed by kubectl. Instead, pods are the lowest level of Kubernetes object that is managed with kubectl for applications.
Pods are designed around the idea of running a single portion of a larger system, similar to each team in a company working on its own tasks. Sometimes these teams only have one person, or they may have multiple people. Pods can run with a single container or with multiple containers. If a pod is running multiple containers, they should be tightly coupled together. An example would be a pod running a simple website based on a GitHub repository: one container would run the web server itself and another would pull the repository updates.
Pods running with multiple containers are also able to share storage volumes, which allows each of the containers to work on the same location in a straightforward manner.
Where do container images come from?
Kubernetes pulls container images from the same sources that Docker would. This can be from the local system, DockerHub, or another remote registry service. Any images that are not already available locally will be pulled from the remote source. The syntax is also the same as what you would use for Docker.
- hello-world - If you use the image name only, it will pull the latest tag from DockerHub.
- hello-world:1.2.3 - You can also specify a tag to select a specific version or build of an image. This should be specified when possible to help increase the reproducibility of your workflows.
- registry.example.com/hello-world:1.2.3 - This would pull a specific image and tag from a remote registry. This could be a public registry or an institutional registry.
Exploring Pods
We are going to create a basic “hello world” pod. Below is a YAML file containing all of the information that is needed for the API controller to spawn our pod and its container.
hello-world.yaml
YAML
apiVersion: v1
kind: Pod
metadata:
name: hello-world-pod
spec:
containers:
- name: hello-world-container
image: busybox
command: ["/bin/sh", "-c"]
args: ["echo 'Hello World! I am in a pod!' && sleep infinity"]
What do the lines mean?
When we apply or send the YAML file to the API controller, the file’s contents tell the controller what the desired state of your configuration is. The apiVersion line states what version of the Kubernetes API the file is going to use. The kind line specifies what type of object we are creating with this block of configuration. The metadata section contains basic information about the Pod we are creating; in this case we are only specifying the name of the pod. spec is the specification of what we want the desired state of the pod to be. This file contains a basic single-container Pod’s specification. The specification is asking for one container named hello-world-container using the busybox image. The command is the command we want the container to run, similar to changing the entrypoint using docker run, and the args are the commands we will be running inside the hello-world-container container within our hello-world-pod Pod.
We can send this configuration to the API Controller using the kubectl apply command.
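With the definition above saved as hello-world.yaml, the command looks like this:
BASH
kubectl apply -f hello-world.yaml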
OUTPUT
pod/hello-world-pod created
We can check the status of our Pod using kubectl get.
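For example, listing the Pods in the current namespace:
BASH
kubectl get pods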
OUTPUT
NAME READY STATUS RESTARTS AGE
hello-world-pod 0/1 ContainerCreating 0 9s
The output indicates that our single pod is not ready for use and is currently creating all of the specified containers. Since we are only using one container, this process should be fairly quick.
If we check again shortly after, we can see the Pod is ready and running.
OUTPUT
NAME READY STATUS RESTARTS AGE
hello-world-pod 1/1 Running 0 5s
Getting more details
We can get more details about the Pod including the creation process
and progress today using
kubectl describe pod name-of-pod
The kubectl describe
provides a verbose description of
different Kubernetes resources. You do need to specify both the type or
kind
of object and the name of the object itself that you
want information on.
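For our Pod, that would look like this:
BASH
kubectl describe pod hello-world-pod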
OUTPUT
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned default/hello-world-pod to minikube
Normal Pulling 18m kubelet Pulling image "busybox"
Normal Pulled 18m kubelet Successfully pulled image "busybox" in 1.638s (1.638s including waiting). Image size: 4261502 bytes.
Normal Created 18m kubelet Created container hello-world-container
Normal Started 18m kubelet Started container hello-world-container
We can check whether our Pod has run or is running by checking the logs using kubectl logs name-of-pod.
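For example:
BASH
kubectl logs hello-world-pod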
OUTPUT
Hello World! I am in a pod!
Running commands in Pods
During development and debugging of environments, or when verifying a quick command, it may be beneficial to get an interactive terminal within a pod or to run a single command within the pod.
We can pull up an interactive terminal in a pod using kubectl’s exec command.
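A minimal example, assuming the busybox container provides /bin/sh:
BASH
kubectl exec -it hello-world-pod -- /bin/sh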
OUTPUT
/ #
To get out of the interactive session, run the exit command.
Explore the pod
Run some commands within the pod to explore its environment. Some questions that can be answered are:
- What is the hostname of the Pod?
- What user are we running as within the Pod?
Individual commands can also be executed from within the Pod without the need for an interactive session.
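For example, printing the Pod’s hostname (the command after the -- separator is only an illustration):
BASH
kubectl exec hello-world-pod -- hostname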
OUTPUT
hello-world-pod
Cleaning up
We can clean up our pods from this lesson by deleting them using kubectl’s delete command. To do this, we can either run the delete command in a similar way to the apply command by specifying the file, or we can delete the pod manually by calling its name.
By specifying the filename, anything that is defined in the file will get removed.
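For example:
BASH
kubectl delete -f hello-world.yaml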
OUTPUT
pod "hello-world-pod" deleted
We can also run the same operation by specifying the resource type and resource name of the pod we are deleting.
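For example:
BASH
kubectl delete pod hello-world-pod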
OUTPUT
pod "hello-world-pod" deleted
When we use the file itself to delete the Pod, the definitions provided in the file supply the resource name and type automatically; without the file, that information must be provided for the Kubernetes controller to perform the operation.
Modifying the Pod
While a pod is running, we can modify its definition and the API controller will handle the transition for us.
If we switch the container definition in the Pod YAML file from busybox to another container, such as ubuntu:23.04, we can migrate our Pod.
After making the change, we can re-apply the definition file.
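Re-applying uses the same command as before:
BASH
kubectl apply -f hello-world.yaml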
OUTPUT
pod/hello-world-pod configured
Notice that the language changed when we applied the new definition file to our Pod. When we made our Pod for the first time, it was listed as “created”. This time the Pod was “configured” rather than “created”.
We can also look at the actions that the Controller took to update our pod using the describe command.
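For example:
BASH
kubectl describe pod hello-world-pod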
OUTPUT
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 40s default-scheduler Successfully assigned default/hello-world-pod to minikube
Normal Pulling 39s kubelet Pulling image "busybox"
Normal Pulled 38s kubelet Successfully pulled image "busybox" in 718ms (718ms including waiting). Image size: 4269694 bytes.
Normal Created 37s kubelet Created container hello-world-container
Normal Started 37s kubelet Started container hello-world-container
Normal Killing 18s kubelet Container hello-world-container definition changed, will be restarted
Normal Pulling 4m53s kubelet Pulling image "ubuntu:23.04"
Normal Pulled 4m50s kubelet Successfully pulled image "ubuntu:23.04" in 3.41s (3.41s including waiting). Image size: 70323206 bytes.
Normal Created 4m45s (x2 over 5m43s) kubelet Created container hello-world-container
Normal Started 4m45s (x2 over 5m43s) kubelet Started container hello-world-container
Here we can see that the Controller noticed the definition change, then worked to kill the existing Pod to remake it with the new definition.
We can also verify that the container has changed by checking its operating system with cat /etc/os-release.
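One way to do this is to run the command through kubectl exec:
BASH
kubectl exec hello-world-pod -- cat /etc/os-release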
OUTPUT
PRETTY_NAME="Ubuntu 23.04"
NAME="Ubuntu"
VERSION_ID="23.04"
...
We could achieve a similar result by deleting the Pod, making the changes, and then reapplying the definition; however, that puts the effort of managing our Pods on us, when we can instead let Kubernetes handle it.
Clean up after the lesson
Go ahead and check for any remaining Pods and kill them before moving on.
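One possible approach (your pod names may differ):
BASH
kubectl get pods
kubectl delete pod hello-world-pod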
Content from Volumes in Kubernetes
Last updated on 2024-12-03
Estimated time: 12 minutes
Overview
Questions
- How do I make data available in a Pod?
Objectives
- Explore creating and using volumes
- Understand the “stateless” nature of pods
Pods are stateless entities and should be treated as such. Anything the pod creates in its filesystem is removed when the pod stops or dies. For temporary files and directories this is fine; however, it is not conducive to research, since results need to be saved somewhere. Pods and their containers are ephemeral, or “stateless”, by default. This means that any data stored on the filesystems within the containers is lost when the pod is stopped. For workflows, this creates a small barrier. Workflows can still be run using this methodology, however the pods and their containers would need to be set up to download their input and upload their output to a remote server.
Volumes are the solution to this problem. The main type of volume used for pods is the Persistent Volume Claim or “PVC”. These are requests to the Kubernetes cluster to “claim” space on the storage system, or a Persistent Volume (PV), of a cluster. PVCs are scoped to a namespace, which means that a PVC is not visible or accessible from another namespace on the Kubernetes cluster.
Once a PVC is mounted in a pod, data can be stored in or retrieved from it using the mount path on the container’s filesystem. This allows data to remain persistent after the pod is terminated.
Creating a PV
A Kubernetes cluster may automatically create a PV when a PVC is created. This will be mentioned in any documentation of the cluster itself.
First we need to create a Persistent Volume to give us a space to claim for files.
pv_create.yaml
YAML
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv0001
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 5Gi
hostPath:
path: /mnt/pv0001/
In our pv_create.yaml, we defined the basic details of our Persistent Volume. We are using a generic name, pv0001, for our Persistent Volume, but it can be any name if desired. Since we are only running a single pod in this part of the lesson, we are using the ReadWriteOnce access mode, which allows multiple pods to access the volume if they are running on the same node. There are other access modes available that allow working across multiple nodes or setting the volume to read-only; more details on this are available in the Kubernetes documentation. We also configured the Persistent Volume to have a capacity of 5 GB of space and placed it on the host filesystem at /mnt/pv0001.
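Assuming the definition is saved as pv_create.yaml, we apply it as before:
BASH
kubectl apply -f pv_create.yaml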
OUTPUT
persistentvolume/pv0001 created
This will create a dedicated space for our Pods to store files. The Pods won’t have immediate access to store data in a PV. In order for the Pods to store data, they need to make a claim against the PV using a Persistent Volume Claim.
Creating a PVC
For our Pods to store data, they need to claim space in a Persistent Volume.
pvc_storage.yaml
YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-test-pv-claim
spec:
storageClassName: manual
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3Gi
This configuration will look for a Persistent Volume to claim against if there is a matching and available PV.
If there is an available and matching PV, a claim will then be available for a Pod or multiple Pods to use.
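Assuming the definition is saved as pvc_storage.yaml:
BASH
kubectl apply -f pvc_storage.yaml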
OUTPUT
persistentvolumeclaim/my-test-pv-claim created
This created a PVC and dynamically made a PV since a PV did not yet exist.
We can check both by using kubectl get pv and kubectl get pvc.
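For example:
BASH
kubectl get pv
kubectl get pvc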
OUTPUT
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-7c06f9ef-909a-4cd7-b450-d136219a8964 8Gi RWO Delete Bound openproject/data-my-openproject-postgresql-0 standard <unset> 83d
OUTPUT
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
my-test-pv-claim Pending manual <unset> 82s
Transferring data to and from a PVC
In order to transfer data to and from a PVC, you need to have the volume mounted in a pod for the transfer. Data from a local computer can then be copied to a PVC through the Pod using the mount path of the PVC.
First we will need to make a pod to copy our data through. We will use a similar structure as our first pod.
YAML
apiVersion: v1
kind: Pod
metadata:
name: hello-world-pod
spec:
containers:
- name: hello-world-container
image: busybox
command: ["/bin/sh", "-c"]
args: ["echo 'Hello World! I am in a pod!' && sleep infinity"]
volumeMounts:
- mountPath: /mnt/my_pvc
name: my-pvc-for-pod
volumes:
- name: my-pvc-for-pod
persistentVolumeClaim:
claimName: my-test-pv-claim
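Assuming this definition is saved as, for example, pod_with_pvc.yaml (that filename is only illustrative), we can apply it as before:
BASH
kubectl apply -f pod_with_pvc.yaml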
We can confirm if our mountPath has our PVC available by running ls in the pod.
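One way to do this, listing the parent directory of the mount path from outside the Pod:
BASH
kubectl exec hello-world-pod -- ls -l /mnt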
OUTPUT
total 4
drwxr-xr-x 2 root root 4096 Oct 7 16:27 my_pvc
Since our Pod is running the sleep command indefinitely, the pod will remain active and we can use it to transfer some data.
We will create a basic file to place in the PVC. In my_file.md we will insert the following text.
Hello, I am a file in a PVC!
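One way to create the file locally, copy it into the PVC through the running Pod, and confirm its contents is sketched below; the destination path assumes the mount path from the Pod definition above:
BASH
echo 'Hello, I am a file in a PVC!' > my_file.md
kubectl cp my_file.md hello-world-pod:/mnt/my_pvc/my_file.md
cat my_file.md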
OUTPUT
Hello, I am a file in a PVC!
If we check the content of the PVC mounted in the Pod, we can see our file and look at its contents.
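For example, listing the mount path and printing the file from inside the Pod:
BASH
kubectl exec hello-world-pod -- ls -l /mnt/my_pvc
kubectl exec hello-world-pod -- cat /mnt/my_pvc/my_file.md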
OUTPUT
total 4
-rw-rw-r-- 1 1000 1000 29 Oct 7 16:49 my_file.md
OUTPUT
Hello, I am a file in a PVC!
Verifying the data is persistent
We can verify that the data is persistent by deleting the pod and creating a new pod.
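For example:
BASH
kubectl delete pod hello-world-pod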
OUTPUT
pod "hello-world-pod" deleted
At this point any data in the Pod’s filesystem itself would be gone as the Pod’s filesystem is ephemeral.
We will create a new Pod that will output the contents of the file.
check_pvc.yaml
YAML
apiVersion: v1
kind: Pod
metadata:
name: check-pvc-pod
spec:
containers:
- name: file-check-container
image: busybox
command: ["/bin/sh", "-c"]
args: ["cat /mnt/my_pvc/my_file.md && sleep infinity"]
volumeMounts:
- mountPath: /mnt/my_pvc
name: my-pvc-for-pod
volumes:
- name: my-pvc-for-pod
persistentVolumeClaim:
claimName: my-test-pv-claim
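Assuming the definition is saved as check_pvc.yaml:
BASH
kubectl apply -f check_pvc.yaml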
OUTPUT
pod/check-pvc-pod created
We can then check the logs of the pod to see the contents of the file.
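For example:
BASH
kubectl logs check-pvc-pod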
OUTPUT
Hello, I am a file in a PVC!
We can also create a file in the volume using a Pod.
pod_create_file.yaml
YAML
apiVersion: v1
kind: Pod
metadata:
name: file-create-pod
spec:
containers:
- name: file-create-container
image: busybox
command: ["/bin/sh", "-c"]
args: ["for i in 1 2 3 4 5; do cat /mnt/my_pvc/my_file.md >> /mnt/my_pvc/output.log; done; ls /mnt/my_pvc; cat /mnt/my_pvc/output.log && sleep infinity"]
volumeMounts:
- mountPath: /mnt/my_pvc
name: my-pvc-for-pod
volumes:
- name: my-pvc-for-pod
persistentVolumeClaim:
claimName: my-test-pv-claim
What we should expect to see is the contents of our PVC mounted in the pod and the contents of output.log.
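Assuming the definition above is saved as pod_create_file.yaml, applying it and then checking the Pod’s logs should show something like the output below:
BASH
kubectl apply -f pod_create_file.yaml
kubectl logs file-create-pod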
OUTPUT
my_file.md
output.log
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
We can then copy the output.log file back to our computer.
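One way to do this is with kubectl cp; the paths here are a sketch based on the mount path used above:
BASH
kubectl cp file-create-pod:/mnt/my_pvc/output.log ./output.log
cat output.log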
OUTPUT
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Content from Jobs in Kubernetes
Last updated on 2025-01-07
Estimated time: 12 minutes
Overview
Questions
- What are Jobs and how can I use them for my workflows?
Objectives
- Understand what a Job is.
- Explore the usage of Jobs to run computational workflows.
NRP Example: https://docs.nationalresearchplatform.org/userdocs/running/jobs
What are Jobs?
Kubernetes Jobs are used to run tasks to completion, such as a specific step in a workflow or the complete end of a workflow. In the YAML file, Jobs and Pods appear very similar, but Jobs have a few extra pieces. This is because Jobs are a higher-level abstraction that manages Pods to make sure they run to completion, retrying a set number of times until they complete successfully or run out of attempts. Jobs can also be set to run multiple copies of the same Pod at the same time. Jobs are also good for one-off or scheduled tasks, such as data updates or backups, and can be set to run on a schedule as a CronJob.
This makes Jobs very advantageous for computational scientific workflows, where each step may only need to run once, or where the same step can be split out and run across multiple files. Jobs also enable easier monitoring of the current workflow stage, since you only need to check the Job rather than multiple pods.
The file structure for a Job is fairly similar to a Pod’s YAML file.
pi-example.yaml
YAML
apiVersion: batch/v1
kind: Job
metadata:
name: pi
spec:
template:
spec:
containers:
- name: pi
image: perl:5.34.0
command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
restartPolicy: Never
backoffLimit: 4
Example from https://kubernetes.io/docs/concepts/workloads/controllers/job/
The example above will run a Perl container that computes Pi to 2000 digits. The backoffLimit also prevents a Job from running continuously in a crash loop if any of the Pods spawned by the Job fail. In the example above, if more than 4 Pods fail, the Job will be cancelled.
The example may take up to 2 minutes to complete.
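Assuming the definition is saved as pi-example.yaml:
BASH
kubectl apply -f pi-example.yaml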
OUTPUT
job.batch/pi created
After 2 minutes, check the logs.
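One way to view the result, letting kubectl resolve the Job to its Pod:
BASH
kubectl logs jobs/pi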
OUTPUT
3.141592653589793238462643383279502...
What are their benefits for research computing?
By using a Job, you are able to run tasks and workflows in a manner similar to traditional batch computing clusters. Unlike a Pod, a Job will run until a task completes, up to a certain number of re-runs. A Pod does not have this function and would either restart continuously or follow the global restart policy. On a small local setup this may not be a large challenge, but it becomes critical when using a campus or regional Kubernetes cluster. Jobs also allow more flexibility in parallelization through the ability to have one Job spawn many Pods of the same type with different names.
Content from Services in Kubernetes
Last updated on 2025-01-07
Estimated time: 12 minutes
Overview
Questions
- What are Kubernetes Services?
- What can Kubernetes Services be used for?
Objectives
- Understand what Services in Kubernetes are.
Services are a way to expose a deployment or pod to the network outside of a Kubernetes cluster on a long-term basis. When we create a Pod, it gets assigned a unique IP address that allows it to communicate with other pods. However, when a pod is restarted, or deleted and recreated, its IP address will change, which makes communication among pods more complicated.
Services allow Pods to have more reliable communication by abstracting the networking layer between Pods and other Pods or the outside world. An example of this could be an internal database communicating with an application in another Pod that is doing analysis on the stored data. Services could also be used to expose a science gateway to the public internet or a local campus network.