Content from What is Kubernetes?
Last updated on 2025-01-07
Estimated time: 12 minutes
Overview
Questions
- What is Kubernetes and why might I want to use it?
Objectives
- Understand what Kubernetes is and when you might want to utilize it.
- Utilize Kubernetes commands to interact with a Kubernetes environment.
Kubernetes, or “k8s”, is an open-source container orchestration platform that automates deploying, scaling, and managing containers within an environment. Traditionally, Kubernetes is aimed at commercial environments and cloud infrastructure. It was originally developed by Google, released as open source, and is now managed by the Cloud Native Computing Foundation (CNCF).
Discuss with your neighbor
What are some challenges you may have when you move from one computational environment to another for your research? For example, moving from a local compute cluster to a different compute cluster.
Some examples you may have discussed:
- Different methods of interacting with the environment
- Keywords could be different or processed differently
- Software may not be available readily
Kubernetes provides a uniform platform and method across many different local and commercial cloud infrastructures. This ensures that your application operates the same way on your local development system, a production Kubernetes environment, or a commercial cloud environment. This also helps ensure that a research workflow using Kubernetes can be reproduced on any infrastructure running Kubernetes.
Kubernetes deployments and environments are managed using YAML files. These YAML files let you describe the desired state of a specific environment to the Kubernetes cluster, which then attempts to realize that defined configuration. This also makes it easier to change things in a recorded manner using version control tools like Git.
Make sure everything is ready
Before proceeding, double check that you have kubectl connected to a Kubernetes instance. Running kubectl get nodes will retrieve all of the computers, or nodes, running in the Kubernetes environment.
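For reference, the command looks like this:
BASH
kubectl get nodes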
You should see output similar to this:
OUTPUT
NAME STATUS ROLES AGE VERSION
minikube Ready control-plane 90d v1.30.0
The kubectl command does not directly control the state of a Kubernetes cluster. Instead, it communicates with the Kubernetes controller through an API server. When you run kubectl apply or kubectl delete, the command execution doesn’t directly change the environment. Instead, it tells the controller through the API server the expected state of the environment, and the Kubernetes controller will attempt to make the declared state the current state and keep the cluster in the desired state. This is called “declarative management”, which simplifies the management of more complex systems.
Talk about it
With Docker or other container platforms, what challenges might come up as you start scaling up and running more workflows?
Some challenges you may have thought of:
- Needing to micromanage everything running as projects scale
- Many working components needing to be configured
Content from Running Pods in Kubernetes
Last updated on 2024-12-03
Estimated time: 12 minutes
Overview
Questions
- What is Kubernetes and why might I want to use it?
Objectives
- Understand what Kubernetes is and when you might want to utilize it.
Pods are the fundamental building block of a Kubernetes setup and are designed to run a single portion of an application or process.
Each pod has its own identifiers, specifications, and containers.
With Kubernetes, the individual container or containers in a pod are not managed by kubectl. Instead, pods are the lowest level of Kubernetes object that is managed with kubectl for applications.
Pods are designed around the idea of running a single portion of a larger system, similar to each team in a company working on its own tasks. Sometimes these teams only have one person, or they may have multiple people. Pods can run with a single container or with multiple containers. If a pod is running multiple containers, they should be tightly coupled together. An example would be a pod running a simple website based on a GitHub repository: one container would run the web server itself and another would pull the repository updates.
Pods running with multiple containers are also able to share storage volumes, which allows each of the containers to work on the same location in a straightforward manner.
Where do container images come from?
Kubernetes pulls container images from the same sources that Docker would. This can be from the local system, DockerHub, or another remote registry service. Any images that are not already available locally will be pulled from the remote source. The syntax is also the same as what you would use for Docker.
- hello-world - If you use the image name only, it will pull the latest tag from DockerHub.
- hello-world:1.2.3 - You can also specify a tag to select a specific version or build of an image. This should be specified when possible to help increase the reproducibility of your workflows.
- registry.example.com/hello-world:1.2.3 - This would pull a specific image and tag from a remote registry. This could be a public registry or an institutional registry.
Exploring Pods
We are going to create a basic “hello world” pod. Below is a YAML file containing all of the information that is needed for the API controller to spawn our pod and its container.
hello-world.yaml
YAML
apiVersion: v1
kind: Pod
metadata:
name: hello-world-pod
spec:
containers:
- name: hello-world-container
image: busybox
command: ["/bin/sh", "-c"]
args: ["echo 'Hello World! I am in a pod!' && sleep infinity"]
What do the lines mean?
When we apply or send the YAML file to the API controller, the file’s contents tell the controller what the desired state of your configuration is. The apiVersion line states what version of the Kubernetes API the file is going to use. The kind line specifies what type of object we are creating with this block of configuration. The metadata section contains basic information about the Pod we are creating; in this case we are only specifying the name of the pod. spec is the specification of what we want the desired state of the pod to be. This file contains a basic single-container Pod’s specification. The specification is asking for one container named hello-world-container using the busybox image. The command is the command we want the container to run, similar to changing the entrypoint using docker run, and the args are the commands we will be running inside the hello-world-container container within our hello-world-pod Pod.
We can send this configuration to the API Controller using the kubectl apply command.
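With the definition above saved as hello-world.yaml, the command looks like this:
BASH
kubectl apply -f hello-world.yaml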
OUTPUT
pod/hello-world-pod created
We can check the status of our Pod using kubectl get.
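For example, listing the Pods in the current namespace:
BASH
kubectl get pods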
OUTPUT
NAME READY STATUS RESTARTS AGE
hello-world-pod 0/1 ContainerCreating 0 9s
The output indicates that our single pod is not ready for use and is currently creating all of the specified containers. Since we are only using one container, this process should be fairly quick.
If we check again shortly after, we can see the Pod is ready and running.
OUTPUT
NAME READY STATUS RESTARTS AGE
hello-world-pod 1/1 Running 0 5s
Getting more details
We can get more details about the Pod including the creation process
and progress today using
kubectl describe pod name-of-pod
The kubectl describe
provides a verbose description of
different Kubernetes resources. You do need to specify both the type or
kind
of object and the name of the object itself that you
want information on.
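For our Pod, that would look like this:
BASH
kubectl describe pod hello-world-pod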
OUTPUT
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned default/hello-world-pod to minikube
Normal Pulling 18m kubelet Pulling image "busybox"
Normal Pulled 18m kubelet Successfully pulled image "busybox" in 1.638s (1.638s including waiting). Image size: 4261502 bytes.
Normal Created 18m kubelet Created container hello-world-container
Normal Started 18m kubelet Started container hello-world-container
We can check whether our Pod has run or is running by checking the logs using kubectl logs name-of-pod.
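For example:
BASH
kubectl logs hello-world-pod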
OUTPUT
Hello World! I am in a pod!
Running commands in Pods
During development and debugging of environments, or when verifying a quick command, it may be beneficial to get an interactive terminal within a pod or to run a single command within the pod.
We can pull up an interactive terminal in a pod using kubectl’s exec command.
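A minimal example, assuming the busybox container provides /bin/sh:
BASH
kubectl exec -it hello-world-pod -- /bin/sh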
OUTPUT
/ #
To get out of the interactive session, run the exit command.
Explore the pod
Run some commands within the pod to explore its environment. Some questions that can be answered are:
- What is the hostname of the Pod?
- What user are we running as within the Pod?
Individual commands can also be executed from within the Pod without the need for an interactive session.
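For example, printing the Pod’s hostname (the command after the -- separator is only an illustration):
BASH
kubectl exec hello-world-pod -- hostname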
OUTPUT
hello-world-pod
Cleaning up
We can clean up our pods from this lesson by deleting them using kubectl’s delete command. To do this, we can either run the delete command in a similar way to the apply command by specifying the file, or we can delete the pod manually by calling its name.
By specifying the filename, anything that is defined in the file will get removed.
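For example:
BASH
kubectl delete -f hello-world.yaml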
OUTPUT
pod "hello-world-pod" deleted
We can also run the same operation by specifying the resource type and resource name of the pod we are deleting.
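For example:
BASH
kubectl delete pod hello-world-pod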
OUTPUT
pod "hello-world-pod" deleted
When we use the file itself to delete the Pod, the definitions provided in the file supply the resource name and type automatically; without the file, that information must be provided for the Kubernetes controller to perform the operation.
Modifying the Pod
While a pod is running, we can modify its definition and the API controller will handle the transition for us.
If we switch the container definition in the Pod YAML file from busybox to another container, such as ubuntu:23.04, we can migrate our Pod.
After making the change, we can re-apply the definition file.
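Re-applying uses the same command as before:
BASH
kubectl apply -f hello-world.yaml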
OUTPUT
pod/hello-world-pod configured
Notice that the language changed when we applied the new definition file to our Pod. When we made our Pod for the first time, it was listed as “created”. This time the Pod was “configured” rather than “created”.
We can also look at the actions that the Controller took to update our pod using the describe command.
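For example:
BASH
kubectl describe pod hello-world-pod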
OUTPUT
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 40s default-scheduler Successfully assigned default/hello-world-pod to minikube
Normal Pulling 39s kubelet Pulling image "busybox"
Normal Pulled 38s kubelet Successfully pulled image "busybox" in 718ms (718ms including waiting). Image size: 4269694 bytes.
Normal Created 37s kubelet Created container hello-world-container
Normal Started 37s kubelet Started container hello-world-container
Normal Killing 18s kubelet Container hello-world-container definition changed, will be restarted
Normal Pulling 4m53s kubelet Pulling image "ubuntu:23.04"
Normal Pulled 4m50s kubelet Successfully pulled image "ubuntu:23.04" in 3.41s (3.41s including waiting). Image size: 70323206 bytes.
Normal Created 4m45s (x2 over 5m43s) kubelet Created container hello-world-container
Normal Started 4m45s (x2 over 5m43s) kubelet Started container hello-world-container
Here we can see that the Controller noticed the definition change, then worked to kill the existing Pod to remake it with the new definition.
We can also verify that the container has changed by checking its operating system with cat /etc/os-release.
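One way to do this is to run the command through kubectl exec:
BASH
kubectl exec hello-world-pod -- cat /etc/os-release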
OUTPUT
PRETTY_NAME="Ubuntu 23.04"
NAME="Ubuntu"
VERSION_ID="23.04"
...
We could achieve a similar result by deleting the Pod, making the changes, and then reapplying the definition; however, that puts the effort of managing our Pods on us, when we can instead let Kubernetes handle it.
Clean up after the lesson
Go ahead and check for any remaining Pods and kill them before moving on.
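One possible approach (your pod names may differ):
BASH
kubectl get pods
kubectl delete pod hello-world-pod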
Content from Volumes in Kubernetes
Last updated on 2024-12-03
Estimated time: 12 minutes
Overview
Questions
- How do I make data available in a Pod?
Objectives
- Explore creating and using volumes
- Understand the “stateless” nature of pods
Pods are stateless entities and should be treated as such. Anything the pod creates in its filesystem is removed when the pod stops or dies. For temporary files and directories this is fine; however, it is not conducive to research, since results need to be saved somewhere. Pods and their containers are ephemeral, or “stateless”, by default. This means that any data stored on the filesystems within the containers is lost when the pod is stopped. For workflows, this creates a small barrier. Workflows can still be run using this methodology, however the pods and their containers would need to be set up to download their input and upload their output to a remote server.
Volumes are the solution to this problem. The main type of volume used for pods is the Persistent Volume Claim or “PVC”. These are requests to the Kubernetes cluster to “claim” space on the storage system, or a Persistent Volume (PV), of a cluster. PVCs are scoped to a namespace, which means that a PVC is not visible or accessible from another namespace on the Kubernetes cluster.
Once a PVC is mounted in a pod, data can be stored in or retrieved from it using the mount path on the container’s filesystem. This allows data to remain persistent after the pod is terminated.
Creating a PV
A Kubernetes cluster may automatically create a PV when a PVC is created. This will be mentioned in any documentation of the cluster itself.
First we need to create a Persistent Volume to give us a space to claim for files.
pv_create.yaml
YAML
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv0001
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 5Gi
hostPath:
path: /mnt/pv0001/
In our pv_create.yaml, we defined the basic details of our Persistent Volume. We are using a generic name, pv0001, for our Persistent Volume, but it can be any name if desired. Since we are only running a single pod in this part of the lesson, we are using the ReadWriteOnce access mode, which allows multiple pods to access the volume if they are running on the same node. There are other access modes available that allow working across multiple nodes or setting the volume to read-only; more details on this are available in the Kubernetes documentation. We also configured the Persistent Volume to have a capacity of 5 GB of space and placed it on the host filesystem at /mnt/pv0001.
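Assuming the definition is saved as pv_create.yaml, we apply it as before:
BASH
kubectl apply -f pv_create.yaml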
OUTPUT
persistentvolume/pv0001 created
This will create a dedicated space for our Pods to store files. The Pods won’t have immediate access to store data in a PV. In order for the Pods to store data, they need to make a claim against the PV using a Persistent Volume Claim.
Creating a PVC
For our Pods to store data, they need to claim space in a Persistent Volume.
pvc_storage.yaml
YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-test-pv-claim
spec:
storageClassName: manual
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3Gi
This configuration will look for a Persistent Volume to claim against if there is a matching and available PV.
If there is an available and matching PV, a claim will then be available for a Pod or multiple Pods to use.
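Assuming the definition is saved as pvc_storage.yaml:
BASH
kubectl apply -f pvc_storage.yaml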
OUTPUT
persistentvolumeclaim/my-test-pv-claim created
This created a PVC and dynamically made a PV since a PV did not yet exist.
We can check both by using kubectl get pv and kubectl get pvc.
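For example:
BASH
kubectl get pv
kubectl get pvc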
OUTPUT
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-7c06f9ef-909a-4cd7-b450-d136219a8964 8Gi RWO Delete Bound openproject/data-my-openproject-postgresql-0 standard <unset> 83d
OUTPUT
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
my-test-pv-claim Pending manual <unset> 82s
Transferring data to and from a PVC
In order to transfer data to and from a PVC, you need to have the volume mounted in a pod for the transfer. Data from a local computer can then be copied to a PVC through the Pod using the mount path of the PVC.
First we will need to make a pod to copy our data through. We will use a similar structure as our first pod.
YAML
apiVersion: v1
kind: Pod
metadata:
name: hello-world-pod
spec:
containers:
- name: hello-world-container
image: busybox
command: ["/bin/sh", "-c"]
args: ["echo 'Hello World! I am in a pod!' && sleep infinity"]
volumeMounts:
- mountPath: /mnt/my_pvc
name: my-pvc-for-pod
volumes:
- name: my-pvc-for-pod
persistentVolumeClaim:
claimName: my-test-pv-claim
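Assuming this definition is saved as, for example, pod_with_pvc.yaml (that filename is only illustrative), we can apply it as before:
BASH
kubectl apply -f pod_with_pvc.yaml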
We can confirm if our mountPath has our PVC available by running ls in the pod.
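One way to do this, listing the parent directory of the mount path from outside the Pod:
BASH
kubectl exec hello-world-pod -- ls -l /mnt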
OUTPUT
total 4
drwxr-xr-x 2 root root 4096 Oct 7 16:27 my_pvc
Since our Pod is running the sleep command indefinitely, the pod will remain active and we can use it to transfer some data.
We will create a basic file to place in the PVC. In my_file.md we will insert the following text.
Hello, I am a file in a PVC!
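One way to create the file locally, copy it into the PVC through the running Pod, and confirm its contents is sketched below; the destination path assumes the mount path from the Pod definition above:
BASH
echo 'Hello, I am a file in a PVC!' > my_file.md
kubectl cp my_file.md hello-world-pod:/mnt/my_pvc/my_file.md
cat my_file.md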
OUTPUT
Hello, I am a file in a PVC!
If we check the content of the PVC mounted in the Pod, we can see our file and look at its contents.
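For example, listing the mount path and printing the file from inside the Pod:
BASH
kubectl exec hello-world-pod -- ls -l /mnt/my_pvc
kubectl exec hello-world-pod -- cat /mnt/my_pvc/my_file.md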
OUTPUT
total 4
-rw-rw-r-- 1 1000 1000 29 Oct 7 16:49 my_file.md
OUTPUT
Hello, I am a file in a PVC!
Verifying the data is persistent
We can verify that the data is persistent by deleting the pod and creating a new pod.
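For example:
BASH
kubectl delete pod hello-world-pod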
OUTPUT
pod "hello-world-pod" deleted
At this point any data in the Pod’s filesystem itself would be gone as the Pod’s filesystem is ephemeral.
We will create a new Pod that will output the contents of the file.
check_pvc.yaml
YAML
apiVersion: v1
kind: Pod
metadata:
name: check-pvc-pod
spec:
containers:
- name: file-check-container
image: busybox
command: ["/bin/sh", "-c"]
args: ["cat /mnt/my_pvc/my_file.md && sleep infinity"]
volumeMounts:
- mountPath: /mnt/my_pvc
name: my-pvc-for-pod
volumes:
- name: my-pvc-for-pod
persistentVolumeClaim:
claimName: my-test-pv-claim
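Assuming the definition is saved as check_pvc.yaml:
BASH
kubectl apply -f check_pvc.yaml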
OUTPUT
pod/check-pvc-pod created
We can then check the logs of the pod to see the contents of the file.
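For example:
BASH
kubectl logs check-pvc-pod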
OUTPUT
Hello, I am a file in a PVC!
We can also create a file in the volume using a Pod.
pod_create_file.yaml
YAML
apiVersion: v1
kind: Pod
metadata:
name: file-create-pod
spec:
containers:
- name: file-create-container
image: busybox
command: ["/bin/sh", "-c"]
args: ["for i in 1 2 3 4 5; do cat /mnt/my_pvc/my_file.md >> /mnt/my_pvc/output.log; done; ls /mnt/my_pvc; cat /mnt/my_pvc/output.log && sleep infinity"]
volumeMounts:
- mountPath: /mnt/my_pvc
name: my-pvc-for-pod
volumes:
- name: my-pvc-for-pod
persistentVolumeClaim:
claimName: my-test-pv-claim
What we should expect to see is the contents of our PVC mounted in the pod and the contents of output.log.
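Assuming the definition above is saved as pod_create_file.yaml, applying it and then checking the Pod’s logs should show something like the output below:
BASH
kubectl apply -f pod_create_file.yaml
kubectl logs file-create-pod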
OUTPUT
my_file.md
output.log
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
We can then copy the output.log file back to our computer.
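One way to do this is with kubectl cp; the paths here are a sketch based on the mount path used above:
BASH
kubectl cp file-create-pod:/mnt/my_pvc/output.log ./output.log
cat output.log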
OUTPUT
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Hello, I am a file in a PVC!
Content from Jobs in Kubernetes
Last updated on 2025-01-07
Estimated time: 12 minutes
Overview
Questions
- What are Jobs and how can I use them for my workflows?
Objectives
- Understand what a Job is.
- Explore the usage of Jobs to run computational workflows.
NRP Example: https://docs.nationalresearchplatform.org/userdocs/running/jobs
What are Jobs?
Kubernetes Jobs are used to run tasks to completion, such as a specific step in a workflow or the complete end of a workflow. In the YAML file, Jobs and Pods appear very similar, but Jobs have a few extra pieces. This is because Jobs are a higher-level abstraction that manages Pods to make sure they run to completion, retrying a set number of times until they complete successfully or run out of attempts. Jobs can also be set to run multiple copies of the same Pod at the same time. Jobs are also good for one-off or scheduled tasks, such as data updates or backups, and can be set to run on a schedule as a CronJob.
This makes Jobs very advantageous for computational scientific workflows, where each step may only need to run once, or where the same step can be split out and run across multiple files. Jobs also enable easier monitoring of the current workflow stage, since you only need to check the Job rather than multiple pods.
The file structure for a Job is fairly similar to a Pod’s YAML file.
pi-example.yaml
YAML
apiVersion: batch/v1
kind: Job
metadata:
name: pi
spec:
template:
spec:
containers:
- name: pi
image: perl:5.34.0
command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
restartPolicy: Never
backoffLimit: 4
Example from https://kubernetes.io/docs/concepts/workloads/controllers/job/
The example above will run a Perl container that computes Pi to 2000 digits. The backoffLimit also prevents a Job from running continuously in a crash loop if any of the Pods spawned by the Job fail. In the example above, if more than 4 Pods fail, the Job will be cancelled.
The example may take up to 2 minutes to complete.
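Assuming the definition is saved as pi-example.yaml:
BASH
kubectl apply -f pi-example.yaml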
OUTPUT
job.batch/pi created
After 2 minutes, check the logs.
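One way to view the result, letting kubectl resolve the Job to its Pod:
BASH
kubectl logs jobs/pi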
OUTPUT
3.141592653589793238462643383279502...
What are their benefits for research computing?
By using a Job, you are able to run tasks and workflows in a manner similar to traditional batch computing clusters. Unlike a Pod, a Job will run until a task completes, up to a certain number of re-runs. A Pod does not have this function and would either restart continuously or follow the global restart policy. On a small local setup this may not be a large challenge, but it becomes critical when using a campus or regional Kubernetes cluster. Jobs also allow more flexibility in parallelization through the ability to have one Job spawn many Pods of the same type with different names.
Content from Services in Kubernetes
Last updated on 2025-01-07
Estimated time: 12 minutes
Overview
Questions
- What are Kubernetes Services?
- What can Kubernetes Services be used for?
Objectives
- Understand what Services in Kubernetes are.
Services are a way to expose a deployment or pod to the network outside of a Kubernetes cluster on a long-term basis. When we create a Pod, it gets assigned a unique IP address that allows it to communicate with other pods. However, when a pod is restarted, or deleted and recreated, its IP address will change, which makes communication among pods more complicated.
Services allow Pods to have more reliable communication by abstracting the networking layer between Pods and other Pods or the outside world. An example of this could be an internal database communicating with an application in another Pod that is doing analysis on the stored data. Services could also be used to expose a science gateway to the public internet or a local campus network.