Kubernetes pods

Concept

  • a pod is a group of one or more containers
  • pods are the basic building block of a Kubernetes deployment
    • a pod is the smallest unit we can create and deploy
  • a pod lets its containers share resources so they can collaborate easily

Shared resources between containers in a pod

Containers in a pod:

  • share the same IPC namespace, so they can communicate using System V semaphores or POSIX shared memory
  • share the same IP address and port space, and can reach each other via localhost
  • can mount the same volumes
  • are always co-located and co-managed

Multiple pods vs multiple containers in a pod

Most of the time, put containers in distinct pods unless there is a specific reason to co-locate them, e.g.:

  • hot reload on event trigger
  • routing service
  • web server container (Nginx, Apache)
  • log aggregation
  • telemetry

Those containers are often called sidecar containers.
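
As a rough sketch of the sidecar pattern, the manifest below runs a web server next to a log-tailing container, with both mounting the same volume. The pod name, container names, image tags and log path are illustrative assumptions, not taken from a real deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-log-sidecar   # hypothetical name
spec:
  volumes:
    # scratch volume that lives as long as the pod
    - name: logs
      emptyDir: {}
  containers:
    - name: web
      image: nginx:1.25
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
    - name: log-tailer
      image: busybox:1.36
      # follow the access log written by the web container
      command: [ "sh", "-c", "tail -F /var/log/nginx/access.log" ]
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
          readOnly: true
```

Because both containers live in the same pod, they are scheduled on the same node and the tailer sees the files as soon as the web server writes them.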

Configuration

A pod is described by a manifest file in YAML or JSON format. The following command prints one without creating the pod:

kubectl run whoami \
  --image containous/whoami:latest \
  --port 80 \
  --dry-run=client \
  --output yaml

DNS_LABEL

Names specified as a DNS_LABEL (e.g. container names) must follow the RFC 1123 label format:

A DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. `my-name`, `123-abc`). The regexp used for validation is `[a-z0-9]([-a-z0-9]*[a-z0-9])?`
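
To make the rule concrete, here is a small sketch using the two valid names from the message above as container names. The pod name, image and command are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-label-demo   # hypothetical name
spec:
  containers:
    # valid: lower case alphanumerics and '-', alphanumeric at both ends
    - name: my-name
      image: busybox:1.36
      command: [ "sleep", "3600" ]
    # valid: a label may start with a digit
    - name: 123-abc
      image: busybox:1.36
      command: [ "sleep", "3600" ]
    # names such as My_Name or -abc- would be rejected by the validation above
```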

Image pull policy

imagePullPolicy tells Kubernetes when to pull the container image.

By default:

  • Always: if the image tag is latest, or if no tag is specified
  • IfNotPresent: if the tag is anything other than latest

It's recommended to explicitly set the `imagePullPolicy`.
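
A minimal sketch of an explicit policy, reusing the whoami image from the Configuration section. The choice of IfNotPresent here is only an example; pick the value that matches how you publish your images.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: whoami
spec:
  containers:
    - name: whoami
      image: containous/whoami:latest
      # explicit policy instead of relying on the tag-based default
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 80
```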

Pod deletion

A pod deletion will delete its containers in the following manner:

  1. send SIGTERM (15) signal
  2. after a grace period (30s by default), send SIGKILL (9) if the container is still not terminated

You can set terminationGracePeriodSeconds in the pod spec to define an adequate duration.

With kubectl delete, you can use the option --grace-period to override the grace period.

You can use the option --now to set the period to 1 second: SIGTERM is still sent first, and SIGKILL follows one second later if the container has not exited.
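
The sketch below shows where terminationGracePeriodSeconds sits in a manifest. The pod name, the 60 second value and the shell loop standing in for an application are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown-demo   # hypothetical name
spec:
  # time allowed between SIGTERM and SIGKILL, set for the whole pod
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: busybox:1.36
      # stand-in for an app that needs about 10s to drain after SIGTERM
      command: [ "sh", "-c", "trap 'echo draining; sleep 10; exit 0' TERM; while true; do sleep 1; done" ]
```

Deleting this pod with kubectl delete pod slow-shutdown-demo should let the trap finish well within the grace period, whereas adding --grace-period=5 would cut the drain short.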

Security Context

To launch a container with a different group or user, we can use the securityContext:

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  # set on the whole pod
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
    - name: sec-ctx-demo
      image: busybox
      command: [ "id" ]
    - name: root
      image: busybox
      command: [ "id" ]
      # set on specific container
      securityContext:
        runAsUser: 0

Init containers

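Init containers are like regular containers, but:

  • they have a limited lifetime: each one runs to completion before the app containers start
  • they run one at a time, in order, and each init container is launched only if the previous one finished without error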

When should we use an InitContainer?

  • init containers can mount the same volumes as the app containers, so they can generate configuration for them
  • to perform database migrations/updates
  • they are launched before the app containers, so they can also be used to wait for the pod's external dependencies to be ready

Example that waits for two DNS names to resolve before starting the app:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  initContainers:
  - name: init-myservice
    image: busybox
    command: [ 'sh' , '-c' , 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;' ]
  - name: init-mydb
    image: busybox
    command: [ 'sh' , '-c' , 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
  containers:
  - name: myapp-container
    image: debian:10-slim
    command: [ 'bash' , '-c' , 'echo The app is running! && sleep 3600']
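
The example above covers waiting for dependencies. The configuration-generation use case could look like the following sketch, where an init container writes a file into a volume that the app container then reads. All names, the file path and its content are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-config-demo   # hypothetical name
spec:
  volumes:
    - name: config
      emptyDir: {}
  initContainers:
    - name: render-config
      image: busybox:1.36
      # write the configuration before the app container starts
      command: [ "sh", "-c", "echo 'greeting=hello' > /config/app.properties" ]
      volumeMounts:
        - name: config
          mountPath: /config
  containers:
    - name: app
      image: busybox:1.36
      # the file is guaranteed to exist here, since init containers finished first
      command: [ "sh", "-c", "cat /config/app.properties && sleep 3600" ]
      volumeMounts:
        - name: config
          mountPath: /config
          readOnly: true
```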

Container probe

How do we detect that an application has stopped working when its container is still running? Enter container probes.

3 types of probes:

  • httpGet: send an HTTP GET request; success if the response code is >= 200 and < 400
  • tcpSocket: check if the port is open
  • exec: execute a command in the container; success if the return code is 0

There are 3 categories of probes:

  • Startup == has the application finished starting
  • Liveness == is the application alive
  • Readiness == is the application ready to receive traffic

Warning

Only one probe type can be defined for each category (Startup, Liveness or Readiness) per container.

Note

A probe should not be affected by your application's dependencies! It should be the application's responsibility to handle dependency errors (e.g. with a circuit breaker). A probe should be lightweight so as not to impact the performance of the application.
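
As an illustration, the sketch below defines one probe per category on a single container, each using a different probe type. The image, paths, ports and timing values are assumptions and would need tuning for a real application.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo   # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
      # startup: allow up to 30 x 2s for the server to come up
      startupProbe:
        httpGet:
          path: /
          port: 80
        periodSeconds: 2
        failureThreshold: 30
      # liveness: the container is restarted if the port stops accepting connections
      livenessProbe:
        tcpSocket:
          port: 80
        periodSeconds: 10
      # readiness: traffic is only routed while the command exits with 0
      readinessProbe:
        exec:
          command: [ "sh", "-c", "test -f /usr/share/nginx/html/index.html" ]
        periodSeconds: 5
```

None of these checks calls an external dependency, in line with the note above.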

Limits

---
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
    - name: app
      image: super.mycompany.com/app:v4
      # set per container; the pod's total is the sum over all its containers
      resources:
        # min resources needed
        requests:
          memory: "256Mi"
          cpu: "500m"
        # max resources
        limits:
          memory: "2Gi"
          cpu: "2"

  • if a container uses more memory than its configured limit, it becomes a candidate for termination
  • if the container can be restarted, the kubelet will try to restart it

CPU resource management

  • 1 cpu unit is equivalent to:
    • 1 AWS vCPU
    • 1 GCP Core
    • 1 Azure vCore
    • 1 Hyperthread
  • m suffix == milli
    • e.g. 100m cpu == 0.1 cpu
  • a CPU value is always absolute, never relative: 0.1 is the same amount of CPU on a single-core, dual-core or 48-core machine
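
A short sketch of the two notations side by side. The pod name, image and values are illustrative only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-units-demo   # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: [ "sleep", "3600" ]
      resources:
        requests:
          # milli notation: a quarter of one CPU
          cpu: "250m"
        limits:
          # decimal notation: half a CPU, equivalent to 500m
          cpu: "0.5"
```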