Containers

Containers Exercises

| Name | Topic | Objective & Instructions | Solution | Comments |
|------|-------|--------------------------|----------|----------|
| Running Containers | Intro | Exercise | Solution | |
| Working with Images | Image | Exercise | Solution | |
| My First Dockerfile | Dockerfile | Exercise | | |
| Run, Forest, Run! | Restart Policies | Exercise | Solution | |
| Layer by Layer | Image Layers | Exercise | Solution | |
| Containerize an application | Containerization | Exercise | Solution | |
| Multi-Stage Builds | Multi-Stage Builds | Exercise | Solution | |

Containers Self Assessment

What is a Container?

This can be tricky to answer since there are many ways to create containers:

  • Docker
  • systemd-nspawn
  • LXC

Focusing on OCI (Open Container Initiative) based containers, the OCI offers the following definition: "An environment for executing processes with configurable isolation and resource limitations. For example, namespaces, resource limits, and mounts are all part of the container environment."

Why are containers needed? What is their goal?

OCI provides a good explanation: "Define a unit of software delivery called a Standard Container. The goal of a Standard Container is to encapsulate a software component and all its dependencies in a format that is self-describing and portable, so that any compliant runtime can run it without extra dependencies, regardless of the underlying machine and the contents of the container."

How are containers different from virtual machines (VMs)?

The primary difference between containers and VMs is that containers allow you to virtualize multiple workloads on a single operating system while in the case of VMs, the hardware is being virtualized to run multiple machines each with its own guest OS. You can also think about it as containers are for OS-level virtualization while VMs are for hardware virtualization.

  • Containers don't require an entire guest operating system as VMs do. Containers share the host's kernel and isolate themselves using kernel features such as namespaces and cgroups
  • It usually takes a few seconds to set up a container, whereas a VM can take minutes (or at least longer than a container) since it has an entire OS to boot and initialize, while a container shares the underlying OS
  • Virtual machines are considered to be more secure than containers
  • VM portability is considered to be limited compared to containers
Do we need virtual machines in the age of containers? Are they still relevant?
In which scenarios would you use containers and in which you would prefer to use VMs?

You should choose VMs when:

  • You need to run an application which requires all the resources and functionalities of an OS
  • You need full isolation and security

You should choose containers when:

  • You need a lightweight solution
  • Running multiple versions or instances of a single application
Describe the process of containerizing an application
  1. Write a Dockerfile that includes your app (including the commands to run it) and its dependencies
  2. Build the image using the Dockerfile you wrote
  3. You might want to push the image to a registry
  4. Run the container using the image you've built

Containers - OCI

What is the OCI?

OCI (Open Container Initiative) is an open governance structure established in 2015 to standardize container creation - mostly image format and runtime. At that time there were a number of parties involved and the most prominent one was Docker.

Specifications published by OCI:

  • image-spec
  • runtime-spec

Which operations must OCI-based containers support?

Create, Kill, Delete, Start and Query State.
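
As a rough illustration of how these operations relate to each other, here is a toy state machine. It is only a sketch: the `Container` class is hypothetical, the status names follow the ones used in the OCI runtime-spec (created, running, stopped), and `kill` is assumed to terminate the process right away, which a real signal does not guarantee.

```python
class Container:
    """Toy model of the OCI container lifecycle (not a real runtime)."""

    def __init__(self, container_id):
        self.id = container_id
        self.status = None  # None means the container does not exist yet

    def create(self):
        if self.status is not None:
            raise RuntimeError("container already exists")
        self.status = "created"

    def start(self):
        if self.status != "created":
            raise RuntimeError("start requires a created container")
        self.status = "running"

    def kill(self):
        # Assumption: the signal ends the container process immediately.
        if self.status not in ("created", "running"):
            raise RuntimeError("kill requires a created or running container")
        self.status = "stopped"

    def delete(self):
        if self.status != "stopped":
            raise RuntimeError("delete requires a stopped container")
        self.status = None

    def state(self):
        """Query State: report the identifier and current status."""
        return {"id": self.id, "status": self.status}
```

Calling `create`, `start`, `kill` and `delete` in that order walks the container through its whole lifecycle, while calling them out of order raises an error.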

Containers - Basic Commands

How to list all the containers on a given host?

In the case of Docker, use: docker container ls
In the case of Podman, it's not very different: podman container ls

How to run a container?

Docker: docker container run ubuntu
Podman: podman container run ubuntu

Why is the output of podman container ls empty after running podman container run ubuntu?

Because the container exits immediately after running the ubuntu image. This is completely normal and expected, as containers are designed to run a service or an app and exit when they are done running it.

If you want the container to keep running, you can run a command like sleep 100 which will run for 100 seconds, or you can attach to the terminal of the container with a command similar to: podman container run -it ubuntu /bin/bash

How to attach your shell to a terminal of a running container?

podman container exec -it [container id/name] bash

This can be done in advance while running the container: podman container run -it [image:tag] /bin/bash

True or False? You can remove a running container if it isn't running anything

False. You have to stop the container before removing it (or force the removal with the -f flag).

How to stop and remove a container?

podman container stop <container id/name> && podman container rm <container id/name>

What happens when you run docker container run ubuntu?
  1. Docker client posts the command to the API server running as part of the Docker daemon
  2. Docker daemon checks if a local image exists
  3. If it exists, it will use it
  4. If it doesn't exist, it will go to the remote registry (Docker Hub by default) and pull the image locally
  5. containerd and runc are instructed (by the daemon) to create and start the container
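
The local-first image lookup in steps 2 to 4 can be sketched as follows. This is a simplified model, not Docker's code: `resolve_image`, `local_images` and `registry` are hypothetical names, both stores are plain dictionaries, and image names with a registry port (such as localhost:5000/app) are not handled.

```python
def resolve_image(name, local_images, registry):
    """Return (image, pulled) using a local-first lookup."""
    if ":" not in name:
        name += ":latest"           # no tag given, so 'latest' is assumed
    if name in local_images:
        return local_images[name], False
    image = registry[name]          # a KeyError stands in for "image not found"
    local_images[name] = image      # the pulled image is kept locally
    return image, True


local_images = {}
registry = {"ubuntu:latest": "ubuntu-image"}
first = resolve_image("ubuntu", local_images, registry)
second = resolve_image("ubuntu", local_images, registry)
```

The first call has to pull the image, while the second one finds it locally and skips the registry.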
How to run a container in the background?

With the -d flag. The container will run in the background and will not be attached to the terminal.

docker container run -d httpd or podman container run -d httpd

Containers - Images

What is a container image?
  • An image of a container contains the application, its dependencies and the operating system files (user space, usually without a kernel) the application needs in order to run.
  • It's a collection of read-only layers. These layers are loosely coupled
    • Each layer is assembled out of one or more files
Why are container images relatively small?
  • Most images don't contain a kernel. They share and access the one used by the host on which they are running
  • Containers are intended to run a specific application in most cases. This means they hold only what the application needs in order to run
How to list the container images on a certain host?

podman image ls
docker image ls

Depends on which container engine you use.

What is the centralized location where images are stored called?

Registry

A registry contains one or more ____ which in turn contain one or more ____

A registry contains one or more repositories which in turn contain one or more images.

How to find out which registry you use by default in your environment?

Depends on the container technology you are using. For example, in the case of Docker, it can be done with docker info

> docker info
Registry: https://index.docker.io/v1

How to retrieve the latest ubuntu image?

docker image pull ubuntu:latest

True or False? It's not possible to remove an image if a certain container is using it

True. You should stop and remove the container before trying to remove the image it uses.

True or False? If a tag isn't specified when pulling an image, the 'latest' tag is being used

True

True or False? Using the 'latest' tag when pulling an image means you are pulling the most recently published image

False. While this might be true in some cases, it's not guaranteed that you'll pull the latest published image when using the 'latest' tag.
For example, in some images, 'edge' tag is used for the most recently published images.

Where are pulled images stored?

Depends on the container technology being used. For example, in the case of Docker, images are stored in /var/lib/docker/

Explain container image layers
  • The layers of an image are where all the content is stored - code, files, etc.
  • Each layer is independent
  • Each layer has an ID that is a hash based on its content
  • The layers (as the image) are immutable which means a change to one of the layers can be easily identified
True or False? Changing the content of any of the image layers will cause the hash of the image to change

True. These hashes are content based and since images (and their layers) are immutable, any change will cause the hashes to change.

How to list the layers of an image?

In case of Docker, you can use docker image inspect <name>

True or False? In most cases, container images contain their own kernel

False. They share and access the one used by the host on which they are running.

True or False? A single container image can have multiple tags

True. When listing images, you might be able to see two images with the same ID but different tags.

What is a dangling image?

It's an image without tags attached to it. One way to reach this situation is by building an image with the exact same name and tag as another already existing image. It can still be referenced by using its full SHA.

How to see changes done to a given image over time?

In the case of Docker, you could use docker history <name>

True or False? Multiple images can share layers

True.
One piece of evidence for that can be found when pulling images. Sometimes when you pull an image, you'll see a line similar to the following:
fa20momervif17: already exists

This is because it recognizes such layer already exists on the host, so there is no need to pull the same layer twice.
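
A minimal sketch of that behavior, assuming layers are identified by simple strings and the local layer store is a set (`pull` and the layer names are made up for illustration):

```python
def pull(image_layers, local_store):
    """Fetch only the layers missing from local_store; return what was downloaded."""
    downloaded = []
    for layer_id in image_layers:
        if layer_id in local_store:
            continue                # this is the "already exists" case
        local_store.add(layer_id)
        downloaded.append(layer_id)
    return downloaded


store = set()
first = pull(["base", "python", "app-a"], store)
second = pull(["base", "python", "app-b"], store)
print(first)
print(second)
```

The second image shares its first two layers with the first one, so only its last layer has to be downloaded.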

What is the digest of an image? What problem does it solve?

Tags are mutable. This means that we can have two different images with the same name and the same tag. It can be very confusing to see two images with the same name and the same tag in your environment. How would you know whether they are truly the same or different?

This is where "digests" come in handy. A digest is a content-addressable identifier. Unlike tags, it isn't mutable. Its value is predictable, and this is how you can tell whether two images are the same content-wise rather than merely by looking at the name and the tag of the images.
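
The difference between a mutable tag and a content-based digest can be illustrated with a short sketch. It assumes a digest is simply the SHA-256 of some bytes standing in for the image content; the `tags` dictionary and the image contents are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content-addressable identifier: the same bytes always give the same value."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


build_1 = b"app build 1"
build_2 = b"app build 2"

tags = {}
tags["web:latest"] = digest(build_1)
pinned = tags["web:latest"]          # remember what the tag pointed to
tags["web:latest"] = digest(build_2)  # the tag is moved to a different image

tag_still_points_to_pinned = pinned == tags["web:latest"]
```

After the tag is moved, `pinned` still identifies the first build exactly, while the tag no longer does.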

True or False? A single image can support multiple architectures (Linux x64, Windows x64, ...)

True.

What is a distribution hash in regards to layers?
  • Layers are compressed when pushed or pulled
  • The distribution hash is the hash of the compressed layer
  • The distribution hash is used when pulling or pushing images for verification (making sure no one tampered with the image or its layers)
  • It's also used for avoiding ID collisions (a case where two images have exactly the same generated ID)
How do multi-architecture images work? Explain by describing what happens when an image is pulled
  1. A client makes a call to the registry to use a specific image (using an image name and optionally a tag)
  2. A manifest list is parsed (assuming it exists) to check if the architecture of the client is supported and available as a manifest
  3. If it is supported (a manifest for the architecture is available) the relevant manifest is parsed to obtain the IDs of the layers
  4. Each layer is then pulled using the obtained IDs from the previous step
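
Step 2 can be sketched as a lookup over the entries of a manifest list. The field names (`platform`, `os`, `architecture`, `digest`) follow the ones used in manifest lists, but the digests below are placeholders and `select_manifest` is a hypothetical helper.

```python
def select_manifest(manifest_list, os_name, arch):
    """Return the digest of the manifest matching the client's platform."""
    for entry in manifest_list:
        platform = entry["platform"]
        if platform["os"] == os_name and platform["architecture"] == arch:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}")


manifest_list = [
    {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
    {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64"}},
]
chosen = select_manifest(manifest_list, "linux", "arm64")
```

A client on an unsupported platform gets an error instead of a manifest, which matches the "if it is supported" condition in step 3.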
How to check which architectures a certain container image supports?

docker manifest inspect <name>

How to check what a certain container image will execute once we run a container based on that image?

Look for the "Cmd" or "Entrypoint" fields in the output of docker image inspect <image name>

How to view the instructions that were used to build an image?

docker image history <image name>:<tag>

How does docker image build work?
  1. Docker spins up a temporary container
  2. Runs a single instruction in the temporary container
  3. Stores the result as a new image layer
  4. Removes the temporary container
  5. Repeats for every instruction
What is the role of cache in image builds?

When you build an image for the first time, the different layers are cached. So, while the first build of the image might take time, any subsequent build of the same image (given that neither the Dockerfile nor the content used by its instructions changed) will be much faster thanks to the caching mechanism.

In a little more detail, it works this way:

  1. The first instruction (FROM) checks whether the base image already exists on the host before pulling it
  2. For the next instruction, the builder checks the build cache for an existing layer that was built from the same base image using the same instruction
  3. If it finds such a layer, it skips the instruction, links the existing layer and keeps using the cache
  4. If it doesn't find a matching layer, it builds the layer and the cache is invalidated for all the following instructions

Note: in some cases (like COPY and ADD instructions) the instruction might stay the same, but if the content being copied has changed then the cache is invalidated. This check is done by comparing the checksum of each file that is being copied.
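
The cache lookup can be modeled with a chained key. This is only a sketch under simplifying assumptions: the key of a step is a hash of the parent step's key, the instruction text and the checksums of any copied files. `cache_key` and `build` are hypothetical helpers, not Docker's implementation.

```python
import hashlib


def cache_key(parent_key, instruction, files=()):
    """Combine the parent key, the instruction and the copied files' checksums."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(instruction.encode())
    for content in files:
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()


def build(steps, cache):
    """steps is a list of (instruction, files); returns 'hit' or 'miss' per step."""
    results, parent = [], ""
    for instruction, files in steps:
        key = cache_key(parent, instruction, files)
        if key in cache:
            results.append("hit")
        else:
            cache.add(key)
            results.append("miss")
        parent = key  # chaining makes a miss propagate to the later steps
    return results


cache = set()
steps = [
    ("FROM python:3.12", ()),
    ("COPY app.py /app/", (b"print(1)",)),
    ("RUN pip install flask", ()),
]
first_build = build(steps, cache)
second_build = build(steps, cache)
changed = [steps[0], ("COPY app.py /app/", (b"print(2)",)), steps[2]]
third_build = build(changed, cache)
```

In the third build the COPY instruction is textually identical, but the file content changed, so that step and every step after it miss the cache.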

What ways are there to reduce container image size?
  • Reduce the number of instructions - in some cases you may be able to merge layers, for example by installing multiple packages with one instruction or by chaining commands with && in a single RUN instruction
  • Use smaller images - in some cases you might be using images that contain more than what your application needs in order to run. It is good to review the available images and see whether you can use smaller ones than those you usually use.
  • Clean up after running commands - some commands, like package installation, create metadata or cache that you might not need for running the application. It's important to clean up after such commands to reduce the image size
  • For Docker images, you can use multi-stage builds
What are the pros and cons of squashing images?

Pros:

  • Smaller image
  • Reduced number of layers (especially if the image has a lot of layers)

Cons:

  • No sharing of the image layers
  • Push and pull can take more time (because no matching layers are found on the target)

Containers - Volume

How to create a new volume?

docker volume create some_volume

Containers - Dockerfile

What is a Dockerfile?

Different container engines (e.g. Docker, Podman) can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text file that contains all the instructions for building an image which containers can use.

What is the first instruction in all Dockerfiles and what does it mean?

The first instruction is FROM <image name>
It specifies the base image to be used. Every other instruction builds on top of that base image.

List five different instructions that are available for use in a Dockerfile
  • WORKDIR: sets the working directory inside the image filesystem for all the instructions following it
  • EXPOSE: documents the port the application listens on (it doesn't add a new layer; it is recorded as image metadata)
  • ENTRYPOINT: specifies the startup commands to run when a container is started from the image
  • ENV: sets an environment variable to the given value
  • USER: sets the user (and optionally the user group) to use while running the image
What are some of the best practices regarding container images and Dockerfiles that you are following?
  • Include only the packages you are going to use. Nothing else.
  • Specify a tag in the FROM instruction. Not using a tag means you'll always pull the latest, which changes over time and might lead to unexpected results.
  • Do not use environment variables to share secrets
  • Use images from official repositories
  • Keep images small! - you want them only to include what is required for the application to run successfully. Nothing else.
  • If you are using the apt package manager, you might use '--no-install-recommends' with apt-get install to install only the main dependencies (without suggested or recommended packages)
What is the "build context"?

Docker docs: "A build’s context is the set of files located in the specified PATH or URL"

What is the difference between ADD and COPY in Dockerfile?

COPY takes in a source and destination. It lets you copy in a file or directory from the build context into the Docker image itself.
ADD lets you do the same, but it also supports two other sources. You can use a URL instead of a file or directory from the build context. In addition, you can extract a tar file from the source directly into the destination.

Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. That’s because it’s more transparent than ADD. COPY only supports the basic copying of files from build context into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious.

What is the difference between CMD and RUN in Dockerfile?

RUN lets you execute commands inside of your Docker image. These commands get executed once at build time and get written into your Docker image as a new layer. CMD is the command the container executes by default when you launch the built image. Only one CMD takes effect in a Dockerfile; if several are listed, the last one wins. You could say that CMD is a Docker run-time operation, meaning it’s not something that gets executed at build time. It happens when you run an image. A running image is called a container.

How to create a new image using a Dockerfile?

The following command is executed from within the directory where the Dockerfile resides:

docker image build -t some_app:latest .
podman image build -t some_app:latest .

Do you perform any checks or testing on your Dockerfiles?

One option is to use the hadolint project, which is a linter based on Dockerfile best practices.

Which instructions in Dockerfile create new layers?

Instructions such as FROM, COPY and RUN create new image layers instead of just adding metadata.

Which instructions in Dockerfile create image metadata and don't create new layers?

Instructions such as ENTRYPOINT, ENV and EXPOSE create image metadata and don't create new layers.

Is it possible to identify which instructions create a new layer from the output of docker image history?

Containers - Architecture

How do containers achieve isolation from the rest of the system?

Through the use of namespaces and cgroups. The Linux kernel has several types of namespaces:

  • Process ID namespaces: provide an independent set of process IDs
  • Mount namespaces: isolate and control mount points
  • Network namespaces: isolate system networking resources such as routing tables, interfaces, ARP tables, etc.
  • UTS namespaces: isolate the hostname and domain name
  • IPC namespaces: isolate interprocess communication
  • User namespaces: isolate user and group IDs
  • Time namespaces: isolate system clocks (boot and monotonic time)
Describe in detail what happens when you run `podman/docker run hello-world`?

  1. The Docker CLI passes your request to the Docker daemon (Podman has no daemon, so the podman CLI performs the following steps itself)
  2. The image is downloaded from the registry (Docker Hub by default for Docker) if it isn't already available locally
  3. A new container is created from the image
  4. The output of the container is redirected to the CLI, which writes it to the standard output

Describe the difference between cgroups and namespaces

cgroup: Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behavior.

namespace: wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.

In short:

Cgroups = limits how much you can use; namespaces = limits what you can see (and therefore use)

Cgroups involve resource metering and limiting: memory, CPU, block I/O, network

Namespaces provide processes with their own view of the system

Multiple namespaces: pid, net, mnt, uts, ipc, user

Containers - Docker Architecture

Which components/layers compose the Docker technology?
  1. Runtime - responsible for starting and stopping containers
  2. Daemon - implements the Docker API and takes care of managing images (including builds), authentication, security, networking, etc.
  3. Orchestrator
What components are part of the Docker engine?
  • Docker daemon
  • containerd
  • runc
What is the low-level runtime?
  • The low level runtime is called runc
  • It manages every container running on a Docker host
  • Its purpose is to interact with the underlying OS to start and stop containers
  • It's the reference implementation of the OCI (Open Container Initiative) runtime-spec
  • It's a small CLI wrapper for libcontainer
What is the high-level runtime?
  • The high level runtime is called containerd
  • It was developed by Docker Inc and at some point donated to CNCF
  • It manages the whole lifecycle of a container - start, stop, remove and pause
  • It takes care of setting up network interfaces and volumes, pushing and pulling images, ...
  • It manages the lower level runtime (runc) instances
  • It's used both by Docker and Kubernetes as a container runtime
  • It sits between Docker daemon and runc at the OCI layer

Note: if you run ps -ef | grep -i containerd on a system where Docker is installed and running, you should see a containerd process

True or False? The docker daemon (dockerd) performs lower-level tasks compared to containerd

False. The Docker daemon performs higher-level tasks compared to containerd.
It's responsible for managing networks, volumes, images, ...

Describe in detail what happens when you run `docker pull image:tag`?
The Docker CLI passes your request to the Docker daemon. The dockerd logs show the process:

docker.io/library/busybox:latest resolved to a manifestList object with 9 entries; looking for a unknown/amd64 match

found match for linux/amd64 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:400ee2ed939df769d4681023810d2e4fb9479b8401d97003c710d0e20f7c49c6

pulling blob "sha256:61c5ed1cbdf8e801f3b73d906c61261ad916b2532d6756e7c4fbcacb975299fb"

Downloaded 61c5ed1cbdf8 to tempfile /var/lib/docker/tmp/GetImageBlob909736690

Applying tar in /var/lib/docker/overlay2/507df36fe373108f19df4b22a07d10de7800f33c9613acb139827ba2645444f7/diff" storage-driver=overlay2

Applied tar sha256:514c3a3e64d4ebf15f482c9e8909d130bcd53bcc452f0225b0a04744de7b8c43 to 507df36fe373108f19df4b22a07d10de7800f33c9613acb139827ba2645444f7, size: 1223534

Describe in detail what happens when you run a container
  1. The Docker client converts the run command into an API payload
  2. It then POSTs the payload to the API endpoint exposed by the Docker daemon
  3. When the daemon receives the command to create a new container, it makes a call to containerd via gRPC
  4. containerd converts the required image into an OCI bundle and tells runc to use that bundle for creating the container
  5. runc interfaces with the OS kernel to pull together the different constructs (namespace, cgroups, etc.) used for creating the container
  6. The container process is started as a child process of runc
  7. Once it starts, runc exits
True or False? Killing the Docker daemon will kill all the running containers

False. While this was true at some point, today the container runtime isn't part of the daemon (it's part of containerd and runc) so stopping or killing the daemon will not affect running containers.

True or False? containerd forks a new instance of runc for every container it creates

True

True or False? Running a dozen containers will result in having a dozen runc processes

False. Once a container is created, the parent runc process exits.

What is shim in regards to Docker?

shim is the process that becomes the container's parent when the runc process exits. It's responsible for:

  • Reporting exit code back to the Docker daemon
  • Making sure the container doesn't terminate if the daemon is being restarted. It does so by keeping the stdout and stdin open
What does `podman commit` do? When will you use it?

It creates a new image from a container’s changes

How would you transfer data from one container into another?
What happens to the data of a container when the container exits?
Explain what each of the following commands do:
  • docker run
  • docker rm
  • docker ps
  • docker pull
  • docker build
  • docker commit

How do you remove old, non-running containers?
  1. To remove one or more containers, use the docker container rm command followed by the IDs of the containers you want to remove.
  2. The docker system prune command will remove all stopped containers, all dangling images, and all unused networks
  3. docker rm $(docker ps -a -q) - This command will delete all stopped containers. The command docker ps -a -q will return all existing container IDs and pass them to the rm command which will delete them. Any running containers will not be deleted.
How does the Docker client communicate with the daemon?

Via the local socket at /var/run/docker.sock
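
Since the daemon exposes an HTTP API on that socket, a client can be sketched with nothing but the standard library. This is an illustrative sketch rather than how the Docker CLI is implemented: `build_ping_request` and `ping` are hypothetical helpers, the `Host` header value is an arbitrary placeholder, and `ping` only works on a host where the daemon is running and the socket is readable by the current user.

```python
import socket


def build_ping_request():
    """Raw HTTP request for the Engine API's /_ping endpoint."""
    return b"GET /_ping HTTP/1.1\r\nHost: docker\r\nConnection: close\r\n\r\n"


def ping(socket_path="/var/run/docker.sock"):
    """Send the request over the Unix socket and return the raw response text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(build_ping_request())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:            # the daemon closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

Calling `ping()` on such a host returns the HTTP response sent back by the daemon.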

Explain Docker interlock
What is Docker Repository?
Explain image layers

A Docker image is built up from a series of layers. Each layer represents an instruction in the image’s Dockerfile. Each layer except the very last one is read-only. Each layer is only a set of differences from the layer before it. The layers are stacked on top of each other.

When you create a new container, you add a new writable layer on top of the underlying layers. This layer is often called the “container layer”. All changes made to the running container, such as writing new files, modifying existing files, and deleting files, are written to this thin writable container layer.

The major difference between a container and an image is the top writable layer. All writes to the container that add new or modify existing data are stored in this writable layer. When the container is deleted, the writable layer is also deleted. The underlying image remains unchanged. Because each container has its own writable container layer, and all changes are stored in this container layer, multiple containers can share access to the same underlying image and yet have their own data state.
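
The idea of read-only image layers with a thin writable layer on top can be illustrated with `collections.ChainMap`. This is an analogy, not how a storage driver works: layers are plain dictionaries mapping paths to contents, and `new_container` is a hypothetical helper.

```python
from collections import ChainMap

# Image layers, lowest first. They are never modified by the containers below.
image_layers = [
    {"/bin/sh": "shell", "/etc/os-release": "base"},
    {"/app/main.py": "v1"},
]


def new_container(layers):
    # ChainMap looks keys up from left to right and sends writes to the first
    # mapping, so the empty dict in front acts as the writable container layer.
    return ChainMap({}, *reversed(layers))


c1 = new_container(image_layers)
c2 = new_container(image_layers)
c1["/app/main.py"] = "patched"  # stored only in c1's writable layer
```

Both containers read the same underlying layers, yet the change made in `c1` is invisible to `c2` and to the image itself.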

What best practices related to working with containers are you familiar with?
How do you manage persistent storage in Docker?
How can you connect from the inside of your container to the localhost of your host, where the container runs?
How do you copy files from Docker container to the host and vice versa?

Containers - Docker Compose

Explain what Docker Compose is and what it is used for

Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.

For example, you can use it to set up ELK stack where the services are: elasticsearch, logstash and kibana. Each running in its own container.
In general, it's useful for running applications which are composed of several different services. It lets you manage them as one deployed app, instead of multiple separate services.

Describe the process of using Docker Compose

  • Define the services you would like to run together in a docker-compose.yml file
  • Run docker-compose up to run the services

Containers - Docker Images

What is Docker Hub?

One of the most common registries for retrieving images.

How to push an image to Docker Hub?

docker image push [username]/[image name]:[tag]

For example:

docker image push mario/web_app:latest

What is the difference between Docker Hub and Docker cloud?

Docker Hub is a native Docker registry service which allows you to run pull and push commands to install and deploy Docker images from the Docker Hub.

Docker Cloud is built on top of the Docker Hub so Docker Cloud provides you with more options/features compared to Docker Hub. One example is Swarm management which means you can create new swarms in Docker Cloud.

Explain Multi-stage builds

Multi-stage builds allow you to produce smaller container images by splitting the build process into multiple stages.

As an example, imagine you have one Dockerfile where you first build the application and then run it. The whole build process of the application might be using packages and libraries you don't really need for running the application later. Moreover, the build process might produce different artifacts, not all of which are needed for running the application.

How do you deal with that? Sure, one option is to add more instructions to remove all the unnecessary stuff, but there are a couple of issues with this approach:

  1. You need to know exactly what to remove, and that might not be as straightforward as you think
  2. You add new layers which are not really needed

A better solution might be to use multi-stage builds where one stage (the build process) is passing the relevant artifacts/outputs to the stage that runs the application.

True or False? In multi-stage builds, artifacts can be copied between stages

True. This allows us to eventually produce smaller images.

What is .dockerignore used for?

By default, Docker uses everything (all the files and directories) in the directory you use as build context.
.dockerignore is used for excluding files and directories from the build context
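
The filtering idea can be sketched with Python's `fnmatch`. This is a deliberate simplification: the real .dockerignore matching rules differ (for example, they support `!` exceptions and treat path separators differently), and `build_context`, the file names and the patterns are all made up for illustration.

```python
from fnmatch import fnmatch


def build_context(paths, ignore_patterns):
    """Return the paths that would remain in the build context."""
    return [
        path for path in paths
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]


files = ["Dockerfile", "app.py", ".git/config", "notes.md", "node_modules/x.js"]
patterns = [".git/*", "*.md", "node_modules/*"]
context = build_context(files, patterns)
```

Only the files the build actually needs are left in `context`, which keeps the build context small.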

Containers - Networking

What container network standards or architectures are you familiar with?

CNM (Container Network Model):

  • Requires a distributed key-value store (etcd, for example) for storing the network configuration
  • Used by Docker

CNI (Container Network Interface):

  • Network configuration should be in JSON format

Containers - Docker Networking

Which network specification does Docker use and what is its implementation called?

Docker is using the CNM (Container Network Model) design specification.
The implementation of CNM specification by Docker is called "libnetwork". It's written in Go.

Explain the following blocks in regards to CNM:
  • Networks

  • Endpoints

  • Sandboxes


  • Networks: software implementation of a switch. They are used for grouping and isolating a collection of endpoints.

  • Endpoints: Virtual network interfaces. Used for making connections.

  • Sandboxes: Isolated network stack (interfaces, routing tables, ports, ...)

True or False? If you would like to connect a container to multiple networks, you need multiple endpoints

True. An endpoint can connect only to a single network.
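
The relationship between the three building blocks can be modeled with a few small classes. This is a toy model for illustration only; the class and attribute names are hypothetical and have nothing to do with libnetwork's actual code.

```python
class Network:
    """Groups and isolates a collection of endpoints."""

    def __init__(self, name):
        self.name = name
        self.endpoints = []


class Endpoint:
    """Virtual network interface that belongs to exactly one network."""

    def __init__(self, network):
        self.network = network
        network.endpoints.append(self)


class Sandbox:
    """Isolated network stack of a container; it can hold several endpoints."""

    def __init__(self):
        self.endpoints = []

    def connect(self, network):
        endpoint = Endpoint(network)  # one new endpoint per network
        self.endpoints.append(endpoint)
        return endpoint


frontend, backend = Network("frontend"), Network("backend")
sandbox = Sandbox()
sandbox.connect(frontend)
sandbox.connect(backend)
```

Connecting the same sandbox to two networks leaves it with two endpoints, one per network.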

What are some features of libnetwork?
  • Native service discovery
  • Ingress-based load balancer
  • Network control plane and management plane

Containers - Security

What security best practices are there regarding containers?
  • Install only the necessary packages in the container
  • Don't run containers as root when possible
  • Don't mount the Docker daemon unix socket into any of the containers
  • Set volumes and container's filesystem to read only
  • DO NOT run containers with the --privileged flag
A container can cause a kernel panic and bring down the whole host. What preventive actions can you apply to avoid this specific situation?
  • Install only the necessary packages in the container
  • Set volumes and container's filesystem to read only
  • DO NOT run containers with the --privileged flag

Containers - Docker in Production

What are some best practices you follow in regards to using containers in production?

Images:

  • Use images from official repositories
  • Include only the packages you are going to use. Nothing else.
  • Specify a tag in the FROM instruction. Not using a tag means you'll always pull the latest, which changes over time and might lead to unexpected results.
  • Do not use environment variables to share secrets
  • Keep images small! - you want them only to include what is required for the application to run successfully. Nothing else.

Components:

  • Secured connection between components (e.g. client and server)
True or False? It's recommended for production environments that the Docker client and server communicate over the network using an HTTP socket

False. Communication between client and server shouldn't be done over HTTP since it's insecure. It's better to configure the daemon to accept only network connections that are secured with TLS.
Basically, the Docker daemon will only accept secured connections with certificates from a trusted CA.

What forms of self-healing options are available for Docker containers?

Restart policies. They allow you to automatically restart containers after certain events.

What restart policies are you familiar with?
  • always: always restart the container when it stops. A container that was stopped manually (e.g. with docker container stop) is restarted only when the daemon restarts or when it is started manually
  • unless-stopped: same as always, except that a container that was stopped manually is not restarted even after the daemon restarts
  • no: never restart the container automatically (default policy)
  • on-failure: restart the container when it exits due to an error (= exit code different than zero)
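
The four policies can be summarized as a small decision function. This is a simplified model rather than Docker's implementation: `should_restart` and its parameters are hypothetical, and the behavior of on-failure across daemon restarts is not modeled.

```python
def should_restart(policy, exit_code, manually_stopped=False, daemon_restarted=False):
    """Decide whether a stopped container should be started again."""
    if policy == "no":
        return False
    if policy == "on-failure":
        return exit_code != 0 and not manually_stopped
    if policy == "always":
        # A manually stopped container comes back only when the daemon restarts.
        return daemon_restarted if manually_stopped else True
    if policy == "unless-stopped":
        return not manually_stopped
    raise ValueError(f"unknown restart policy: {policy}")
```

For example, `should_restart("on-failure", 0)` is false because the container exited cleanly, while `should_restart("on-failure", 1)` is true.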

Containers - Docker Misc

Explain what Docker Bench is