This article shares some best practices to follow when writing Dockerfiles and working with Docker. It is a long read, so feel free to bookmark it and work through it slowly; it will be well worth the effort.
Table of contents
Dockerfile best practices
- Use multi-stage builds
- Adjust the order of Dockerfile commands
- Use a small Docker base image
- Minimize the number of layers
- Use unprivileged containers
- COPY is preferred over ADD
- Cache Python packages on the Docker host
- Each container runs only one process
- Array syntax is preferred over string syntax
- Understand the difference between ENTRYPOINT and CMD
- Add health check HEALTHCHECK
Docker image best practices
- Version Docker images
- Do not store secrets in images
- Use a .dockerignore file
- Lint and scan your Dockerfiles and images
- Sign and verify images
- Set memory and CPU limits
Dockerfile best practices
1. Use multi-stage builds
Take advantage of multi-stage builds[1] to create leaner, more secure Docker images. Multi-stage builds allow you to divide your Dockerfile into several stages.
For example, you can have one stage for compiling and building your application, whose artifacts can then be copied into subsequent stages. Since only the last stage is used to create the final image, the dependencies and tools associated with building the application are discarded, leaving a lean, modular, production-ready image.
Web development example:
# Temporary build stage
FROM python:3.9-slim as builder

WORKDIR /app

RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc

COPY requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /app/wheels -r requirements.txt

# Final stage
FROM python:3.9-slim

WORKDIR /app

COPY --from=builder /app/wheels /wheels
COPY --from=builder /app/requirements.txt .
RUN pip install --no-cache /wheels/*
In this example, the GCC compiler is required to install certain Python packages, so we added a temporary, build-time stage to handle the compilation.
Since the final runtime image does not contain GCC, it is lighter and more secure. Image size comparison:
REPOSITORY      TAG      IMAGE ID       CREATED          SIZE
docker-single   latest   8d6b6a4d7fb6   16 seconds ago   259MB
docker-multi    latest   813c2fa9b114   3 minutes ago    156MB
Let's take another example:
# Temporary build stage
FROM python:3.9 as builder

RUN pip wheel --no-cache-dir --no-deps --wheel-dir /wheels jupyter pandas

# Final stage
FROM python:3.9-slim

WORKDIR /notebooks

COPY --from=builder /wheels /wheels
RUN pip install --no-cache /wheels/*
Image size comparison:
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
ds-multi     latest   b4195deac742   2 minutes ago   357MB
ds-single    latest   7c23c43aeda6   6 minutes ago   969MB
In short, multi-stage builds can reduce the size of your production images, which helps you save time and money. They also simplify your production containers, and thanks to their small size and simplicity, those containers have a relatively small attack surface.
2. Adjust the order of Dockerfile commands
Pay close attention to the order of your Dockerfile commands to take advantage of the layer cache.
Docker caches each step (or layer) in a Dockerfile to speed up subsequent builds. When a step changes, the cache for that step and for all subsequent steps is invalidated.
For example:
FROM python:3.9-slim

WORKDIR /app

COPY sample.py .
COPY requirements.txt .

RUN pip install -r /requirements.txt
In this Dockerfile, we copied the application code before installing the requirements. Now, every time we change sample.py, the build reinstalls the packages. This is very inefficient, especially when using a Docker container as a development environment. Therefore, it is critical to put frequently changing files near the end of the Dockerfile.
You can also use a .dockerignore file to exclude unnecessary files from the Docker build context and the final image, which helps prevent unnecessary cache invalidation. More on this later.
Therefore, in the Dockerfile above, you should move the COPY sample.py . command to the bottom, as follows:
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .

RUN pip install -r /requirements.txt

COPY sample.py .
Notes:
- Always put layers that are likely to change as low as possible in the Dockerfile.
- Combine commands such as RUN apt-get update and RUN apt-get install into a single instruction (this also helps reduce the image size, as discussed later).
- If you want to disable the cache for a particular Docker build, add the --no-cache=true flag.
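For example, a quick sketch of forcing a clean rebuild without cached layers (the image name is illustrative):

docker build --no-cache=true -t web .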
3. Use a small Docker base image
Smaller Docker images are more modular and secure. With a smaller base image, building, pushing, and pulling images is faster, and such images are often more secure because they include only the libraries and system dependencies required to run your application.
Which Docker base image should you use? There is no fixed answer; it depends on what you are doing. Below is a size comparison of various Python base images:
REPOSITORY   TAG                 IMAGE ID       CREATED      SIZE
python       3.9.6-alpine3.14    f773016f760e   3 days ago   45.1MB
python       3.9.6-slim          907fc13ca8e7   3 days ago   115MB
python       3.9.6-slim-buster   907fc13ca8e7   3 days ago   115MB
python       3.9.6               cba42c28d9b8   3 days ago   886MB
python       3.9.6-buster        cba42c28d9b8   3 days ago   886MB
Although the Alpine flavor, based on Alpine Linux, is the smallest, it often leads to longer build times if you can't find compiled binaries that work with it. As a result, you may end up having to build the binaries yourself, which can increase the image size (depending on the required system-level dependencies) and the build time (since you have to compile from source).
Refer to The best Docker base image for Python applications[2] and Using Alpine can make Python Docker builds 50x slower[3] to learn more about why it is best to avoid Alpine-based base images.
In the end, it's all about balance. When in doubt, start with a *-slim flavor, especially in development mode while you are still building your application. You want to avoid having to constantly update the Dockerfile to install necessary system-level dependencies every time you add a new Python package. As you harden your application and Dockerfile(s) for production, you may want to explore using Alpine for the final image of a multi-stage build.
In addition, don't forget to update your base images regularly to improve security and performance. When a new version of a base image is released (e.g., 3.9.6-slim -> 3.9.7-slim), you should pull the new image and update your running containers to get all the latest security patches.
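For example, a quick sketch of picking up a patched base image (the image and tag names are illustrative):

# Pull the newer base image, then rebuild so the updated layers are used
docker pull python:3.9.7-slim
docker build -t web:1.0.1 .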
4. Minimize the number of layers
Try to combine the RUN, COPY, and ADD commands where possible, since they create layers. Each layer increases the size of the image because layers are cached; therefore, as the number of layers grows, so does the image size.
You can verify this with the docker history command:
docker images

REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
dockerfile   latest   180f98132d02   51 seconds ago   259MB

docker history 180f98132d02

IMAGE          CREATED              CREATED BY                                      SIZE     COMMENT
180f98132d02   58 seconds ago       COPY . . # buildkit                             6.71kB   buildkit.dockerfile.v0
<missing>      58 seconds ago       RUN /bin/sh -c pip install -r requirements.t…  35.5MB   buildkit.dockerfile.v0
<missing>      About a minute ago   COPY requirements.txt . # buildkit              58B      buildkit.dockerfile.v0
<missing>      About a minute ago   WORKDIR /app
...
Take note of the sizes: only the RUN, COPY, and ADD commands add to the size of the image. You can reduce the image size by combining commands wherever possible. For example:
RUN apt-get update
RUN apt-get install -y gcc
Can be combined into one RUN command:
RUN apt-get update && apt-get install -y gcc
This creates a single layer instead of two, which reduces the size of the final image. While reducing the number of layers is a good idea, it's more important to treat that as a side effect of reducing the image size and build time rather than as a goal in itself. In other words, instead of trying to optimize every command, focus on the previous three practices:
- Multi-stage builds
- The order of Dockerfile commands
- Using a small base image
Notes:
- RUN, COPY, and ADD all create layers
- Each layer contains differences from the previous layer
- Layers increase the size of the final image
Tips
- Merge related commands
- Delete unnecessary files in the same RUN step that created them
- Minimize the number of times you run apt-get upgrade, since it upgrades all packages to the latest version
- For multi-stage builds, don't worry too much about over-optimizing the commands in temporary stages
Finally, for readability, it is recommended to sort multi-line arguments alphabetically:
RUN apt-get update && apt-get install -y \
    gcc \
    git \
    matplotlib \
    pillow \
    && rm -rf /var/lib/apt/lists/*
5. Use unprivileged containers
By default, Docker runs container processes as root inside the container. However, this is a bad practice, because a process running as root inside the container also runs as root on the Docker host.
Therefore, if an attacker gains access to the container, they have full root privileges and can carry out attacks against the Docker host, such as:
- Copying sensitive information from the host's file system into the container
- Executing remote commands
To prevent this, be sure to run container processes as a non-root user:
RUN addgroup --system app && adduser --system --group app

USER app
You can go a step further and remove shell access and ensure there is no home directory:
RUN addgroup --gid 1001 --system app && \
    adduser --no-create-home --shell /bin/false --disabled-password --uid 1001 --system --group app

USER app
Verify:
docker run -i sample id

uid=1001(app) gid=1001(app) groups=1001(app)
Here, the application within the container runs under a non-root user. However, keep in mind that the Docker daemon and the container itself still run with root privileges.
Be sure to review Run the Docker daemon as a non-root user for help with running both the daemon and your containers as a non-root user.
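As a rough sketch of rootless mode (the exact steps depend on your distribution and Docker version; consult the rootless mode documentation for current instructions):

# Install a rootless Docker daemon for the current non-root user
curl -fsSL https://get.docker.com/rootless | sh

# Point the Docker CLI at the rootless daemon's socket (a UID of 1000 is assumed here)
export DOCKER_HOST=unix:///run/user/1000/docker.sock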
6. COPY is preferred over ADD
Unless you are sure you need the additional functions brought by ADD, please use COPY.
So what is the difference between COPY and ADD?
First, both commands allow you to copy files from a specific location to the Docker image.
ADD <src> <dest>
COPY <src> <dest>
Although they look the same, ADD has some additional features.
- COPY is used to COPY local files or directories from the Docker host to the image.
- ADD can be used for the same thing, and it can also download external files. In addition, if you use a compressed local file (tar, gzip, bzip2, etc.) as the parameter, ADD will automatically decompress its contents to the given location.
# Copy local files on the host to the destination
COPY /source/path /destination/path
ADD /source/path /destination/path

# Download an external file and copy it to the destination
ADD http://external.file/url /destination/path

# Copy and extract a local compressed file
ADD source.file.tar.gz /destination/path
Finally, COPY is semantically clearer and easier to understand than ADD.
7. Cache Python packages on the Docker host
When a requirements file changes, the image needs to be rebuilt to install the new packages. The earlier steps will be cached, as mentioned in Minimize the number of layers. Downloading all packages while rebuilding the image causes a lot of network activity and takes a lot of time. Each rebuild spends the same amount of time downloading packages that are common across builds.
Taking Python as an example, you can avoid this by mapping the pip cache directory to a directory on the host. That way, the cached versions persist across rebuilds, which improves the build speed.
Add a volume to the docker run command as -v $HOME/.cache/pip-docker/:/root/.cache/pip, or add a mapping in the Docker Compose file, as shown in the sketch below.
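A minimal Docker Compose sketch of that mapping (the host-side cache path is illustrative):

version: "3.8"
services:
  web:
    build: .
    volumes:
      - $HOME/.cache/pip-docker/:/root/.cache/pip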
The directories mentioned above are for reference only. Make sure you map the cache directory, not site-packages (where the built packages reside).
Moving the cache from the Docker image to the host can save space in the final image.
# ...

COPY requirements.txt .

RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# ...
8. Each container runs only one process
Why is it recommended to run only one process per container?
Let's assume your application stack consists of two web servers and a database. While you could easily run all three from a single container, you should run each in a separate container to make it easier to reuse and scale each individual service.
- Scalability - since each service is in a separate container, you can scale one of your web servers horizontally as needed to handle more traffic.
- Reusability - perhaps you have another service that needs a containerized database. You can simply reuse the same database container without dragging along two unneeded services.
- Logging - coupled containers make logging more complex. (We'll discuss this in more detail later in this article.)
- Portability and predictability - it's much easier to apply security patches or debug a problem when there are fewer pieces at work.
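For example, a rough Compose sketch of this idea (the image names and settings are illustrative) - the web server and the database each get their own container:

version: "3.8"
services:
  web:
    build: .
    ports:
      - "8000:8000"
  db:
    image: postgres:13-alpine
    environment:
      POSTGRES_PASSWORD: example

Each service can now be scaled, replaced, or reused independently of the other.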
9. Array syntax is preferred over string syntax
The CMD and ENTRYPOINT commands in a Dockerfile can be written in either array (exec) or string (shell) format:
# Array (exec)
CMD ["gunicorn", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "main:app"]

# String (shell)
CMD "gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app"
Both are valid and achieve nearly the same thing; however, you should use the exec format whenever possible.
From Docker's official documentation[4]:
- Make sure you're using the exec form of CMD and ENTRYPOINT in your Dockerfile.
- For example, use ["program", "arg1", "arg2"], not "program arg1 arg2". Using the string form causes Docker to run your process using bash, which doesn't handle signals properly. Compose always uses the JSON form, so don't worry if you override the command or entrypoint in your Compose file.
So, since most shells don't forward signals to child processes, if you use the shell format, CTRL-C (which generates a SIGTERM) may not stop a child process.
Example:
FROM ubuntu:18.04

# BAD: string (shell) format
ENTRYPOINT top -d

# GOOD: array (exec) format
ENTRYPOINT ["top", "-d"]
Both run the same command. Note, however, that with the string (shell) format, CTRL-C does not kill the process; instead, you will see ^C^C^C^C^C^C.
Another difference is that with the string (shell) format, PID 1 is held by the shell rather than by the process itself:
# Array format
root@18d8fd3fd4d2:/app# ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 python manage.py runserver 0.0.0.0:8000
    7 ?        Sl     0:02 /usr/local/bin/python manage.py runserver 0.0.0.0:8000
   25 pts/0    Ss     0:00 bash
  356 pts/0    R+     0:00 ps ax

# String format
root@ede24a5ef536:/app# ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 /bin/sh -c python manage.py runserver 0.0.0.0:8000
    8 ?        S      0:00 python manage.py runserver 0.0.0.0:8000
    9 ?        Sl     0:01 /usr/local/bin/python manage.py runserver 0.0.0.0:8000
   13 pts/0    Ss     0:00 bash
  342 pts/0    R+     0:00 ps ax
10. Understand the difference between ENTRYPOINT and CMD
Should you use ENTRYPOINT or CMD to run container processes? There are two ways to run commands in a container:
CMD ["gunicorn", "config.wsgi", "-b", "0.0.0.0:8000"] # and ENTRYPOINT ["gunicorn", "config.wsgi", "-b", "0.0.0.0:8000"]
Both essentially do the same thing: start the application at config.wsgi with a Gunicorn server and bind it to 0.0.0.0:8000.
CMD is easily overridden. If you run docker run <image_name> uvicorn config.asgi, the above CMD gets replaced by the new arguments - e.g., uvicorn config.asgi. To override the ENTRYPOINT command, you must specify the --entrypoint option:
docker run --entrypoint uvicorn config.asgi <image_name>
Here, it's clear that we are overriding the entrypoint. Therefore, it is recommended to use ENTRYPOINT instead of CMD to prevent commands from being accidentally overridden.
They can also be used together. For example:
ENTRYPOINT ["gunicorn", "config.wsgi", "-w"] CMD ["4"]
When used together like this, the command to start the container becomes:
gunicorn config.wsgi -w 4
As mentioned above, CMD is easily overridden, so it can be used to pass arguments to the ENTRYPOINT command. For example, the number of workers can easily be changed, like this:
docker run <image_name> 6
In this way, the container starts with six Gunicorn workers instead of the default four.
11. Add health check HEALTHCHECK
Use a HEALTHCHECK to determine whether the process running in the container is not only up and running, but also "healthy".
Docker exposes an API for checking the status of the process running in the container, which provides much more information than just whether the process is "running", since "running" covers "it is up and working", "still starting up", and even "stuck in some infinite-loop error state". You can interact with this API via the HEALTHCHECK[5] instruction.
For example, if you're serving a web application, you can use the following to determine whether the / endpoint is up and can handle requests:
HEALTHCHECK CMD curl --fail http://localhost:8000 || exit 1
If you run docker ps, you can see the status of HEALTHCHECK.
Healthy example:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 09c2eb4970d4 healthcheck "python manage.py ru..." 10 seconds ago Up 8 seconds (health: starting) 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp xenodochial_clarke
Unhealthy example:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 09c2eb4970d4 healthcheck "python manage.py ru..." About a minute ago Up About a minute (unhealthy) 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp xenodochial_clarke
You can go a step further by setting up a custom endpoint used only for health checks, and then configuring the HEALTHCHECK to test against the returned data.
For example, if the endpoint returns a JSON response of {"ping": "pong"}, you can instruct the HEALTHCHECK to validate the response body.
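For example, a rough sketch of such a check, assuming a /ping endpoint exists and that curl and grep are available inside the image:

HEALTHCHECK CMD curl --fail --silent http://localhost:8000/ping | grep -q "pong" || exit 1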
Here's how to view the health check status using docker inspect (part of the output is omitted):
❯ docker inspect --format "{{json .State.Health }}" ab94f2ac7889

{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2021-09-28T15:22:57.5764644Z",
      "End": "2021-09-28T15:22:57.7825527Z",
      "ExitCode": 0,
      "Output": "..."
You can also add health checks to Docker Compose files:
version: "3.8" services: web: build: . ports: - '8000:8000' healthcheck: test: curl --fail http://localhost:8000 || exit 1 interval: 10s timeout: 10s start_period: 10s retries: 3
Options:
- test: the command to test.
- interval: how often to run the test - i.e., test every x units of time.
- timeout: the maximum time to wait for a response.
- start_period: when to start the health checks. It can be used when additional tasks are performed before the container is ready, such as running migrations.
- retries: the maximum number of retries before the test is marked as failed.
If you use an orchestration tool other than Docker Swarm (such as Kubernetes or AWS ECS), it most likely has its own internal system for handling health checks. Refer to the documentation for your specific tool before adding the HEALTHCHECK instruction.
Docker image best practices
1. Version Docker images
Whenever possible, avoid using the latest tag.
If you rely on the latest tag (which isn't really a "tag", since it's applied by default when an image isn't explicitly tagged), you can't tell which version of your code is running based on the image tag.
It also makes rollbacks difficult, and the tag is easy to overwrite (whether accidentally or maliciously). Tags, like your infrastructure and deployments, should be immutable.
So regardless of how you treat your internal images, you should never use the latest tag for base images, since you could inadvertently deploy a new version with breaking changes to production.
For internal images, use descriptive tags to make it easier to tell which version of the code is running, handle rollbacks, and avoid naming collisions. For example, you can use the following descriptors to make up a tag:
- Timestamp
- Docker image ID
- Git commit hash value
- Semantic version
For more options, see this Stack Overflow answer[6] to "Properly Versioning Docker Images".
For example:
docker build -t web-prod-b25a262-1.0.0 .
Here, we used the following to form the tag:
- Project Name: web
- Environment name: prod
- Git commit short hash: b25a262 (obtained with the command git rev-parse --short HEAD)
- Semantics version: 1.0.0
It's essential to choose a tagging scheme and stay consistent with it. Since commit hashes make it easy to tie image tags back to the code, it is recommended to include them in your tagging scheme.
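For instance, a minimal shell sketch of composing such a tag (the project name, environment, and version are illustrative):

GIT_SHA=$(git rev-parse --short HEAD)
docker build -t "web-prod-${GIT_SHA}-1.0.0" .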
2. Do not store secrets in images
Secrets are sensitive pieces of information such as passwords, database credentials, SSH keys, tokens, and TLS certificates. They should not be baked into your images without being encrypted, since unauthorized users who gain access to the image can simply examine the layers to extract them.
Therefore, do not add secrets in plain text to your Dockerfiles, especially if you push your images to a public registry like Docker Hub!
FROM python:3.9-slim

ENV DATABASE_PASSWORD "SuperSecretSauce"
Instead, they should be injected via:
- Environment variables (at run time)
- Build-time arguments (at build time)
- An orchestration tool like Docker Swarm (via Docker secrets) or Kubernetes (via Kubernetes secrets)
In addition, you can help prevent leaking secrets by adding common secret files and folders to your .dockerignore file:
**/.env
**/.aws
**/.ssh
Finally, be explicit about which files are copied into the image rather than copying all files recursively:
# Bad practice
COPY . .

# Good practice
COPY ./app.py .
Being explicit also helps limit cache invalidation.
Environment variables
You can pass secrets via environment variables, but they will be visible in all child processes, linked containers, and logs, as well as via docker inspect. They are also difficult to update.
docker run --detach --env "DATABASE_PASSWORD=SuperSecretSauce" python:3.9-slim

b25a262f870eb0fdbf03c666e7fcf18f9664314b79ad58bc7618ea3445e39239

docker inspect --format='{{range .Config.Env}}{{println .}}{{end}}' b25a262f870eb0fdbf03c666e7fcf18f9664314b79ad58bc7618ea3445e39239

DATABASE_PASSWORD=SuperSecretSauce
PATH=/usr/local/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LANG=C.UTF-8
GPG_KEY=E3FF2839C048B25C084DEBE9B26995E310250568
PYTHON_VERSION=3.9.7
PYTHON_PIP_VERSION=21.2.4
PYTHON_SETUPTOOLS_VERSION=57.5.0
PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/c20b0cfd643cd4a19246ccf204e2997af70f6b21/public/get-pip.py
PYTHON_GET_PIP_SHA256=fa6f3fb93cce234cd4e8dd2beb54a51ab9c247653b52855a48dd44e6b21ff28b
This is the most straightforward approach to secrets management. While it's not the most secure, it will keep honest people honest, since it provides a thin layer of protection that helps keep secrets hidden from curious, wandering eyes.
Passing secrets in via shared volumes is a better solution, but they should be encrypted, via Vault or AWS Key Management Service (KMS), since they are saved to disk.
Build-time arguments
You can pass secrets at build time using build-time arguments, but they will be visible to anyone who can view the image's history via docker history.
Example:
FROM python:3.9-slim

ARG DATABASE_PASSWORD
Build:
docker build --build-arg "DATABASE_PASSWORD=SuperSecretSauce" .
If you only need to use the secret temporarily as part of the build - e.g., an SSH key for cloning a private repo or downloading a private package - you should use a multi-stage build, since the builder history is discarded along with the temporary stages:
# Temporary build stage
FROM python:3.9-slim as builder

# Secret argument
ARG SSH_PRIVATE_KEY

# Install git
RUN apt-get update && \
    apt-get install -y --no-install-recommends git

# Clone the repo using the SSH key
RUN mkdir -p /root/.ssh/ && \
    echo "${SSH_PRIVATE_KEY}" > /root/.ssh/id_rsa
RUN touch /root/.ssh/known_hosts && \
    ssh-keyscan github.com >> /root/.ssh/known_hosts
RUN git clone git@github.com:testdrivenio/not-real.git

# Final stage
FROM python:3.9-slim

WORKDIR /app

# Copy the repo from the temporary image
COPY --from=builder /your-repo /app/your-repo
Multi-stage builds retain only the history of the final image. Keep in mind that you can use this functionality for secrets your application needs permanently, such as database credentials.
You can also use the new --secret option in docker build to pass secrets to the Docker build. These secrets will not be stored in the image.
# "docker_is_awesome" > secrets.txt FROM alpine # Displays the key from the default key location. RUN --mount=type=secret,id=mysecret cat /run/secrets/mysecret
This will mount the secret from the secrets.txt file.
Build the image:
docker build --no-cache --progress=plain --secret id=mysecret,src=secrets.txt .

# output
...
#4 [1/2] FROM docker.io/library/alpine
#4 sha256:665ba8b2cdc0cb0200e2a42a6b3c0f8f684089f4cd1b81494fbb9805879120f7
#4 CACHED

#5 [2/2] RUN --mount=type=secret,id=mysecret cat /run/secrets/mysecret
#5 sha256:75601a522ebe80ada66dedd9dd86772ca932d30d7e1b11bba94c04aa55c237de
#5 0.635 docker_is_awesome
#5 DONE 0.7s

#6 exporting to image
Finally, check the history to see whether the secret leaked:
❯ docker history 49574a19241c

IMAGE          CREATED         CREATED BY                                      SIZE    COMMENT
49574a19241c   5 minutes ago   CMD ["/bin/sh"]                                 0B      buildkit.dockerfile.v0
<missing>      5 minutes ago   RUN /bin/sh -c cat /run/secrets/mysecret # b…   0B      buildkit.dockerfile.v0
<missing>      4 weeks ago     /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>      4 weeks ago     /bin/sh -c #(nop) ADD file:aad4290d27580cc1a…   5.6MB
Docker secrets
If you are using Docker Swarm, you can manage secrets with Docker secrets.
For example, initialize Docker Swarm mode:
docker swarm init
Create a Docker secret:
echo "supersecretpassword" | docker secret create postgres_password - qdqmbpizeef0lfhyttxqfbty0 docker secret ls ID NAME DRIVER CREATED UPDATED qdqmbpizeef0lfhyttxqfbty0 postgres_password 4 seconds ago 4 seconds ago
When a container is granted access to the above secret, it will be mounted at /run/secrets/postgres_password. This file contains the actual value of the secret in plain text.
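A rough sketch of granting a Swarm service access to that secret (the POSTGRES_PASSWORD_FILE convention is specific to the official Postgres image):

docker service create \
    --name db \
    --secret postgres_password \
    --env POSTGRES_PASSWORD_FILE=/run/secrets/postgres_password \
    postgres:13-alpine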
Using a different orchestration tool? Refer to:
- AWS EKS - Using AWS Secrets Manager secrets with Kubernetes[7]
- DigitalOcean Kubernetes - Recommended steps to secure a DigitalOcean Kubernetes cluster[8]
- Google Kubernetes Engine - Using Secret Manager with other products[9]
- Nomad - Vault Integration and Retrieving Dynamic Secrets[10]
3. Use a .dockerignore file
The .dockerignore file has been mentioned a few times already. This file is used to specify the files and folders that you don't want added to the initial build context sent to the Docker daemon, which then builds your image. In other words, you can use it to define exactly the build context you need.
When a Docker image is built, the entire Docker context - i.e., the root of your project - is sent to the Docker daemon before the COPY or ADD commands are evaluated.
This can be quite resource-intensive, especially if your project has many dependencies, large data files, or build artifacts.
Moreover, if the Docker CLI and daemon are not on the same machine - for example, when the daemon runs on a remote host - you should pay even more attention to the size of the build context.
What should you add to the .dockerignore file?
- Temporary files and folders
- Build logs
- Local secrets
- Local development files, such as docker-compose.yml
- Version control folders, such as .git, .hg, and .vscode
Example:
**/.git
**/.gitignore
**/.vscode
**/coverage
**/.env
**/.aws
**/.ssh
Dockerfile
README.md
docker-compose.yml
**/.DS_Store
**/venv
**/env
In short, a well-structured .dockerignore file can help:
- Reduce the size of Docker image
- Accelerate the build process
- Prevent unnecessary cache invalidation
- Prevent leaking secrets
4. Lint and scan your Dockerfiles and images
Linting is the process of checking source code for programmatic and stylistic errors and bad practices that could lead to potential defects. Just like with programming languages, static files can be linted too. With Dockerfiles specifically, a linter can help ensure they are maintainable, avoid deprecated syntax, and follow best practices. Linting your images should be a standard part of your CI pipeline.
Hadolint[11] is the most popular Dockerfile linter:
hadolint Dockerfile

Dockerfile:1 DL3006 warning: Always tag the version of an image explicitly
Dockerfile:7 DL3042 warning: Avoid the use of cache directory with pip. Use `pip install --no-cache-dir <package>`
Dockerfile:9 DL3059 info: Multiple consecutive `RUN` instructions. Consider consolidation.
Dockerfile:17 DL3025 warning: Use arguments JSON notation for CMD and ENTRYPOINT arguments
You can try Hadolint online at https://hadolint.github.io/hadolint/, and there is also a VS Code extension[12].
You can pair linting your Dockerfiles with scanning your images and containers for vulnerabilities.
Here are some widely used image scanning tools:
- Snyk[13] is the exclusive provider of Docker's native vulnerability scanning. You can scan images using the docker scan CLI command.
- Trivy[14] can be used to scan container images, file systems, git repositories, and other configuration files.
- Clair[15] is an open source project for static analysis of vulnerabilities in application containers.
- Anchore[16] is an open source project that provides a centralized service for inspection, analysis, and certification of container images.
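As a quick sketch (tool availability and flags can vary by version), scanning a locally built image might look like this:

# Docker's Snyk-powered scan
docker scan web-prod-b25a262-1.0.0

# Or with Trivy
trivy image web-prod-b25a262-1.0.0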
In short, lint and scan your Dockerfiles and images to surface any potential problems that deviate from best practices.
5. Sign and verify images
How do you know that the images used to run your production code have not been tampered with?
Tampering can happen over the wire via man-in-the-middle (MITM) attacks or because the registry itself is compromised. Docker Content Trust (DCT) enables signing and verifying Docker images from remote registries.
To verify the integrity and authenticity of an image, set the following environment variable:
DOCKER_CONTENT_TRUST=1
Now, if you try to pull an image that hasn't been signed, you'll receive the following error:
Error: remote trust data does not exist for docker.io/namespace/unsigned-image: notary.docker.io does not have trust data for docker.io/namespace/unsigned-image
You can learn about signing images from the Signing Images with Docker Content Trust documentation.
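As a rough sketch of signing an already pushed tag (the repository and tag are illustrative; the first signing will prompt you to create keys):

# Sign an existing tag
docker trust sign yourrepo/web:1.0.0

# Inspect the signatures
docker trust inspect --pretty yourrepo/web:1.0.0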
When downloading images from Docker Hub, make sure to use either official images or verified images from trusted sources. Larger teams should consider running their own internal private container registry.
6. Set memory and CPU limits
It's a good idea to limit the memory usage of your Docker containers, especially if you run multiple containers on a single machine. This prevents any one container from using all of the available memory and thereby crippling the others.
The easiest way to limit resource usage is to use the --memory and --cpus options in the Docker CLI:
docker run --cpus=2 -m 512m nginx
The above command limits the container to 2 CPUs and 512 megabytes of memory.
You can do the same thing in a Docker Compose file, like this:
version: "3.9" services: redis: image: redis:alpine deploy: resources: limits: cpus: 2 memory: 512M reservations: cpus: 1 memory: 256M
Note the reservations field. It is used to set a soft limit, which takes priority when the host machine is low on memory or CPU resources.
Other relevant resources
- Runtime options with memory, CPU, and GPU: https://docs.docker.com/config/containers/resource_constraints/
- Resource limitations of Docker Compose: https://docs.docker.com/compose/compose-file/compose-file-v3/#resources
Summary
These are the 17 best practices covered in this article. Mastering them will make your Dockerfiles and Docker images compact, clean, and secure.
This article is from Docker Best Practices for Python Developers[17].
References
[1]multi-stage builds: https://docs.docker.com/develop/develop-images/multistage-build/
[2] The best Docker base image for Python applications: https://pythonspeed.com/articles/base-image-python-docker-images/
[3] Using Alpine can make Python Docker builds 50x slower: https://pythonspeed.com/articles/alpine-docker-python/
[4] Docker's official documentation: https://docs.docker.com/compose/faq/#why-do-my-services-take-10-seconds-to-recreate-or-stop
[5]HEALTHCHECK: https://docs.docker.com/engine/reference/builder/#healthcheck
[6] Stack Overflow answer: https://stackoverflow.com/a/56213290/1799408
[7] Using AWS Secrets Manager secrets with Kubernetes: https://docs.aws.amazon.com/eks/latest/userguide/manage-secrets.html
[8] Recommended steps to secure a DigitalOcean Kubernetes cluster: https://www.digitalocean.com/community/tutorials/recommended-steps-to-secure-a-digitalocean-kubernetes-cluster
[9] Using Secret Manager with other products: https://cloud.google.com/secret-manager/docs/using-other-products#google-kubernetes-engine
[10] Vault Integration and Retrieving Dynamic Secrets: https://learn.hashicorp.com/tutorials/nomad/vault-postgres?in=nomad/integrate-vault
[11]Hadolint: https://github.com/hadolint/hadolint
[12] VS Code extension: https://marketplace.visualstudio.com/items?itemName=exiasr.hadolint
[13]Snyk: https://docs.docker.com/engine/scan/
[14]Trivy: https://aquasecurity.github.io/trivy/
[15]Clair: https://github.com/quay/clair
[16]Anchore: https://github.com/anchore/anchore-engine
[17]Docker Best Practices for Python Developers: https://testdriven.io/blog/docker-best-practices/