This article shares some best practices to follow when writing Dockerfiles and working with Docker. It is a long read, so feel free to bookmark it and work through it slowly; it will be well worth the effort.
Table of contents
Dockerfile best practices
- Use multi-stage builds
- Adjust the order of Dockerfile commands
- Use a small Docker base image
- Minimize the number of layers
- Use unprivileged containers
- COPY is preferred over ADD
- Cache Python packages on the Docker host
- Each container runs only one process
- Array syntax is preferred over string syntax
- Understand the difference between ENTRYPOINT and CMD
- Add health check HEALTHCHECK
Docker image best practices
- Version Docker images
- Do not store secrets in images
- Use a .dockerignore file
- Lint and scan your Dockerfiles and images
- Sign and verify images
- Set memory and CPU limits
Dockerfile best practices
1. Use multi-stage builds
Take advantage of multi-stage builds[1] to create leaner, more secure Docker images. Multi-stage builds allow you to divide your Dockerfile into several stages.
For example, you can have one stage for compiling and building your application, whose artifacts can then be copied into subsequent stages. Since only the last stage is used to create the final image, the dependencies and tools associated with building the application are discarded, leaving a lean, modular, production-ready image.
Web development example:
# Temporary build stage
FROM python:3.9-slim as builder

WORKDIR /app

RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc

COPY requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /app/wheels -r requirements.txt

# Final stage
FROM python:3.9-slim

WORKDIR /app

COPY --from=builder /app/wheels /wheels
COPY --from=builder /app/requirements.txt .
RUN pip install --no-cache /wheels/*
In this example, the GCC compiler is required to install certain Python packages, so we added a temporary, build-time stage to handle the compilation.
Since the final runtime image does not contain GCC, it is lighter and more secure. Image size comparison:
REPOSITORY      TAG      IMAGE ID       CREATED          SIZE
docker-single   latest   8d6b6a4d7fb6   16 seconds ago   259MB
docker-multi    latest   813c2fa9b114   3 minutes ago    156MB
Let's take another example:
# Temporary build stage
FROM python:3.9 as builder

RUN pip wheel --no-cache-dir --no-deps --wheel-dir /wheels jupyter pandas

# Final stage
FROM python:3.9-slim

WORKDIR /notebooks

COPY --from=builder /wheels /wheels
RUN pip install --no-cache /wheels/*
Image size comparison:
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
ds-multi     latest   b4195deac742   2 minutes ago   357MB
ds-single    latest   7c23c43aeda6   6 minutes ago   969MB
In short, multi-stage builds can reduce the size of your production images, which helps you save time and money. They also simplify your production containers, and thanks to their small size and simplicity, those containers have a relatively small attack surface.
2. Adjust the order of Dockerfile commands
Pay close attention to the order of your Dockerfile commands to take advantage of the layer cache.
Docker caches each step (or layer) in a Dockerfile to speed up subsequent builds. When a step changes, the cache for that step and for all subsequent steps is invalidated.
For example:
FROM python:3.9-slim

WORKDIR /app

COPY sample.py .
COPY requirements.txt .

RUN pip install -r /requirements.txt
In this Dockerfile, we copied the application code before installing the requirements. Now, every time we change sample.py, the build reinstalls the packages. This is very inefficient, especially when using a Docker container as a development environment. Therefore, it is critical to put frequently changing files near the end of the Dockerfile.
You can also use a .dockerignore file to exclude unnecessary files from the Docker build context and the final image, which helps prevent unnecessary cache invalidation. More on this later.
Therefore, in the Dockerfile above, you should move the COPY sample.py . command to the bottom, as follows:
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .

RUN pip install -r /requirements.txt

COPY sample.py .
Notes:
- Always put layers that are likely to change as low as possible in the Dockerfile.
- Combine commands such as RUN apt-get update and RUN apt-get install into a single instruction (this also helps reduce the image size, as discussed later).
- If you want to disable the cache for a particular Docker build, add the --no-cache=true flag.
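For example, a quick sketch of forcing a clean rebuild without cached layers (the image name is illustrative):

docker build --no-cache=true -t web .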
3. Use a small Docker base image
Smaller Docker images are more modular and secure. With a smaller base image, building, pushing, and pulling images is faster, and such images are often more secure because they include only the libraries and system dependencies required to run your application.
Which Docker base image should you use? There is no fixed answer; it depends on what you are doing. Below is a size comparison of various Python base images:
REPOSITORY   TAG                 IMAGE ID       CREATED      SIZE
python       3.9.6-alpine3.14    f773016f760e   3 days ago   45.1MB
python       3.9.6-slim          907fc13ca8e7   3 days ago   115MB
python       3.9.6-slim-buster   907fc13ca8e7   3 days ago   115MB
python       3.9.6               cba42c28d9b8   3 days ago   886MB
python       3.9.6-buster        cba42c28d9b8   3 days ago   886MB
Although the Alpine flavor, based on Alpine Linux, is the smallest, it often leads to longer build times if you can't find compiled binaries that work with it. As a result, you may end up having to build the binaries yourself, which can increase the image size (depending on the required system-level dependencies) and the build time (since you have to compile from source).
Refer to The best Docker base image for Python applications[2] and Using Alpine can make Python Docker builds 50x slower[3] to learn more about why it is best to avoid Alpine-based base images.
In the end, it's all about balance. When in doubt, start with a *-slim flavor, especially in development mode while you are still building your application. You want to avoid having to constantly update the Dockerfile to install necessary system-level dependencies every time you add a new Python package. As you harden your application and Dockerfile(s) for production, you may want to explore using Alpine for the final image of a multi-stage build.
In addition, don't forget to update your base images regularly to improve security and performance. When a new version of a base image is released (e.g., 3.9.6-slim -> 3.9.7-slim), you should pull the new image and update your running containers to get all the latest security patches.
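For example, a quick sketch of picking up a patched base image (the image and tag names are illustrative):

# Pull the newer base image, then rebuild so the updated layers are used
docker pull python:3.9.7-slim
docker build -t web:1.0.1 .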
4. Minimize the number of layers
Try to combine the RUN, COPY, and ADD commands where possible, since they create layers. Each layer increases the size of the image because layers are cached; therefore, as the number of layers grows, so does the image size.
You can verify this with the docker history command:
docker images

REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
dockerfile   latest   180f98132d02   51 seconds ago   259MB

docker history 180f98132d02

IMAGE          CREATED              CREATED BY                                      SIZE     COMMENT
180f98132d02   58 seconds ago       COPY . . # buildkit                             6.71kB   buildkit.dockerfile.v0
<missing>      58 seconds ago       RUN /bin/sh -c pip install -r requirements.t…  35.5MB   buildkit.dockerfile.v0
<missing>      About a minute ago   COPY requirements.txt . # buildkit              58B      buildkit.dockerfile.v0
<missing>      About a minute ago   WORKDIR /app
...
Take note of the sizes: only the RUN, COPY, and ADD commands add to the size of the image. You can reduce the image size by combining commands wherever possible. For example:
RUN apt-get update
RUN apt-get install -y gcc
Can be combined into one RUN command:
RUN apt-get update && apt-get install -y gcc
This creates a single layer instead of two, which reduces the size of the final image. While reducing the number of layers is a good idea, it's more important to treat that as a side effect of reducing the image size and build time rather than as a goal in itself. In other words, instead of trying to optimize every command, focus on the previous three practices:
- Multi-stage builds
- The order of Dockerfile commands
- Using a small base image
Notes:
- RUN, COPY, and ADD all create layers
- Each layer contains differences from the previous layer
- Layers increase the size of the final image
Tips
- Merge related commands
- Delete unnecessary files in the same RUN step that created them
- Minimize the number of times you run apt-get upgrade, since it upgrades all packages to the latest version
- For multi-stage builds, don't worry too much about over-optimizing the commands in temporary stages
Finally, for readability, it is recommended to sort multi-line arguments alphabetically:
RUN apt-get update && apt-get install -y \
    gcc \
    git \
    matplotlib \
    pillow \
    && rm -rf /var/lib/apt/lists/*
5. Use unprivileged containers
By default, Docker runs container processes as root inside the container. However, this is a bad practice, because a process running as root inside the container also runs as root on the Docker host.
Therefore, if an attacker gains access to the container, they have full root privileges and can carry out attacks against the Docker host, such as:
- Copying sensitive information from the host's file system into the container
- Executing remote commands
To prevent this, be sure to run container processes as a non-root user:
RUN addgroup --system app && adduser --system --group app

USER app
You can go a step further and remove shell access and ensure there is no home directory:
RUN addgroup --gid 1001 --system app && \
    adduser --no-create-home --shell /bin/false --disabled-password --uid 1001 --system --group app

USER app
Verify:
docker run -i sample id

uid=1001(app) gid=1001(app) groups=1001(app)
Here, the application within the container runs under a non-root user. However, keep in mind that the Docker daemon and the container itself still run with root privileges.
Be sure to review Run the Docker daemon as a non-root user for help with running both the daemon and your containers as a non-root user.
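As a rough sketch of rootless mode (the exact steps depend on your distribution and Docker version; consult the rootless mode documentation for current instructions):

# Install a rootless Docker daemon for the current non-root user
curl -fsSL https://get.docker.com/rootless | sh

# Point the Docker CLI at the rootless daemon's socket (a UID of 1000 is assumed here)
export DOCKER_HOST=unix:///run/user/1000/docker.sock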
6. COPY is preferred over ADD
Unless you are sure you need the additional functions brought by ADD, please use COPY.
So what is the difference between COPY and ADD?
First, both commands allow you to copy files from a specific location to the Docker image.
ADD <src> <dest>
COPY <src> <dest>
Although they look the same, ADD has some additional features.
- COPY is used to COPY local files or directories from the Docker host to the image.
- ADD can be used for the same thing, and it can also download external files. In addition, if you use a compressed local file (tar, gzip, bzip2, etc.) as the parameter, ADD will automatically decompress its contents to the given location.
# Copy local files on the host to the destination
COPY /source/path /destination/path
ADD /source/path /destination/path

# Download an external file and copy it to the destination
ADD http://external.file/url /destination/path

# Copy and extract a local compressed file
ADD source.file.tar.gz /destination/path
Finally, COPY is semantically clearer and easier to understand than ADD.
7. Cache Python packages on the Docker host
When a requirements file changes, the image needs to be rebuilt to install the new packages. The earlier steps will be cached, as mentioned in Minimize the number of layers. Downloading all packages while rebuilding the image causes a lot of network activity and takes a lot of time. Each rebuild spends the same amount of time downloading packages that are common across builds.
Taking Python as an example, you can avoid this by mapping the pip cache directory to a directory on the host. That way, the cached versions persist across rebuilds, which improves the build speed.
Add a volume to the docker run command as -v $HOME/.cache/pip-docker/:/root/.cache/pip, or add a mapping in the Docker Compose file, as shown in the sketch below.
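A minimal Docker Compose sketch of that mapping (the host-side cache path is illustrative):

version: "3.8"
services:
  web:
    build: .
    volumes:
      - $HOME/.cache/pip-docker/:/root/.cache/pip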
The directories mentioned above are for reference only. Make sure you map the cache directory, not site-packages (where the built packages reside).
Moving the cache from the Docker image to the host can save space in the final image.
# ...

COPY requirements.txt .

RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# ...
8. Each container runs only one process
Why is it recommended to run only one process per container?
Let's assume your application stack consists of two web servers and a database. While you could easily run all three from a single container, you should run each in a separate container to make it easier to reuse and scale each individual service.
- Scalability - since each service is in a separate container, you can scale one of your web servers horizontally as needed to handle more traffic.
- Reusability - perhaps you have another service that needs a containerized database. You can simply reuse the same database container without dragging along two unneeded services.
- Logging - coupled containers make logging more complex. (We'll discuss this in more detail later in this article.)
- Portability and predictability - it's much easier to apply security patches or debug a problem when there are fewer pieces at work.
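For example, a rough Compose sketch of this idea (the image names and settings are illustrative) - the web server and the database each get their own container:

version: "3.8"
services:
  web:
    build: .
    ports:
      - "8000:8000"
  db:
    image: postgres:13-alpine
    environment:
      POSTGRES_PASSWORD: example

Each service can now be scaled, replaced, or reused independently of the other.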
9. Array syntax is preferred over string syntax
The CMD and ENTRYPOINT commands in a Dockerfile can be written in either array (exec) or string (shell) format:
# Array (exec)
CMD ["gunicorn", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "main:app"]

# String (shell)
CMD "gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app"
Both are valid and achieve nearly the same thing; however, you should use the exec format whenever possible.
From Docker's official documentation[4]:
- Make sure you're using the exec form of CMD and ENTRYPOINT in your Dockerfile.
- For example, use ["program", "arg1", "arg2"], not "program arg1 arg2". Using the string form causes Docker to run your process using bash, which doesn't handle signals properly. Compose always uses the JSON form, so don't worry if you override the command or entrypoint in your Compose file.
So, since most shells don't forward signals to child processes, if you use the shell format, CTRL-C (which generates a SIGTERM) may not stop a child process.
Example:
FROM ubuntu:18.04

# BAD: string (shell) format
ENTRYPOINT top -d

# GOOD: array (exec) format
ENTRYPOINT ["top", "-d"]
Both run the same command. Note, however, that with the string (shell) format, CTRL-C does not kill the process; instead, you will see ^C^C^C^C^C^C.
Another difference is that with the string (shell) format, PID 1 is held by the shell rather than by the process itself:
# Array format
root@18d8fd3fd4d2:/app# ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 python manage.py runserver 0.0.0.0:8000
    7 ?        Sl     0:02 /usr/local/bin/python manage.py runserver 0.0.0.0:8000
   25 pts/0    Ss     0:00 bash
  356 pts/0    R+     0:00 ps ax

# String format
root@ede24a5ef536:/app# ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 /bin/sh -c python manage.py runserver 0.0.0.0:8000
    8 ?        S      0:00 python manage.py runserver 0.0.0.0:8000
    9 ?        Sl     0:01 /usr/local/bin/python manage.py runserver 0.0.0.0:8000
   13 pts/0    Ss     0:00 bash
  342 pts/0    R+     0:00 ps ax
10. Understand the difference between ENTRYPOINT and CMD
Should you use ENTRYPOINT or CMD to run container processes? There are two ways to run commands in a container:
CMD ["gunicorn", "config.wsgi", "-b", "0.0.0.0:8000"] # and ENTRYPOINT ["gunicorn", "config.wsgi", "-b", "0.0.0.0:8000"]
Both essentially do the same thing: start the application at config.wsgi with a Gunicorn server and bind it to 0.0.0.0:8000.
CMD is easily overridden. If you run docker run <image_name> uvicorn config.asgi, the above CMD gets replaced by the new arguments - e.g., uvicorn config.asgi. To override the ENTRYPOINT command, you must specify the --entrypoint option:
docker run --entrypoint uvicorn config.asgi <image_name>
Here, it's clear that we are overriding the entrypoint. Therefore, it is recommended to use ENTRYPOINT instead of CMD to prevent commands from being accidentally overridden.
They can also be used together. For example:
ENTRYPOINT ["gunicorn", "config.wsgi", "-w"] CMD ["4"]
When used together like this, the command to start the container becomes:
gunicorn config.wsgi -w 4
As mentioned above, CMD is easily overridden, so it can be used to pass arguments to the ENTRYPOINT command. For example, the number of workers can easily be changed, like this:
docker run <image_name> 6
In this way, the container starts with six Gunicorn workers instead of the default four.
11. Add health check HEALTHCHECK
Use a HEALTHCHECK to determine whether the process running in the container is not only up and running, but also "healthy".
Docker exposes an API for checking the status of the process running in the container, which provides much more information than just whether the process is "running", since "running" covers "it is up and working", "still starting up", and even "stuck in some infinite-loop error state". You can interact with this API via the HEALTHCHECK[5] instruction.
For example, if you're serving a web application, you can use the following to determine whether the / endpoint is up and can handle requests:
HEALTHCHECK CMD curl --fail http://localhost:8000 || exit 1
If you run docker ps, you can see the status of HEALTHCHECK.
Healthy example:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 09c2eb4970d4 healthcheck "python manage.py ru..." 10 seconds ago Up 8 seconds (health: starting) 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp xenodochial_clarke
Unhealthy example:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 09c2eb4970d4 healthcheck "python manage.py ru..." About a minute ago Up About a minute (unhealthy) 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp xenodochial_clarke
You can go a step further by setting up a custom endpoint used only for health checks, and then configuring the HEALTHCHECK to test against the returned data.
For example, if the endpoint returns a JSON response of {"ping": "pong"}, you can instruct the HEALTHCHECK to validate the response body.
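For example, a rough sketch of such a check, assuming a /ping endpoint exists and that curl and grep are available inside the image:

HEALTHCHECK CMD curl --fail --silent http://localhost:8000/ping | grep -q "pong" || exit 1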
Here's how to view the health check status using docker inspect (part of the output is omitted):
❯ docker inspect --format "{{json .State.Health }}" ab94f2ac7889

{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2021-09-28T15:22:57.5764644Z",
      "End": "2021-09-28T15:22:57.7825527Z",
      "ExitCode": 0,
      "Output": "..."
You can also add health checks to Docker Compose files:
version: "3.8" services: web: build: . ports: - '8000:8000' healthcheck: test: curl --fail http://localhost:8000 || exit 1 interval: 10s timeout: 10s start_period: 10s retries: 3
Options:
- test: the command to test.
- interval: how often to run the test - i.e., test every x units of time.
- timeout: the maximum time to wait for a response.
- start_period: when to start the health checks. It can be used when additional tasks are performed before the container is ready, such as running migrations.
- retries: the maximum number of retries before the test is marked as failed.
If you use an orchestration tool other than Docker Swarm (such as Kubernetes or AWS ECS), it most likely has its own internal system for handling health checks. Refer to the documentation for your specific tool before adding the HEALTHCHECK instruction.
Docker image best practices
1. Version Docker images
Whenever possible, avoid using the latest tag.
If you rely on the latest tag (which isn't really a "tag", since it's applied by default when an image isn't explicitly tagged), you can't tell which version of your code is running based on the image tag.
It also makes rollbacks difficult, and the tag is easy to overwrite (whether accidentally or maliciously). Tags, like your infrastructure and deployments, should be immutable.
So regardless of how you treat your internal images, you should never use the latest tag for base images, since you could inadvertently deploy a new version with breaking changes to production.
For internal images, use descriptive tags to make it easier to tell which version of the code is running, handle rollbacks, and avoid naming collisions. For example, you can use the following descriptors to make up a tag:
- Timestamp
- Docker image ID
- Git commit hash value
- Semantic version
For more options, see this Stack Overflow answer[6] to "Properly Versioning Docker Images".
For example:
docker build -t web-prod-b25a262-1.0.0 .
Here, we used the following to form the tag:
- Project Name: web
- Environment name: prod
- Git commit short hash: b25a262 (obtained with the command git rev-parse --short HEAD)
- Semantics version: 1.0.0
It's essential to choose a tagging scheme and stay consistent with it. Since commit hashes make it easy to tie image tags back to the code, it is recommended to include them in your tagging scheme.
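For instance, a minimal shell sketch of composing such a tag (the project name, environment, and version are illustrative):

GIT_SHA=$(git rev-parse --short HEAD)
docker build -t "web-prod-${GIT_SHA}-1.0.0" .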
2. Do not store secrets in images
Secrets are sensitive pieces of information such as passwords, database credentials, SSH keys, tokens, and TLS certificates. They should not be baked into your images without being encrypted, since unauthorized users who gain access to the image can simply examine the layers to extract them.
Therefore, do not add secrets in plain text to your Dockerfiles, especially if you push your images to a public registry like Docker Hub!
FROM python:3.9-slim

ENV DATABASE_PASSWORD "SuperSecretSauce"
Instead, they should be injected via:
- Environment variables (at run time)
- Build-time arguments (at build time)
- An orchestration tool like Docker Swarm (via Docker secrets) or Kubernetes (via Kubernetes secrets)
In addition, you can help prevent leaking secrets by adding common secret files and folders to your .dockerignore file:
**/.env
**/.aws
**/.ssh
Finally, be explicit about which files are copied into the image rather than copying all files recursively:
# Bad practice
COPY . .

# Good practice
COPY ./app.py .
Being explicit also helps limit cache invalidation.
Environment variables
You can pass secrets via environment variables, but they will be visible in all child processes, linked containers, and logs, as well as via docker inspect. They are also difficult to update.
docker run --detach --env "DATABASE_PASSWORD=SuperSecretSauce" python:3.9-slim

b25a262f870eb0fdbf03c666e7fcf18f9664314b79ad58bc7618ea3445e39239

docker inspect --format='{{range .Config.Env}}{{println .}}{{end}}' b25a262f870eb0fdbf03c666e7fcf18f9664314b79ad58bc7618ea3445e39239

DATABASE_PASSWORD=SuperSecretSauce
PATH=/usr/local/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LANG=C.UTF-8
GPG_KEY=E3FF2839C048B25C084DEBE9B26995E310250568
PYTHON_VERSION=3.9.7
PYTHON_PIP_VERSION=21.2.4
PYTHON_SETUPTOOLS_VERSION=57.5.0
PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/c20b0cfd643cd4a19246ccf204e2997af70f6b21/public/get-pip.py
PYTHON_GET_PIP_SHA256=fa6f3fb93cce234cd4e8dd2beb54a51ab9c247653b52855a48dd44e6b21ff28b
This is the most straightforward approach to secrets management. While it's not the most secure, it will keep honest people honest, since it provides a thin layer of protection that helps keep secrets hidden from curious, wandering eyes.
Passing secrets in via shared volumes is a better solution, but they should be encrypted, via Vault or AWS Key Management Service (KMS), since they are saved to disk.
Build-time arguments
You can pass secrets at build time using build-time arguments, but they will be visible to anyone who can view the image's history via docker history.
Example:
FROM python:3.9-slim

ARG DATABASE_PASSWORD
Build:
docker build --build-arg "DATABASE_PASSWORD=SuperSecretSauce" .
If you only need to use the secret temporarily as part of the build - e.g., an SSH key for cloning a private repo or downloading a private package - you should use a multi-stage build, since the builder history is discarded along with the temporary stages:
# Temporary build stage
FROM python:3.9-slim as builder

# Secret argument
ARG SSH_PRIVATE_KEY

# Install git
RUN apt-get update && \
    apt-get install -y --no-install-recommends git

# Clone the repo using the SSH key
RUN mkdir -p /root/.ssh/ && \
    echo "${SSH_PRIVATE_KEY}" > /root/.ssh/id_rsa
RUN touch /root/.ssh/known_hosts && \
    ssh-keyscan github.com >> /root/.ssh/known_hosts
RUN git clone git@github.com:testdrivenio/not-real.git

# Final stage
FROM python:3.9-slim

WORKDIR /app

# Copy the repo from the temporary image
COPY --from=builder /your-repo /app/your-repo
Multi-stage builds retain only the history of the final image. Keep in mind that you can use this functionality for secrets your application needs permanently, such as database credentials.
You can also use the new --secret option in docker build to pass secrets to the Docker build. These secrets will not be stored in the image.
# "docker_is_awesome" > secrets.txt FROM alpine # Displays the key from the default key location. RUN --mount=type=secret,id=mysecret cat /run/secrets/mysecret
This will mount the secret from the secrets.txt file.
Build the image:
docker build --no-cache --progress=plain --secret id=mysecret,src=secrets.txt .

# output
...
#4 [1/2] FROM docker.io/library/alpine
#4 sha256:665ba8b2cdc0cb0200e2a42a6b3c0f8f684089f4cd1b81494fbb9805879120f7
#4 CACHED

#5 [2/2] RUN --mount=type=secret,id=mysecret cat /run/secrets/mysecret
#5 sha256:75601a522ebe80ada66dedd9dd86772ca932d30d7e1b11bba94c04aa55c237de
#5 0.635 docker_is_awesome
#5 DONE 0.7s

#6 exporting to image
Finally, check the history to see whether the secret leaked:
❯ docker history 49574a19241c

IMAGE          CREATED         CREATED BY                                      SIZE    COMMENT
49574a19241c   5 minutes ago   CMD ["/bin/sh"]                                 0B      buildkit.dockerfile.v0
<missing>      5 minutes ago   RUN /bin/sh -c cat /run/secrets/mysecret # b…   0B      buildkit.dockerfile.v0
<missing>      4 weeks ago     /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>      4 weeks ago     /bin/sh -c #(nop) ADD file:aad4290d27580cc1a…   5.6MB
Docker secrets
If you are using Docker Swarm, you can manage secrets with Docker secrets.
For example, initialize Docker Swarm mode:
docker swarm init
Create a Docker secret:
echo "supersecretpassword" | docker secret create postgres_password - qdqmbpizeef0lfhyttxqfbty0 docker secret ls ID NAME DRIVER CREATED UPDATED qdqmbpizeef0lfhyttxqfbty0 postgres_password 4 seconds ago 4 seconds ago
When a container is granted access to the above secret, it will be mounted at /run/secrets/postgres_password. This file contains the actual value of the secret in plain text.
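A rough sketch of granting a Swarm service access to that secret (the POSTGRES_PASSWORD_FILE convention is specific to the official Postgres image):

docker service create \
    --name db \
    --secret postgres_password \
    --env POSTGRES_PASSWORD_FILE=/run/secrets/postgres_password \
    postgres:13-alpine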
Using a different orchestration tool? Refer to:
- AWS EKS - Using AWS Secrets Manager secrets with Kubernetes[7]
- DigitalOcean Kubernetes - Recommended steps to secure a DigitalOcean Kubernetes cluster[8]
- Google Kubernetes Engine - Using Secret Manager with other products[9]
- Nomad - Vault Integration and Retrieving Dynamic Secrets[10]
3. Use a .dockerignore file
The .dockerignore file has been mentioned a few times already. This file is used to specify the files and folders that you don't want added to the initial build context sent to the Docker daemon, which then builds your image. In other words, you can use it to define exactly the build context you need.
When a Docker image is built, the entire Docker context - i.e., the root of your project - is sent to the Docker daemon before the COPY or ADD commands are evaluated.
This can be quite resource-intensive, especially if your project has many dependencies, large data files, or build artifacts.
Moreover, if the Docker CLI and daemon are not on the same machine - for example, when the daemon runs on a remote host - you should pay even more attention to the size of the build context.
What should you add to the .dockerignore file?
- Temporary files and folders
- Build logs
- Local secrets
- Local development files, such as docker-compose.yml
- Version control folders, such as .git, .hg, and .vscode
Example:
**/.git
**/.gitignore
**/.vscode
**/coverage
**/.env
**/.aws
**/.ssh
Dockerfile
README.md
docker-compose.yml
**/.DS_Store
**/venv
**/env
In short, a well-structured .dockerignore file can help:
- Reduce the size of Docker image
- Accelerate the build process
- Prevent unnecessary cache invalidation
- Prevent leaking secrets
4. Lint and scan your Dockerfiles and images
Linting is the process of checking source code for programmatic and stylistic errors and bad practices that could lead to potential defects. Just like with programming languages, static files can be linted too. With Dockerfiles specifically, a linter can help ensure they are maintainable, avoid deprecated syntax, and follow best practices. Linting your images should be a standard part of your CI pipeline.
Hadolint[11] is the most popular Dockerfile linter:
hadolint Dockerfile

Dockerfile:1 DL3006 warning: Always tag the version of an image explicitly
Dockerfile:7 DL3042 warning: Avoid the use of cache directory with pip. Use `pip install --no-cache-dir <package>`
Dockerfile:9 DL3059 info: Multiple consecutive `RUN` instructions. Consider consolidation.
Dockerfile:17 DL3025 warning: Use arguments JSON notation for CMD and ENTRYPOINT arguments
You can try Hadolint online at https://hadolint.github.io/hadolint/, and there is also a VS Code extension[12].
You can pair linting your Dockerfiles with scanning your images and containers for vulnerabilities.
Here are some widely used image scanning tools:
- Snyk[13] is the exclusive provider of Docker's native vulnerability scanning. You can scan images using the docker scan CLI command.
- Trivy[14] can be used to scan container images, file systems, git repositories, and other configuration files.
- Clair[15] is an open source project for static analysis of vulnerabilities in application containers.
- Anchore[16] is an open source project that provides a centralized service for inspection, analysis, and certification of container images.
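As a quick sketch (tool availability and flags can vary by version), scanning a locally built image might look like this:

# Docker's Snyk-powered scan
docker scan web-prod-b25a262-1.0.0

# Or with Trivy
trivy image web-prod-b25a262-1.0.0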
In short, lint and scan your Dockerfiles and images to surface any potential problems that deviate from best practices.
5. Sign and verify images
How do you know that the images used to run your production code have not been tampered with?
Tampering can happen over the wire via man-in-the-middle (MITM) attacks or because the registry itself is compromised. Docker Content Trust (DCT) enables signing and verifying Docker images from remote registries.
To verify the integrity and authenticity of an image, set the following environment variable:
DOCKER_CONTENT_TRUST=1
Now, if you try to pull an image that hasn't been signed, you'll receive the following error:
Error: remote trust data does not exist for docker.io/namespace/unsigned-image: notary.docker.io does not have trust data for docker.io/namespace/unsigned-image
You can learn about signing images from the Signing Images with Docker Content Trust documentation.
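As a rough sketch of signing an already pushed tag (the repository and tag are illustrative; the first signing will prompt you to create keys):

# Sign an existing tag
docker trust sign yourrepo/web:1.0.0

# Inspect the signatures
docker trust inspect --pretty yourrepo/web:1.0.0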
When downloading images from Docker Hub, make sure to use either official images or verified images from trusted sources. Larger teams should consider running their own internal private container registry.
6. Set memory and CPU limits
It's a good idea to limit the memory usage of your Docker containers, especially if you run multiple containers on a single machine. This prevents any one container from using all of the available memory and thereby crippling the others.
The easiest way to limit resource usage is to use the --memory and --cpus options in the Docker CLI:
docker run --cpus=2 -m 512m nginx
The above command limits the container to 2 CPUs and 512 megabytes of memory.
You can do the same thing in a Docker Compose file, like this:
version: "3.9" services: redis: image: redis:alpine deploy: resources: limits: cpus: 2 memory: 512M reservations: cpus: 1 memory: 256M
Note the reservations field. It is used to set a soft limit, which takes priority when the host machine is low on memory or CPU resources.
Other relevant resources
- Runtime options with memory, CPU, and GPU: https://docs.docker.com/config/containers/resource_constraints/
- Resource limitations of Docker Compose: https://docs.docker.com/compose/compose-file/compose-file-v3/#resources
Summary
These are the 17 best practices covered in this article. Mastering them will make your Dockerfiles and Docker images compact, clean, and secure.
This article is from Docker Best Practices for Python Developers[17].
References
[1]multi-stage builds: https://docs.docker.com/develop/develop-images/multistage-build/
[2] The best Docker base image for Python applications: https://pythonspeed.com/articles/base-image-python-docker-images/
[3] Using Alpine can make Python Docker builds 50x slower: https://pythonspeed.com/articles/alpine-docker-python/
[4] Docker's official documentation: https://docs.docker.com/compose/faq/#why-do-my-services-take-10-seconds-to-recreate-or-stop
[5]HEALTHCHECK: https://docs.docker.com/engine/reference/builder/#healthcheck
[6] Stack Overflow answer: https://stackoverflow.com/a/56213290/1799408
[7] Using AWS Secrets Manager secrets with Kubernetes: https://docs.aws.amazon.com/eks/latest/userguide/manage-secrets.html
[8] Recommended steps to secure a DigitalOcean Kubernetes cluster: https://www.digitalocean.com/community/tutorials/recommended-steps-to-secure-a-digitalocean-kubernetes-cluster
[9] Using Secret Manager with other products: https://cloud.google.com/secret-manager/docs/using-other-products#google-kubernetes-engine
[10] Vault Integration and Retrieving Dynamic Secrets: https://learn.hashicorp.com/tutorials/nomad/vault-postgres?in=nomad/integrate-vault
[11]Hadolint: https://github.com/hadolint/hadolint
[12] VS Code extension: https://marketplace.visualstudio.com/items?itemName=exiasr.hadolint
[13]Snyk: https://docs.docker.com/engine/scan/
[14]Trivy: https://aquasecurity.github.io/trivy/
[15]Clair: https://github.com/quay/clair
[16]Anchore: https://github.com/anchore/anchore-engine
[17]Docker Best Practices for Python Developers: https://testdriven.io/blog/docker-best-practices/