Using a cache in GitLab CI DinD to speed up Docker (multi-stage) image builds

Posted by cs-web on Thu, 27 Jan 2022 12:05:43 +0100

Articles on CSDN may not be updated promptly. Please follow my blog for the latest version: Xu Sheng's blog

References:
https://andrewlock.net/caching-docker-layers-on-serverless-build-hosts-with-multi-stage-builds---target,-and---cache-from/
https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#make-docker-in-docker-builds-faster-with-docker-layer-caching

Background

When a Docker container is used as the runner in GitLab CI and a job needs to call docker in its commands, Docker is effectively being run inside a Docker container. This is Docker in Docker, or DinD for short.

Functionally, once DinD is configured it works without problems; its drawbacks are security and speed.

The official documentation describes it this way:

Cache: Each job runs in a new environment. Concurrent jobs work fine, because every build gets its own instance of Docker engine and they don't conflict with each other. However, jobs can be slower because there's no caching of layers.

In other words, each job runs in a fresh environment with its own Docker engine instance, so no layer cache is carried over between jobs.

As a result, every docker build has to pull the base image named in the Dockerfile again, and it cannot reuse the image layers left by the previous build job, because no cached layers exist on the host machine at all.

Solution

Reference:
Make Docker-in-Docker builds faster with Docker layer caching

My first idea was to find a way to keep the image layer cache on the host between builds. While reading the documentation, I found that docker build provides a --cache-from parameter, which specifies an image to use as a cache source during the build.

That gave me the approach: although the intermediate cache layers and the built image do not remain on the host after each build, the final image is pushed to our company's Harbor registry.

So before each docker build, you can pull the previously built image and then pass it to --cache-from as the cache source.

For example, the official documentation provides this YAML configuration:

image: docker:19.03.12

services:
  - docker:19.03.12-dind

variables:
  # Use TLS https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#tls-enabled
  DOCKER_HOST: tcp://docker:2376
  DOCKER_TLS_CERTDIR: "/certs"

before_script:
  - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY

build:
  stage: build
  script:
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA --tag $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest

Before each build, the most recent image is pulled and designated as the cache source.

Multi-stage builds

With a multi-stage build, the approach above falls short: the final image contains only the layers of the last stage, and the layers of the earlier stages are discarded.

As a result, even if the final image is specified as the cache source, it provides no cache hits for the earlier stages.
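
To make the problem concrete, here is a sketch of the single-cache-source job from the official example applied to a multi-stage Dockerfile. It assumes, hypothetically, a Dockerfile whose first stage is named builder and is followed by a final runtime stage. The comments describe the cache behavior expected from the classic Docker builder; they are not output from a real run.

```yaml
# Sketch only: assumes a two-stage Dockerfile whose first stage is named "builder".
build:
  stage: build
  script:
    # :latest contains only the layers of the final stage.
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    # The builder stage finds no matching layers in :latest, so its steps
    # (for example dependency installation and compilation) run again on every job.
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest --tag $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:latest
```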

This is where the --target parameter of docker build comes in.

The --target parameter specifies which stage to build (a stage named with FROM ... AS <name> in the Dockerfile), so we can build an image for a single stage, push it to Harbor, and pull it as a cache source for the next build.

An example:

build: # Build the image
  stage: build
  image: docker:stable
  script:
    # Pull the builder image to use as the cache for the first stage of the multi-stage build
    - docker pull $HARBOR:builder || true
    # Build only the first stage and tag the result as the builder image, to serve as the cache next time
    - docker build --target builder --cache-from $HARBOR:builder -t $HARBOR:builder -f _ci/Dockerfile .
    # Pull the latest image to use as the cache for the second stage. Whether this step helps depends on the Dockerfile.
    # Note: this job does not push $HARBOR:latest, so that tag has to be published elsewhere for this pull to succeed
    - docker pull $HARBOR:latest || true
    # Run the actual build, using both images as cache sources
    - docker build --build-arg VERSION_TAG=$CI_COMMIT_TAG --build-arg COMMIT_ID=$CI_COMMIT_SHORT_SHA --cache-from $HARBOR:builder --cache-from $HARBOR:latest -f _ci/Dockerfile -t $HARBOR:$CI_COMMIT_SHORT_SHA .
    - docker push $HARBOR:$CI_COMMIT_SHORT_SHA
    # Push the builder image produced by this run
    - docker push $HARBOR:builder
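
The job above does not show where $HARBOR comes from or how the job logs in to the registry. A minimal sketch of the surrounding top-level configuration might look like the following. The registry host, the repository path, and the HARBOR_HOST, HARBOR_USER, and HARBOR_PASSWORD variable names are hypothetical placeholders rather than values from the original setup, and the TLS settings are simply copied from the official example.

```yaml
# Hypothetical top-level settings assumed by the multi-stage job above.
services:
  - docker:stable-dind   # DinD service matching the job's docker:stable image

variables:
  # TLS settings as in the official example
  DOCKER_HOST: tcp://docker:2376
  DOCKER_TLS_CERTDIR: "/certs"
  # Placeholder registry host and repository path; replace with the real ones
  HARBOR_HOST: harbor.example.com
  HARBOR: harbor.example.com/my-project/my-app

before_script:
  # HARBOR_USER and HARBOR_PASSWORD are assumed to be defined as CI/CD variables
  - echo "$HARBOR_PASSWORD" | docker login -u "$HARBOR_USER" --password-stdin "$HARBOR_HOST"
```

With settings like these, $HARBOR:builder and $HARBOR:latest in the job resolve to tags of the same repository, which is what lets one image serve as the cache source for each stage.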
