Building a Flink cluster with Docker

Posted by wchris on Thu, 27 Feb 2020 06:01:27 +0100

Preface

I recently started working with Docker and wanted to try running a Flink cluster on it. Searching online, I found that most guides build the image from a Dockerfile. The official website, however, has a tutorial that uses the ready-made Docker image, so I followed that approach on Linux and recorded the pitfalls I ran into along the way.
There are two main ways to build the cluster: with the docker command or with docker-compose. Before starting, I had already installed Docker and switched it to a domestic registry mirror. Those steps are not covered here; you can find them online.

Method 1: build with the docker command

  • Create network
docker network create --driver bridge app-tier
  • Create a jobmanager container
docker run -t -d --name jmr \
--network app-tier \
-e JOB_MANAGER_RPC_ADDRESS=jmr \
-p 8081:8081  \
flink:1.9.2-scala_2.12 jobmanager
  • Create a taskmanager container
docker run -t -d --name tmr \
--network app-tier \
-e JOB_MANAGER_RPC_ADDRESS=jmr   \
flink:1.9.2-scala_2.12 taskmanager
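
The three steps above can be collected into one small script. This is only a sketch: the network name, image tag and jmr container name come from the commands above, while the DRY_RUN switch, the run and start_cluster helpers, and the tmr1, tmr2, ... naming scheme are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: start one jobmanager and N taskmanagers on a user-defined bridge network.
# DRY_RUN, run and start_cluster are hypothetical helpers, not part of Docker or Flink.

NETWORK=app-tier
IMAGE=flink:1.9.2-scala_2.12
JM_NAME=jmr

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_cluster <number of taskmanagers> (defaults to 1)
start_cluster() {
    count=${1:-1}
    run docker network create --driver bridge "$NETWORK"
    run docker run -t -d --name "$JM_NAME" --network "$NETWORK" \
        -e JOB_MANAGER_RPC_ADDRESS="$JM_NAME" -p 8081:8081 "$IMAGE" jobmanager
    i=1
    while [ "$i" -le "$count" ]; do
        # Assumed naming scheme: tmr1, tmr2, ...
        run docker run -t -d --name "tmr$i" --network "$NETWORK" \
            -e JOB_MANAGER_RPC_ADDRESS="$JM_NAME" "$IMAGE" taskmanager
        i=$((i + 1))
    done
}
```

After loading these functions into the shell, setting DRY_RUN=1 and calling start_cluster 2 prints the four docker commands without executing them, which is a convenient way to check them before running for real.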

Method 2: build using docker-compose

This method requires docker-compose to be installed first.
I installed it with the following commands:

su root
curl -L https://get.daocloud.io/docker/compose/releases/download/1.25.4/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

After installation, check the version of docker-compose to verify that the installation succeeded:

docker-compose --version

Next, create a file named docker-compose.yml in a folder. For convenience, I create it in the current folder:

touch docker-compose.yml

Then edit the file, add the following content, and save it. Pay attention to the indentation, since YAML is whitespace-sensitive:

version: "2.1"
services:
  jobmanager:
    image: flink:1.9.2-scala_2.12
    expose:
      - "6123"
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager

  taskmanager:
    image: flink:1.9.2-scala_2.12
    expose:
      - "6121"
      - "6122"
    depends_on:
      - jobmanager
    command: taskmanager
    links:
      - "jobmanager:jobmanager"
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager

Finally, start the cluster with the following command:

docker-compose up -d

How to view the Flink cluster and logs

View the cluster through the web UI

The official image exposes port 8081. In both method 1 and method 2 we mapped port 8081 of the host to port 8081 of the container, so the cluster can be viewed in a browser. I run Docker inside a virtual machine, so I open the page using the virtual machine's address. If Docker is installed directly on your machine rather than in a virtual machine, you can access it at localhost:8081.
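
The same check can be done from the command line. The sketch below assumes that the jobmanager's REST endpoint /overview is reachable on the mapped port 8081 and that its JSON response contains a taskmanagers field, as it does in Flink 1.9. FLINK_URL and both function names are hypothetical.

```shell
# Base URL of the jobmanager web UI; replace localhost with the VM address if needed.
FLINK_URL=${FLINK_URL:-http://localhost:8081}

# Read the /overview JSON on stdin and print the number of registered taskmanagers.
taskmanager_count() {
    sed -n 's/.*"taskmanagers":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Query the running cluster; prints nothing if the endpoint cannot be reached.
cluster_taskmanagers() {
    curl -s "$FLINK_URL/overview" | taskmanager_count
}
```

Calling cluster_taskmanagers after starting the cluster should print the number of taskmanagers that have registered; an empty result suggests the jobmanager is not reachable on that address.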


View logs

# tmr is the container name; replace it with the name of any other container
docker logs tmr

Questions and answers

Method 1 is based on the following instructions from the official website:

Running a JobManager or a TaskManager
You can run a JobManager (master).

$ docker run --name flink_jobmanager -d -t flink jobmanager

You can also run a TaskManager (worker). Notice that workers need to register with the JobManager directly or via ZooKeeper so the master starts to send them tasks to execute.

$ docker run --name flink_taskmanager -d -t flink taskmanager

There is a pitfall here: these commands work for a single container, but if you start several containers this way they fail with errors.
The docker-compose.yml in method 2 is copied from the official website.

  1. Why create a network in method 1?
    A: The network lets the containers discover and communicate with each other by container name or container ID. If the containers are not attached to the created network, the taskmanager may fail with the following error:
ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - TaskManager initialization failed.
java.net.UnknownHostException: jmr: Name or service not known
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
  1. Why must the environment variable JOB_MANAGER_RPC_ADDRESS be specified in method 1?
    A: If the environment variable is not specified, the jobmanager (jmr) does not report an error, but the taskmanager (tmr) reports the following error:
2020-02-27 03:35:12,029 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@f59c179f3470:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@f59c179f3470:6123/user/resourcemanager..
2020-02-27 03:35:20,782 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor            - Fatal error occurred in TaskExecutor akka.tcp://flink@172.18.0.9:38671/user/taskmanager_0.
org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1111)
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$8(TaskExecutor.java:1097)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)

The taskmanager failed to register with the jobmanager within 300000 ms (5 minutes), so an exception was thrown.
The reason is that when Docker starts a Flink container, it runs the docker-entrypoint.sh script that ships with the official image.
According to that script, if the environment variable JOB_MANAGER_RPC_ADDRESS is not specified in the startup command, the local hostname is used as its value and written to the configuration file. In Docker, the hostname is the container ID, so you see a line like this in the log:

INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, f59c179f3470

Therefore, if we do not specify this environment variable, the taskmanager tries to register at port 6123 of its own container. This is wrong, because a taskmanager cannot accept registrations; it should register with the jobmanager. We therefore need to set JOB_MANAGER_RPC_ADDRESS to the container name, container ID, or IP address of the jobmanager.
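
The fallback described above can be illustrated with a few lines of shell. This is a simplified sketch of the behavior, not the actual docker-entrypoint.sh; the function names are hypothetical.

```shell
# If JOB_MANAGER_RPC_ADDRESS is unset or empty, fall back to the local hostname.
# Inside a container the hostname is the container ID by default.
resolve_jobmanager_address() {
    echo "${JOB_MANAGER_RPC_ADDRESS:-$(hostname)}"
}

# Print the configuration line that would result from the resolved address.
config_line() {
    echo "jobmanager.rpc.address: $(resolve_jobmanager_address)"
}
```

With JOB_MANAGER_RPC_ADDRESS=jmr set, config_line prints jobmanager.rpc.address: jmr. Without it, the taskmanager container ends up with its own hostname in that line, which matches the log entry shown earlier.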

  1. How do I add containers to or remove containers from this cluster?
    A: To remove a container, simply use the rm command:
docker rm -f tmr

To add a container, run the docker command again with a different container name:

docker run -t -d --name tmr1 \
--network app-tier \
-e JOB_MANAGER_RPC_ADDRESS=jmr   \
flink:1.9.2-scala_2.12 taskmanager

You can also add one by editing the docker-compose.yml file: copy the taskmanager configuration and give the copy a different name (I haven't tried this, but it should work), then start it with the command below. The -d option in both docker and docker-compose means start in the background, so a flood of logs is not printed to the terminal. You can view the logs with docker logs [container name].

docker-compose up -d [service name]
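
As an alternative to copying the taskmanager block, docker-compose up accepts a --scale option that starts several containers from one service definition. The sketch below assumes the docker-compose.yml from method 2 is in the current directory, and it relies on the taskmanager service having no fixed container name and no host port mapping. DRY_RUN, run and scale_taskmanagers are hypothetical helpers.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Bring the cluster up with the requested number of taskmanager containers.
scale_taskmanagers() {
    count=$1
    case "$count" in
        ''|*[!0-9]*) echo "usage: scale_taskmanagers <count>" >&2; return 1 ;;
    esac
    run docker-compose up -d --scale "taskmanager=$count"
}
```

For example, scale_taskmanagers 3 runs docker-compose up -d --scale taskmanager=3.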
  1. What does flink:1.9.2-scala_2.12 mean?
    A: The string has the format image name:tag. In the tag, 1.9.2 is the Flink version, and scala_2.12 means that this Flink build was compiled against Scala 2.12. To use other images, search with docker search [image name, such as flink] and then pull the image with docker pull [image name].
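
To make the format concrete, the sketch below splits such an image reference with plain shell parameter expansion. It assumes the name:flinkversion-scala_scalaversion layout described above; parse_flink_image is a hypothetical helper.

```shell
# Split an image reference such as flink:1.9.2-scala_2.12 into its parts.
parse_flink_image() {
    ref=$1
    name=${ref%%:*}                  # text before the first colon
    tag=${ref#*:}                    # text after the first colon
    flink_version=${tag%%-scala_*}   # part of the tag before "-scala_"
    scala_version=${tag#*-scala_}    # part of the tag after "-scala_"
    echo "image=$name flink=$flink_version scala=$scala_version"
}

parse_flink_image flink:1.9.2-scala_2.12
# prints: image=flink flink=1.9.2 scala=2.12
```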

Topics: Docker network Apache Java