Deploying a Flink application in cloud-native mode

Posted by bobby4 on Sun, 27 Feb 2022 06:41:00 +0100

Getting Started Guide

This getting started section will guide you through setting up a fully functional Flink cluster on Kubernetes.

Basic introduction

Kubernetes is a popular container orchestration system for automating the deployment, scaling and management of containerized applications. Flink's native Kubernetes integration allows you to deploy Flink directly on a running Kubernetes cluster. Moreover, because Flink talks to Kubernetes directly, it can dynamically allocate and deallocate TaskManagers according to the resources a job requires.

Prerequisites

This getting-started section assumes a running Kubernetes cluster that meets the following requirements:

  • Kubernetes >= 1.9.
  • A KubeConfig (~/.kube/config) that can list, create and delete pods and services. You can verify permissions by running kubectl auth can-i <list|create|edit|delete> pods, as in the sketch after this list.
  • Kubernetes DNS enabled.
  • A default service account with RBAC permissions to create and delete pods.
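A minimal sketch for checking these prerequisites from your workstation (it assumes kubectl already points at the target cluster):

kubectl version --short                    # server version should be >= 1.9
kubectl auth can-i list pods               # each of these should print "yes"
kubectl auth can-i create pods
kubectl auth can-i delete pods
kubectl get svc kube-dns -n kube-system    # a running kube-dns service indicates cluster DNS is enabled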

If you have not yet created a k8s cluster, see this article for how to quickly set one up: https://lrting.top/backend/3919/

Starting a Flink session on Kubernetes

Starting a Flink session on Kubernetes requires two additional JARs, which must be placed in the flink/lib directory:

cd flink/lib
wget https://repo1.maven.org/maven2/org/bouncycastle/bcpkix-jdk15on/1.69/bcpkix-jdk15on-1.69.jar
wget https://repo1.maven.org/maven2/org/bouncycastle/bcprov-jdk15on/1.69/bcprov-jdk15on-1.69.jar

Create a Flink service account and grant permissions

kubectl create namespace flink
kubectl create serviceaccount flink -n flink
kubectl create clusterrolebinding flink-role-binding-flink \
--clusterrole=edit \
--serviceaccount=flink:flink
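You can verify that the binding works by impersonating the service account (the namespace and account names match the commands above):

kubectl auth can-i create pods --as=system:serviceaccount:flink:flink -n flink
kubectl auth can-i delete pods --as=system:serviceaccount:flink:flink -n flink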

If you skip creating and authorizing a dedicated service account and submit the Flink job with the default one instead, you will get an error like the following:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.10.0.1/api/v1/namespaces/default/pods?labelSelector=app%3Dkaibo-test%2Ccomponent%3Dtaskmanager%2Ctype%3Dflink-native-kubernetes. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods is forbidden: User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" in the namespace "default".

Once your Kubernetes cluster is up and kubectl is configured to point to it, you can start a Flink cluster in Session Mode:

# (1) Start Kubernetes session
$ ./bin/kubernetes-session.sh \
 -Dkubernetes.cluster-id=my-first-flink-cluster \
 -Dkubernetes.namespace=flink

# (2) Submit example job
$ ./bin/flink run \
    --target kubernetes-session \
    -Dkubernetes.cluster-id=my-first-flink-cluster \
    -Dkubernetes.namespace=flink \
    ./examples/streaming/TopSpeedWindowing.jar

# (3) Stop Kubernetes session by deleting cluster deployment
$ kubectl delete deployment/my-first-flink-cluster
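Note that the session above is started in the flink namespace without naming a service account, so it runs under that namespace's default account. To make the JobManager use the flink service account created earlier, you can pass the kubernetes.service-account option; a sketch:

$ ./bin/kubernetes-session.sh \
    -Dkubernetes.cluster-id=my-first-flink-cluster \
    -Dkubernetes.namespace=flink \
    -Dkubernetes.service-account=flink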

When using Minikube, you need to run minikube tunnel to expose Flink's LoadBalancer service.

After the Flink session starts, the JobManager Web UI is reachable on port 8081 by default. The following is the output of the session start command:

[root@rancher02 flink-1.13.5]# ./bin/kubernetes-session.sh \
>  -Dkubernetes.cluster-id=my-first-flink-cluster \
>  -Dkubernetes.namespace=flink
2022-02-26 14:49:16,203 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, localhost
2022-02-26 14:49:16,205 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2022-02-26 14:49:16,205 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: s3.endpoint, http://10.0.2.70:9000
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: s3.path.style.access, true
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: s3.access.key, PCGIXWJBM78H74CWUITM
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: s3.secret.key, ******
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.backend, rocksdb
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.checkpoints.dir, s3://flink/checkpoints
2022-02-26 14:49:16,206 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.savepoints.dir, s3://flink/savepoints
2022-02-26 14:49:16,207 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.backend.incremental, false
2022-02-26 14:49:16,207 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2022-02-26 14:49:16,249 INFO  org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] - Could not load factory due to missing dependencies.
2022-02-26 14:49:17,310 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2022-02-26 14:49:17,320 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (172.800mb (181193935 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2022-02-26 14:49:17,437 INFO  org.apache.flink.kubernetes.utils.KubernetesUtils            [] - Kubernetes deployment requires a fixed port. Configuration blob.server.port will be set to 6124
2022-02-26 14:49:17,437 INFO  org.apache.flink.kubernetes.utils.KubernetesUtils            [] - Kubernetes deployment requires a fixed port. Configuration taskmanager.rpc.port will be set to 6122
2022-02-26 14:49:18,174 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor      [] - Create flink session cluster my-first-flink-cluster successfully, JobManager Web Interface: http://10.0.2.78:8081

Browse to port 8081 of the running Flink session and you will see the Flink Web UI dashboard.
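If the Web UI is not directly reachable in your environment, a port-forward sketch works as well (the native integration names the REST service <cluster-id>-rest, so here my-first-flink-cluster-rest):

$ kubectl port-forward service/my-first-flink-cluster-rest 8081 -n flink
$ curl http://localhost:8081/overview    # in a second terminal; returns a JSON summary of the cluster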

Deployment modes

For production use, we recommend deploying Flink applications in Application Mode, because it provides better isolation between applications.

Application Mode

Application Mode requires the user code to be bundled into the Flink image, because it runs the user code's main() method on the cluster. Application Mode also ensures that all Flink components are properly cleaned up after the application terminates.

The Flink community provides a base Docker image that can be used to bundle the user code:

FROM flink
RUN mkdir -p $FLINK_HOME/usrlib
COPY /path/of/my-flink-job.jar $FLINK_HOME/usrlib/my-flink-job.jar
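A sketch of building and publishing that image (the registry and tag are placeholders for your own):

$ docker build -t <registry>/custom-image-name:latest .
$ docker push <registry>/custom-image-name:latest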

After building and publishing the Docker image under custom-image-name, you can start an application cluster with the following command:

$ ./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-first-application-cluster \
    -Dkubernetes.container.image=custom-image-name \
    local:///opt/flink/usrlib/my-flink-job.jar

local:// is the only supported scheme in Application Mode: the job JAR must already be present inside the image.

The kubernetes.cluster-id option specifies the cluster name and must be unique. If you do not specify this option, Flink generates a random name.

The kubernetes.container.image option specifies the image used to start the pods.

After you deploy an application cluster, you can interact with it:

# List running jobs on the cluster
$ ./bin/flink list --target kubernetes-application -Dkubernetes.cluster-id=my-first-application-cluster
# Cancel running job
$ ./bin/flink cancel --target kubernetes-application -Dkubernetes.cluster-id=my-first-application-cluster <jobId>
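You can also stop a job gracefully, taking a savepoint first; a sketch assuming state.savepoints.dir is configured (as in the flink-conf.yaml shown earlier):

# Stop a running job with a savepoint
$ ./bin/flink stop --target kubernetes-application -Dkubernetes.cluster-id=my-first-application-cluster <jobId>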

You can override configuration values set in conf/flink-conf.yaml by passing key-value pairs -Dkey=value to bin/flink.
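For example, a sketch that gives each TaskManager more memory and CPU for one submission (both are standard Flink options):

$ ./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-first-application-cluster \
    -Dkubernetes.container.image=custom-image-name \
    -Dtaskmanager.memory.process.size=4096m \
    -Dkubernetes.taskmanager.cpu=2 \
    local:///opt/flink/usrlib/my-flink-job.jar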

Per-Job Cluster Mode

Flink on Kubernetes does not support Per-Job Cluster Mode.

Session Mode

You have already seen the deployment of a Session cluster in the getting-started guide at the top of this page.

Session Mode can be executed in two ways:

  • Detached mode (default): kubernetes-session.sh deploys the Flink cluster on Kubernetes and then terminates.
  • Attached mode (-Dexecution.attached=true): kubernetes-session.sh stays alive and accepts commands to control the running Flink cluster. For example, stop stops the running Session cluster. Type help to list all supported commands.

To reconnect to a running Session cluster with the cluster id my-first-flink-cluster, use the following command:

$ ./bin/kubernetes-session.sh \
    -Dkubernetes.cluster-id=my-first-flink-cluster \
    -Dexecution.attached=true

You can override configuration values set in conf/flink-conf.yaml by passing key-value pairs -Dkey=value to bin/kubernetes-session.sh.

Stop the running Session cluster

To stop a running Session cluster with the cluster id my-first-flink-cluster, you can either delete the Flink deployment or run:

$ echo 'stop' | ./bin/kubernetes-session.sh \
    -Dkubernetes.cluster-id=my-first-flink-cluster \
    -Dexecution.attached=true

Further resources for running Flink on Kubernetes

Configure Flink on Kubernetes

Kubernetes-specific configuration options are listed on the Configuration page.

Flink uses the Fabric8 Kubernetes client to communicate with the Kubernetes API server: it creates and deletes Kubernetes resources (Deployments, Pods, ConfigMaps, Services, etc.) and watches Pods and ConfigMaps. Besides the Flink configuration options above, some expert options of the Fabric8 Kubernetes client can be configured via system properties or environment variables.

For example, users can set the maximum number of concurrent requests with the following Flink configuration options, which allows running more jobs in a session cluster when Kubernetes HA services are used. Note that each Flink job consumes 3 concurrent requests.

containerized.master.env.KUBERNETES_MAX_CONCURRENT_REQUESTS: 200
env.java.opts.jobmanager: "-Dkubernetes.max.concurrent.requests=200"

Access Flink's Web UI

Flink's Web UI and REST endpoint can be exposed in several ways via the kubernetes.rest-service.exposed.type configuration option:

ClusterIP: exposes the service on a cluster-internal IP. The service is only reachable from within the cluster. If you want to access the JobManager UI or submit jobs to an existing session, you need to start a local proxy. You can then use localhost:8081 to submit a Flink job to the session or view the dashboard:

$ kubectl port-forward service/<ServiceName> 8081

NodePort: exposes the service on each Node's IP at a static port (the NodePort). <NodeIP>:<NodePort> can be used to contact the JobManager service. NodeIP can also be replaced with the Kubernetes API server address, which you can find in your kube config file.

LoadBalancer: exposes the service externally using a cloud provider's load balancer. Since the cloud provider and Kubernetes need some time to prepare the load balancer, you may initially get a NodePort JobManager Web interface in the client log. You can use kubectl get services/<cluster-id>-rest to obtain the EXTERNAL-IP and build the JobManager Web interface URL manually: http://<EXTERNAL-IP>:8081.
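For example, with the cluster id and namespace used in this guide:

$ kubectl get services my-first-flink-cluster-rest -n flink

The EXTERNAL-IP column of the output is the address to use.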

For more information, see the official Kubernetes documentation on publishing services.

Depending on your environment, starting a Flink cluster with the LoadBalancer REST service exposure type may make the cluster publicly accessible (usually including the ability to execute arbitrary code).

Logging

The Kubernetes integration exposes conf/log4j-console.properties and conf/logback-console.xml to the pods as a ConfigMap. Changes to these files will be visible to newly started clusters.

Access the logs

By default, the JobManager and TaskManagers write their logs both to the console and to /opt/flink/log inside each pod. STDOUT and STDERR output is redirected only to the console. You can access it with:

$ kubectl logs <pod-name>

If a pod is running, you can also use kubectl exec -it <pod-name> -- bash to tunnel in and view the logs or debug the process.
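The native integration labels every pod with app=<cluster-id>, component, and type (visible in the error message earlier), so you can address a cluster's pods by label; a sketch:

# List all pods of the session cluster
$ kubectl get pods -n flink -l app=my-first-flink-cluster
# Print the logs of all TaskManager pods
$ kubectl logs -n flink -l app=my-first-flink-cluster,component=taskmanager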

Access the TaskManager logs

Flink automatically deallocates idle TaskManagers to avoid wasting resources, which makes it harder to access the logs of their pods. You can increase the time before an idle TaskManager is released by configuring resourcemanager.taskmanager-timeout, so that you have more time to inspect the log files.
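For example, to keep idle TaskManagers around for ten minutes instead of the default 30 seconds (the value is in milliseconds), add to flink-conf.yaml:

resourcemanager.taskmanager-timeout: 600000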

Dynamically modify log level

If you have configured the logger to automatically detect configuration changes, you can dynamically adjust the log level by editing the corresponding ConfigMap (here assuming the cluster id is my-first-flink-cluster):

$ kubectl edit cm flink-config-my-first-flink-cluster
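With Log4j 2 (Flink's default), automatic reloading can be switched on by setting monitorInterval in log4j-console.properties; a sketch:

# log4j-console.properties: re-check the configuration every 30 seconds
monitorInterval=30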

Using plugins

To use plugins, you must copy them to the correct location inside the Flink JobManager/TaskManager pods. You can use the built-in plugin mechanism, which requires neither mounting volumes nor building a custom Docker image. For example, use the following command to enable the S3 plugin for your Flink session cluster:

$ ./bin/kubernetes-session.sh \
    -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.13.5.jar \
    -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.13.5.jar

Custom Docker image

If you want to use a custom Docker image, you can specify it via the kubernetes.container.image configuration option. The Flink community provides a rich set of Flink Docker images that are a good starting point. See how to customize Flink's Docker image to learn how to enable plugins, add dependencies, and other options.

Using Secrets

A Kubernetes Secret is an object that contains a small amount of sensitive data, such as a password, token, or key; otherwise such information might end up in a pod specification or image. Flink on Kubernetes can use Secrets in two ways:

  • as files mounted into a pod;
  • as environment variables.

Using Secrets as files in a pod

The following command mounts the secret mysecret under the path /path/to/secret in the started pods:

$ ./bin/kubernetes-session.sh -Dkubernetes.secrets=mysecret:/path/to/secret

The username and password of the secret mysecret can then be found in the files /path/to/secret/username and /path/to/secret/password. For more details, see the official Kubernetes documentation.
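Both variants on this page assume that a secret named mysecret already exists. A minimal sketch of creating one with kubectl (the credentials are placeholders):

$ kubectl create secret generic mysecret \
    --from-literal=username=admin \
    --from-literal=password='s3cr3t'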

Using Secrets as environment variables

The following command exposes the secret mysecret as environment variables in the started pods:

$ ./bin/kubernetes-session.sh -Dkubernetes.env.secretKeyRef=\
    env:SECRET_USERNAME,secret:mysecret,key:username;\
    env:SECRET_PASSWORD,secret:mysecret,key:password

The environment variable SECRET_USERNAME then contains the username and SECRET_PASSWORD the password of the secret mysecret. For more details, see the official Kubernetes documentation.

High availability settings on K8s

Refer to: https://nightlies.apache.org/flink/flink-docs-release-1.13/zh/docs/deployment/ha/overview/

Manual resource cleanup

Flink uses Kubernetes OwnerReferences to clean up all cluster components. All Flink-created resources, including ConfigMaps, Services and Pods, have their OwnerReference set to deployment/<cluster-id>. When that deployment is deleted, all related resources are deleted automatically.

$ kubectl delete deployment/<cluster-id>

Supported K8S versions

At present, all Kubernetes versions >= 1.9 are supported.

Namespaces

Namespaces in Kubernetes divide cluster resources between multiple users via resource quotas. Flink on Kubernetes can use namespaces to launch Flink clusters; the namespace is specified with the kubernetes.namespace option shown earlier.

RBAC

Role-based access control (RBAC) is a method of regulating access to compute or network resources based on the roles of individual users within an organization. Users can configure the RBAC roles and the service account used by the JobManager to access the Kubernetes API server within the Kubernetes cluster.

Each namespace has a default service account. However, the default service account may not have permission to create or delete pods within the Kubernetes cluster. Users may need to update the permissions of the default service account, or specify another service account bound to the proper role:

$ kubectl create clusterrolebinding flink-role-binding-default --clusterrole=edit --serviceaccount=default:default

If you do not want to use the default service account, use the following commands to create a new flink-service-account service account and set up the role binding. Then use the configuration option -Dkubernetes.service-account=flink-service-account to make the JobManager pod use the flink-service-account service account to create and delete TaskManager pods and leader ConfigMaps. This also allows the TaskManagers to watch the leader ConfigMaps to retrieve the addresses of the JobManager and ResourceManager.

$ kubectl create serviceaccount flink-service-account
$ kubectl create clusterrolebinding flink-role-binding-flink --clusterrole=edit --serviceaccount=default:flink-service-account

For more information, see the official Kubernetes documentation on RBAC authorization.

Pod template

Flink allows users to define the JobManager and TaskManager pods via template files. This enables advanced features that are not directly supported by Flink's Kubernetes configuration options. Use kubernetes.pod-template-file to specify a local file containing the pod definition; it will be used to initialize both the JobManager and the TaskManagers. The main container must be named flink-main-container. For more information, see the pod template example below.

Fields overwritten by Flink

Some fields of the pod template are overwritten by Flink. The mechanisms for resolving effective field values fall into the following categories:

  • Defined by Flink: the user cannot configure it.
  • Defined by the user: the user can freely specify this value. The Flink framework does not set any additional values; the effective value derives from the config option and the template.
  • Precedence order: an explicitly set configuration option value is used first, then the value in the pod template, and finally the configuration option's default value if nothing is specified.
  • Merged with Flink: Flink merges the values it sets with the user-defined values (see the precedence order for "Defined by the user"). Flink's values take precedence for fields of the same name.

For the complete list of pod fields that will be overwritten, refer to the Pod Template documentation. All fields defined in the pod template that are not listed in that table remain unaffected.

Example of Pod Template

pod-template.yaml

apiVersion: v1
kind: Pod
metadata:
  name: jobmanager-pod-template
spec:
  initContainers:
    - name: artifacts-fetcher
      image: artifacts-fetcher:latest
      # Use wget or other tools to get user jars from remote storage
      command: [ 'wget', 'https://path/of/StateMachineExample.jar', '-O', '/flink-artifact/myjob.jar' ]
      volumeMounts:
        - mountPath: /flink-artifact
          name: flink-artifact
  containers:
    # Do not change the main container name
    - name: flink-main-container
      resources:
        requests:
          ephemeral-storage: 2048Mi
        limits:
          ephemeral-storage: 2048Mi
      volumeMounts:
        - mountPath: /opt/flink/volumes/hostpath
          name: flink-volume-hostpath
        - mountPath: /opt/flink/artifacts
          name: flink-artifact
        - mountPath: /opt/flink/log
          name: flink-logs
      # Use sidecar container to push logs to remote storage or do some other debugging things
    - name: sidecar-log-collector
      image: sidecar-log-collector:latest
      command: [ 'command-to-upload', '/remote/path/of/flink-logs/' ]
      volumeMounts:
        - mountPath: /flink-logs
          name: flink-logs
  volumes:
    - name: flink-volume-hostpath
      hostPath:
        path: /tmp
        type: Directory
    - name: flink-artifact
      emptyDir: { }
    - name: flink-logs
      emptyDir: { }
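A sketch of starting a session cluster with this template (kubernetes.pod-template-file takes a local file; per-component variants such as kubernetes.pod-template-file.jobmanager and kubernetes.pod-template-file.taskmanager also exist):

$ ./bin/kubernetes-session.sh \
    -Dkubernetes.cluster-id=my-first-flink-cluster \
    -Dkubernetes.pod-template-file=./pod-template.yaml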

This is an original article by the big data and artificial intelligence blogger "xiaozhch5", published under the CC 4.0 BY-SA license. Please include the original source link and this statement when reprinting.

Original link: https://lrting.top/backend/3922/