Have you learned to build your own Kafka image for development testing?

Posted by jcanker on Sun, 10 Oct 2021 18:14:44 +0200

Preface

During development, functional debugging often depends on other software. For example, when working with Flink CDC, you need to import data from the MySQL binlog into Kafka and then into a Hudi data lake.

Here is the problem. To do this, I need MySQL, Kafka, a YARN cluster, and an HDFS cluster all running before I can use Flink for testing and validation. Of course, if these services are already running permanently in your environment, none of this is a problem.

But things get tricky when we only have a single development host of our own. To accomplish the tasks above, we may first need to install a virtual machine on the host and then install the whole environment inside it. Every time we need to test, we start the virtual machine, log in, and start each of the services. This is a viable method. However, it may not be the most effective one; building the development environment from Docker images is an alternative.

Build a Kafka image to run the Kafka environment in Docker

The following is an example of how to build a Kafka image.

Before building this image, keep one idea in mind: the Dockerfile should be written to be flexible, so that, at the very least, upgrading Kafka requires very few modifications.

The code below is available at: https://github.com/xiaozhch5/dockerfile.git

Based on these principles, we can write as follows:

  • The version numbers of ZooKeeper and Kafka are abstracted as build parameters and specified at build time

  • The configuration files are written in advance

  • Ports 2181 and 9092 are exposed

The Dockerfile is:

FROM centos:centos7.9.2009

WORKDIR /data

ARG ZK_VERSION=3.7.0
ARG KAFKA_SCALA_VERSION=2.12
ARG KAFKA_VERSION=2.8.1

COPY start-kafka.sh /data
COPY zoo.cfg /data
COPY server.properties /data

EXPOSE 2181 9092

RUN yum update -y
RUN yum install wget java-1.8.0-openjdk-devel java-1.8.0-openjdk -y

RUN wget https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/${KAFKA_VERSION}/kafka_${KAFKA_SCALA_VERSION}-${KAFKA_VERSION}.tgz
RUN wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-${ZK_VERSION}/apache-zookeeper-${ZK_VERSION}-bin.tar.gz

RUN tar zxvf kafka_${KAFKA_SCALA_VERSION}-${KAFKA_VERSION}.tgz
RUN tar zxvf apache-zookeeper-${ZK_VERSION}-bin.tar.gz

RUN ln -s kafka_${KAFKA_SCALA_VERSION}-${KAFKA_VERSION} kafka
RUN ln -s apache-zookeeper-${ZK_VERSION}-bin zookeeper

RUN cp /data/zoo.cfg /data/zookeeper/conf
RUN cp /data/server.properties /data/kafka/config

RUN mkdir /data/zookeeper/data
RUN mkdir /data/kafka/kafka-logs

CMD ["bash", "/data/start-kafka.sh"]
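To make the role of the three ARG values concrete, here is a small shell sketch that mimics the substitution performed in the Dockerfile's wget lines. The function name kafka_urls is a made-up helper for illustration and is not part of the original repository; only the URL patterns and default versions are taken from the Dockerfile above.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical, not in the original repository): prints
# the two download URLs that the Dockerfile's wget lines resolve to.
kafka_urls() {
  local zk_version="${1:-3.7.0}"      # same default as ARG ZK_VERSION
  local scala_version="${2:-2.12}"    # same default as ARG KAFKA_SCALA_VERSION
  local kafka_version="${3:-2.8.1}"   # same default as ARG KAFKA_VERSION
  local mirror="https://mirrors.tuna.tsinghua.edu.cn/apache"

  echo "${mirror}/kafka/${kafka_version}/kafka_${scala_version}-${kafka_version}.tgz"
  echo "${mirror}/zookeeper/zookeeper-${zk_version}/apache-zookeeper-${zk_version}-bin.tar.gz"
}

kafka_urls   # with no arguments, prints the URLs for the default versions
```

Changing a version therefore changes both the archive that is downloaded and the directory name it unpacks to, which is why the Dockerfile links the unpacked directories to the fixed names kafka and zookeeper.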

In the Dockerfile above,

  • start-kafka.sh is the startup script, which starts ZooKeeper and then Kafka

  • zoo.cfg is the configuration file for ZooKeeper

  • server.properties is the configuration file for Kafka

These three files must be prepared beforehand so that they can be copied into the image. Their contents are as follows:

start-kafka.sh

/data/zookeeper/bin/zkServer.sh start

/data/kafka/bin/kafka-server-start.sh -daemon /data/kafka/config/server.properties

tail -F /data/kafka/logs/server.log
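The script above starts Kafka immediately after ZooKeeper. The broker waits for its ZooKeeper connection for up to zookeeper.connection.timeout.ms, so this usually works, but if you want the ordering to be explicit, one possible sketch is to wait until port 2181 accepts connections first. wait_for_port is a hypothetical helper, not something from the original script, and it relies on bash's /dev/tcp feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll host:port once per second until it accepts a TCP
# connection or the timeout (in seconds) runs out. Uses bash's /dev/tcp.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # The subshell opens a connection and closes it on exit; errors are hidden.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Intended use inside start-kafka.sh, between the two start commands:
#   /data/zookeeper/bin/zkServer.sh start
#   wait_for_port localhost 2181 30 || exit 1
#   /data/kafka/bin/kafka-server-start.sh -daemon /data/kafka/config/server.properties
```

Since the Dockerfile runs the script with bash, no extra package such as nc would be needed in the image for this approach.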

zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
clientPort=2181

server.properties

broker.id=0
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/kafka/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0

Finally, put the Dockerfile, start-kafka.sh, zoo.cfg, and server.properties in the same directory and run the following command to build the image. By default, this uses the ZooKeeper and Kafka versions specified in the Dockerfile.

docker build . --tag xiaozhch5/kafka:2.8.1 --no-cache=true
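Because the versions are ARG values, they can be overridden at build time with Docker's --build-arg flag instead of editing the Dockerfile. The sketch below wraps that in a hypothetical build_kafka_image function. The DOCKER variable is an assumption added here so the command can be previewed with DOCKER=echo before running it for real, and any version you pass must actually be available on the download mirror used in the Dockerfile.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around docker build; not part of the original repository.
# Set DOCKER=echo to print the command instead of running it.
build_kafka_image() {
  local kafka_version="${1:-2.8.1}"
  local scala_version="${2:-2.12}"
  local zk_version="${3:-3.7.0}"

  "${DOCKER:-docker}" build . \
    --build-arg "KAFKA_VERSION=${kafka_version}" \
    --build-arg "KAFKA_SCALA_VERSION=${scala_version}" \
    --build-arg "ZK_VERSION=${zk_version}" \
    --tag "xiaozhch5/kafka:${kafka_version}"
}

# Preview the command for a different Kafka release without running it:
#   DOCKER=echo build_kafka_image 2.8.0
```

Tagging the image with the Kafka version keeps images for different releases side by side, which fits the goal of supporting upgrades with few modifications.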

Once the build is complete, you will see output similar to the following:

C:\bigdata\dockerfile\kafka>docker build . --tag xiaozhch5/kafka:2.8.1 --no-cache=true
[+] Building 98.4s (22/22) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 1.06kB  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/centos:centos7.9.2009  1.2s
 => [ 1/17] FROM docker.io/library/centos:centos7.9.2009@sha256:9d4bcbbb213dfd745b58be38b13b996ebb5ac315fe75711bd618426a630e0987  0.0s
 => [internal] load build context  0.0s
 => => transferring context: 100B  0.0s
 => CACHED [ 2/17] WORKDIR /data  0.0s
 => [ 3/17] COPY start-kafka.sh /data  0.0s
 => [ 4/17] COPY zoo.cfg /data  0.0s
 => [ 5/17] COPY server.properties /data  0.0s
 => [ 6/17] RUN yum update -y  23.1s
 => [ 7/17] RUN yum install wget java-1.8.0-openjdk-devel java-1.8.0-openjdk -y  28.8s
 => [ 8/17] RUN wget https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.8.1/kafka_2.12-2.8.1.tgz  16.2s
 => [ 9/17] RUN wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz  7.9s
 => [10/17] RUN tar zxvf kafka_2.12-2.8.1.tgz  1.3s
 => [11/17] RUN tar zxvf apache-zookeeper-3.7.0-bin.tar.gz  14.4s
 => [12/17] RUN ln -s kafka_2.12-2.8.1 kafka  0.5s
 => [13/17] RUN ln -s apache-zookeeper-3.7.0-bin zookeeper  0.6s
 => [14/17] RUN cp /data/zoo.cfg /data/zookeeper/conf  0.6s
 => [15/17] RUN cp /data/server.properties /data/kafka/config  0.7s
 => [16/17] RUN mkdir /data/zookeeper/data  0.6s
 => [17/17] RUN mkdir /data/kafka/kafka-logs  0.6s
 => exporting to image  1.6s
 => => exporting layers  1.6s
 => => writing image sha256:07acb689dbae2445117a823003fbde7df67e3edd5c26c11acaaaa6409a7ea700  0.0s
 => => naming to docker.io/xiaozhch5/kafka:2.8.1

Once you've built the image you need, you can use it to run your Kafka environment for development testing.

Start a container from the Kafka image above:

docker run -itd -p 2181:2181 -p 9092:9092 xiaozhch5/kafka:2.8.1
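One way to check that the broker is actually up is to run Kafka's own command-line tools inside the container. The following is a sketch only: smoke_test_kafka and the topic name smoke-test are hypothetical, the container ID must be supplied by you (for example from docker ps), and the DOCKER variable is an assumption that allows a dry run with DOCKER=echo.

```shell
#!/usr/bin/env bash
# Hypothetical check: create a topic and then list topics using the scripts
# that ship inside the image, so nothing needs to be installed on the host.
smoke_test_kafka() {
  local container="$1"               # container ID or name, e.g. from `docker ps`
  local topic="${2:-smoke-test}"     # placeholder topic name
  local docker="${DOCKER:-docker}"

  "$docker" exec "$container" /data/kafka/bin/kafka-topics.sh \
    --bootstrap-server localhost:9092 \
    --create --topic "$topic" --partitions 1 --replication-factor 1 &&
  "$docker" exec "$container" /data/kafka/bin/kafka-topics.sh \
    --bootstrap-server localhost:9092 --list
}

# Usage once the container is running:
#   smoke_test_kafka <container-id>
```

Running the tools inside the container avoids depending on how the broker advertises its address to clients on the host, which this check does not cover.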

At this point, we can develop and test against this Kafka environment.

Summary

During development, we run into a variety of environments (different components, different versions, and so on). Starting Docker containers from images we build ourselves is an effective way to handle them.

If you have a better way, please leave a message below!

Learn more

For more big data and data lake material and free resource downloads, see:

https://lrting.top
