Kafka operation and maintenance: broker expansion

Posted by nmreddy on Fri, 26 Jul 2019 15:28:49 +0200

Background:

Recently we found that two nodes of the company's three-node kafka cluster sit behind a rack switch that is at risk of failing: its ports randomly flap up/down. We therefore need to temporarily migrate these two brokers away, and migrate them back after the switch is repaired.


Below is a simulation of the whole process (expansion followed by shrinking back).

Assume the original three kafka nodes are node1, node2 and node3.

Prepare two idle servers (node4 and node5 here).


System version: CentOS 7


node1  192.168.2.187

node2  192.168.2.188

node3  192.168.2.189

node4  192.168.2.190

node5  192.168.2.191


The kafka expansion is done in two steps:

1. zk node expansion

2. kafka node expansion


First, deploy the related software on node4 and node5:

cd /root/
## unpack zookeeper, kafka and the JDK
tar xf zookeeper-3.4.9.tar.gz
tar xf kafka_2.11-0.10.1.0.tar.gz
tar xf jdk1.8.0_101.tar.gz

mv kafka_2.11-0.10.1.0  zookeeper-3.4.9  jdk1.8.0_101  /usr/local/

## version-independent symlinks so the paths used below stay stable
cd /usr/local/
ln -s zookeeper-3.4.9  zookeeper-default
ln -s kafka_2.11-0.10.1.0  kafka-default
ln -s jdk1.8.0_101  jdk-default
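
Both zookeeper and kafka need a JDK on the PATH. If the new servers do not yet have a system-wide JDK, a minimal sketch using the jdk-default symlink created above (the profile file name is just an example):

cat > /etc/profile.d/jdk.sh <<'EOF'
export JAVA_HOME=/usr/local/jdk-default
export PATH=$JAVA_HOME/bin:$PATH
EOF
source /etc/profile.d/jdk.sh
java -version    ## should report 1.8.0_101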


Part I: Expansion of zk nodes:

1. Execute on node4:

mkdir /usr/local/zookeeper-default/data/ 

vim /usr/local/zookeeper-default/conf/zoo.cfg   ## on top of the original config, add the last two server lines (server.4 and server.5); the full file should look like this:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888

##Clear directories to prevent dirty data
rm -fr /usr/local/zookeeper-default/data/*

##Add the corresponding myid file to the zk data directory
echo 4 > /usr/local/zookeeper-default/data/myid
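
Before starting the process, a quick sanity check that myid matches this node's server.N entry in zoo.cfg (here N=4):

cat /usr/local/zookeeper-default/data/myid        ## expect: 4
grep '^server\.4' /usr/local/zookeeper-default/conf/zoo.cfg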


2. Start the zk process of node4:

/usr/local/zookeeper-default/bin/zkServer.sh start

/usr/local/zookeeper-default/bin/zkServer.sh status   ## the output should be similar to the following:
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-default/bin/../conf/zoo.cfg
Mode: follower

/usr/local/zookeeper-default/bin/zkCli.sh    ## optional: open an interactive client to browse the znodes

echo stat | nc 127.0.0.1 2181   ## the result should be similar to the following:
Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Clients:
 /127.0.0.1:50072[1](queued=0,recved=6,sent=6)
 /127.0.0.1:50076[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/2/13
Received: 24
Sent: 23
Connections: 2
Outstanding: 0
Zxid: 0x10000009a
Mode: follower
Node count: 63
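
To double-check that the new node has actually synced the existing data, you can run one-shot zkCli commands and confirm the kafka-related paths (such as /brokers and /controller) are visible. A small sketch:

/usr/local/zookeeper-default/bin/zkCli.sh -server 127.0.0.1:2181 ls /
/usr/local/zookeeper-default/bin/zkCli.sh -server 127.0.0.1:2181 ls /brokers/ids   ## expect the existing broker ids, e.g. [1, 2, 3]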


3. Execute on node5:

mkdir /usr/local/zookeeper-default/data/

vim /usr/local/zookeeper-default/conf/zoo.cfg   ## same as on node4, add the last two server lines (server.4 and server.5); the full file should look like this:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888

##Clear directories to prevent dirty data
rm -fr /usr/local/zookeeper-default/data/*

##Add the corresponding myid file to the zk data directory
echo 5 > /usr/local/zookeeper-default/data/myid



4. Start the zk process of node5:

/usr/local/zookeeper-default/bin/zkServer.sh start

/usr/local/zookeeper-default/bin/zkServer.sh  status
 
echo stat | nc 127.0.0.1 2181   ## the result should be similar to the following:
Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Clients:
 /127.0.0.1:45582[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 3
Sent: 2
Connections: 1
Outstanding: 0
Zxid: 0x10000009a
Mode: follower
Node count: 63
## echo mntr | nc 127.0.0.1 2181 can also be used for a more detailed result, similar to the following:
zk_version	3.4.9-1757313, built on 08/23/2016 06:50 GMT
zk_avg_latency	0
zk_max_latency	194
zk_min_latency	0
zk_packets_received	101436
zk_packets_sent	102624
zk_num_alive_connections	4
zk_outstanding_requests	0
zk_server_state	follower
zk_znode_count	141
zk_watch_count	190
zk_ephemerals_count	7
zk_approximate_data_size	10382
zk_open_file_descriptor_count	35
zk_max_file_descriptor_count	102400


5. Once we confirm the two new zk nodes are healthy, update the configuration of the three old zk nodes and restart them one by one.

Modify the zk configuration on node1, node2 and node3 as follows:

vim /usr/local/zookeeper-default/conf/zoo.cfg   ## add the last two server lines (server.4 and server.5); the full file should look like this:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888



Note the restart order: restart the follower nodes first and the leader last (in this example node2 and node3 are followers and node1 is the leader).

/usr/local/zookeeper-default/bin/zkServer.sh stop
/usr/local/zookeeper-default/bin/zkServer.sh status

/usr/local/zookeeper-default/bin/zkServer.sh start
/usr/local/zookeeper-default/bin/zkServer.sh status
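
After each restart, and again once all five nodes are done, it is worth confirming that the ensemble still has exactly one leader and that every node answers. A small sketch looping over all five IPs:

for ip in 192.168.2.187 192.168.2.188 192.168.2.189 192.168.2.190 192.168.2.191; do
  echo -n "$ip: "
  echo stat | nc $ip 2181 | grep Mode
done
## expect four "Mode: follower" lines and one "Mode: leader"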



Part II: Expansion of kafka nodes:


1. Modify node4 (192.168.2.190):

mkdir -pv /usr/local/kafka-default/kafka-logs

vim /usr/local/kafka-default/config/server.properties   ## the revised file is as follows:
broker.id=4   # note: must be unique per broker, so change this
listeners=PLAINTEXT://:9094,TRACE://:9194
advertised.listeners=PLAINTEXT://192.168.2.190:9094
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/local/kafka-default/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=24
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.2.187:2181,192.168.2.188:2181,192.168.2.189:2181,192.168.2.190:2181,192.168.2.191:2181  # note: updated to include the two new zk nodes
zookeeper.connection.timeout.ms=6000
default.replication.factor=2
compression.type=gzip
offsets.retention.minutes=2880
controlled.shutdown.enable=true
delete.topic.enable=true


2. Start the kafka program of node4:


/usr/local/kafka-default/bin/kafka-server-start.sh -daemon /usr/local/kafka-default/config/server.properties
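
Once the process is up, the new broker should register itself under /brokers/ids in zookeeper. A quick check (a sketch, run from any node; assuming the old brokers use ids 1-3):

/usr/local/zookeeper-default/bin/zkCli.sh -server 192.168.2.187:2181 ls /brokers/ids   ## expect something like [1, 2, 3, 4]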


3. Modify node5 (192.168.2.191):

mkdir -pv /usr/local/kafka-default/kafka-logs

vim /usr/local/kafka-default/config/server.properties   ## the revised file is as follows:
broker.id=5   # note: must be unique per broker, so change this
listeners=PLAINTEXT://:9094,TRACE://:9194
advertised.listeners=PLAINTEXT://192.168.2.191:9094
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/local/kafka-default/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=24
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.2.187:2181,192.168.2.188:2181,192.168.2.189:2181,192.168.2.190:2181,192.168.2.191:2181   # note: updated to include the two new zk nodes
zookeeper.connection.timeout.ms=6000
default.replication.factor=2
compression.type=gzip
offsets.retention.minutes=2880
controlled.shutdown.enable=true
delete.topic.enable=true


4. Start the kafka program of node5:

/usr/local/kafka-default/bin/kafka-server-start.sh -daemon /usr/local/kafka-default/config/server.properties
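
Same check as for node4; you can also tail the broker log (kafka writes it to the logs/ directory under the installation by default) to make sure startup finished cleanly:

/usr/local/zookeeper-default/bin/zkCli.sh -server 192.168.2.187:2181 ls /brokers/ids   ## expect something like [1, 2, 3, 4, 5]
tail -n 50 /usr/local/kafka-default/logs/server.log    ## look for "started (kafka.server.KafkaServer)"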


5. Test that the expanded cluster works properly:

Here we can use kafka-console-producer.sh and kafka-console-consumer.sh to confirm the new brokers work, and then check in kafka-manager whether any replicas need to be rebalanced. A test sketch follows.
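
A minimal end-to-end test, assuming a throwaway topic name test-expansion (any unused name works) and the PLAINTEXT port 9094 configured above:

## create a small test topic
/usr/local/kafka-default/bin/kafka-topics.sh --zookeeper 192.168.2.187:2181 --create --topic test-expansion --partitions 3 --replication-factor 2

## produce a few messages against the new brokers
/usr/local/kafka-default/bin/kafka-console-producer.sh --broker-list 192.168.2.190:9094,192.168.2.191:9094 --topic test-expansion

## in another terminal, consume them back from the beginning
/usr/local/kafka-default/bin/kafka-console-consumer.sh --bootstrap-server 192.168.2.190:9094 --topic test-expansion --from-beginning

## check which brokers hold the partitions and replicas
/usr/local/kafka-default/bin/kafka-topics.sh --zookeeper 192.168.2.187:2181 --describe --topic test-expansion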



Part III: Data migration off the risky broker nodes (required in my case; a plain expansion can skip this step):

Here we can use the kafka-manager web UI to migrate the topics; it is straightforward, so no screenshots are included. A command-line alternative is sketched below.
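
If you prefer the command line over kafka-manager, the kafka-reassign-partitions.sh tool shipped with kafka does the same job. A sketch, assuming we want to move a topic named my-topic (a placeholder name) onto brokers 4 and 5:

cat > /tmp/topics-to-move.json <<'EOF'
{"topics": [{"topic": "my-topic"}], "version": 1}
EOF

## generate a candidate assignment limited to brokers 4 and 5
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 --topics-to-move-json-file /tmp/topics-to-move.json --broker-list "4,5" --generate

## save the proposed assignment into /tmp/reassign.json, then execute and verify it
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 --reassignment-json-file /tmp/reassign.json --execute
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 --reassignment-json-file /tmp/reassign.json --verify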


Part IV: Taking node2 and node3 offline

1. Stop the zk process on node2 and node3 and let the remaining zk nodes re-elect a leader automatically.

2. Stop the kafka process on node2 and node3 and let the kafka controller be re-elected automatically; see the sketch below.
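
A sketch of the offline sequence, following the order above, run on node2 and then repeated on node3, plus a quick check that the remaining nodes took over:

## run on node2, then repeat on node3
/usr/local/zookeeper-default/bin/zkServer.sh stop
/usr/local/kafka-default/bin/kafka-server-stop.sh      ## controlled.shutdown.enable=true is assumed, as in the configs above

## verify from any surviving node: zk still has a leader, and broker ids 2/3 have disappeared
echo stat | nc 192.168.2.187 2181 | grep Mode
/usr/local/zookeeper-default/bin/zkCli.sh -server 192.168.2.187:2181 ls /brokers/ids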




Possible problems:

During the topic migration we hit consumer-group exceptions on the client side; after the business team restarted their consumers, the errors disappeared.





Topics: Linux Zookeeper kafka socket vim