Background:
Recently we found that two nodes of the company's three-node Kafka cluster sit behind a blade-chassis switch that is at risk of failing and randomly flaps its ports (up/down). We therefore need to temporarily migrate these two brokers away, and migrate them back once the switch has been repaired.
The whole process (expansion + shrinking) is walked through below as a simulation.
Assume the original three Kafka nodes are node1, node2 and node3.
Prepare two spare servers (node4 and node5 here).
System version: CentOS 7
node1 192.168.2.187
node2 192.168.2.188
node3 192.168.2.189
node4 192.168.2.190
node5 192.168.2.191
The Kafka expansion is done in two steps:
1. zk node expansion
2. kafka node expansion
Firstly, the related software is deployed on node4 node5:
cd /root/
tar xf zookeeper-3.4.9.tar.gz
tar xf kafka_2.11-0.10.1.0.tar.gz
tar xf jdk1.8.0_101.tar.gz
mv kafka_2.11-0.10.1.0 zookeeper-3.4.9 jdk1.8.0_101 /usr/local/
cd /usr/local/
ln -s zookeeper-3.4.9 zookeeper-default
ln -s kafka_2.11-0.10.1.0 kafka-default
ln -s jdk1.8.0_101 jdk-default
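As a quick sanity check (a minimal sketch; it assumes Java is provided through the jdk-default symlink above, the same way as on the existing nodes), confirm the symlinks and the Java version before going further. If JAVA_HOME/PATH are not already set system-wide, set them the same way as on node1-node3.

# Verify the symlinks created above and the bundled JDK
ls -l /usr/local/ | grep -E 'zookeeper-default|kafka-default|jdk-default'
/usr/local/jdk-default/bin/java -version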
Part I: Expansion of zk nodes:
1. Execute on node4:
mkdir /usr/local/zookeeper-default/data/
vim /usr/local/zookeeper-default/conf/zoo.cfg

The full zoo.cfg is shown below; compared with the existing nodes, only the last two server lines (server.4 and server.5) are new:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888

## Clear the data directory to prevent dirty data
rm -fr /usr/local/zookeeper-default/data/*
## Add the matching myid file to the zk data directory
echo 4 > /usr/local/zookeeper-default/data/myid
2. Start the zk process of node4:
/usr/local/zookeeper-default/bin/zkServer.sh start
/usr/local/zookeeper-default/bin/zkServer.sh status

The status output should look similar to this:

ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-default/bin/../conf/zoo.cfg
Mode: follower

/usr/local/zookeeper-default/bin/zkCli.sh
echo stat | nc 127.0.0.1 2181

The result is similar to the following:

Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Clients:
 /127.0.0.1:50072[1](queued=0,recved=6,sent=6)
 /127.0.0.1:50076[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/2/13
Received: 24
Sent: 23
Connections: 2
Outstanding: 0
Zxid: 0x10000009a
Mode: follower
Node count: 63
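Besides stat, the ruok four-letter word gives a quick liveness check (a small sketch, run on node4 itself):

# A healthy zk process answers "imok"
echo ruok | nc 127.0.0.1 2181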
3. Execute on node5:
vim /usr/local/zookeeper-default/conf/zoo.cfg

The full zoo.cfg is shown below; again, only the last two server lines are new:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888

## Clear the data directory to prevent dirty data
rm -fr /usr/local/zookeeper-default/data/*
## Add the matching myid file to the zk data directory
echo 5 > /usr/local/zookeeper-default/data/myid
4. Start the zk process of node5:
/usr/local/zookeeper-default/bin/zkServer.sh start
/usr/local/zookeeper-default/bin/zkServer.sh status
echo stat | nc 127.0.0.1 2181

The result is similar to the following:

Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Clients:
 /127.0.0.1:45582[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 3
Sent: 2
Connections: 1
Outstanding: 0
Zxid: 0x10000009a
Mode: follower
Node count: 63

You can also use echo mntr | nc 127.0.0.1 2181 for more detailed results, similar to the following:

zk_version	3.4.9-1757313, built on 08/23/2016 06:50 GMT
zk_avg_latency	0
zk_max_latency	194
zk_min_latency	0
zk_packets_received	101436
zk_packets_sent	102624
zk_num_alive_connections	4
zk_outstanding_requests	0
zk_server_state	follower
zk_znode_count	141
zk_watch_count	190
zk_ephemerals_count	7
zk_approximate_data_size	10382
zk_open_file_descriptor_count	35
zk_max_file_descriptor_count	102400
5. Once we have confirmed that the two new zk nodes are working correctly, we need to update the configuration of the three old zk nodes and then restart them.
Modify the zk configuration of node1, node2 and node3 as follows:
vim /usr/local/zookeeper-default/conf/zoo.cfg

Append the two new server lines (server.4 and server.5) so the file looks like this:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888
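If you prefer not to edit each file by hand, a loop like the following can append the two new server lines on the three old nodes (a sketch only; it assumes passwordless root SSH and the same zoo.cfg path on node1-node3):

for h in 192.168.2.187 192.168.2.188 192.168.2.189; do
  # Append server.4/server.5 only if they are not already present
  ssh root@"$h" 'grep -q "^server\.4=" /usr/local/zookeeper-default/conf/zoo.cfg || cat >> /usr/local/zookeeper-default/conf/zoo.cfg <<EOF
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888
EOF'
done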
Note the restart order: restart the follower nodes first and the leader last (in this example node2 and node3 are followers and node1 is the leader).
/usr/local/zookeeper-default/bin/zkServer.sh stop
/usr/local/zookeeper-default/bin/zkServer.sh status
/usr/local/zookeeper-default/bin/zkServer.sh start
/usr/local/zookeeper-default/bin/zkServer.sh status
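After the rolling restart, all five nodes should answer, with exactly one leader and four followers. A quick check across the whole ensemble (a sketch reusing the stat four-letter command shown earlier):

for h in 192.168.2.187 192.168.2.188 192.168.2.189 192.168.2.190 192.168.2.191; do
  echo -n "$h: "
  # Print the Mode line of each node; expect one leader and four followers
  echo stat | nc "$h" 2181 | grep "Mode:" || echo "no response"
done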
Part II: Expansion of kafka nodes:
1. Modify node4 (192.168.2.190):
mkdir -pv /usr/local/kafka-default/kafka-logs
vim /usr/local/kafka-default/config/server.properties

The revised file is as follows (the comment lines mark the values that must differ per node):

# broker.id must be unique – set per node
broker.id=4
listeners=PLAINTEXT://:9094,TRACE://:9194
advertised.listeners=PLAINTEXT://192.168.2.190:9094
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/local/kafka-default/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=24
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
# zookeeper.connect now lists all five zk nodes
zookeeper.connect=192.168.2.187:2181,192.168.2.188:2181,192.168.2.189:2181,192.168.2.190:2181,192.168.2.191:2181
zookeeper.connection.timeout.ms=6000
default.replication.factor=2
compression.type=gzip
offsets.retention.minutes=2880
controlled.shutdown.enable=true
delete.topic.enable=true
2. Start the kafka program of node4:
/usr/local/kafka-default/bin/kafka-server-start.sh -daemon /usr/local/kafka-default/config/server.properties
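To confirm the broker came up cleanly (a sketch; it assumes the default log location under the Kafka install directory and the listener port configured above):

# Look for the "started" line logged by KafkaServer
grep -i "started (kafka.server.KafkaServer)" /usr/local/kafka-default/logs/server.log | tail -1
# Make sure the broker port is actually listening
ss -lntp | grep 9094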
3. Modify node5 (192.168.2.191):
mkdir -pv /usr/local/kafka-default/kafka-logs
vim /usr/local/kafka-default/config/server.properties

The revised file is as follows (again, the comment lines mark the per-node values):

# broker.id must be unique – set per node
broker.id=5
listeners=PLAINTEXT://:9094,TRACE://:9194
advertised.listeners=PLAINTEXT://192.168.2.191:9094
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/local/kafka-default/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=24
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
# zookeeper.connect now lists all five zk nodes
zookeeper.connect=192.168.2.187:2181,192.168.2.188:2181,192.168.2.189:2181,192.168.2.190:2181,192.168.2.191:2181
zookeeper.connection.timeout.ms=6000
default.replication.factor=2
compression.type=gzip
offsets.retention.minutes=2880
controlled.shutdown.enable=true
delete.topic.enable=true
4. Start the kafka program of node5:
/usr/local/kafka-default/bin/kafka-server-start.sh -daemon /usr/local/kafka-default/config/server.properties
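With both new brokers up, their ids should be registered under /brokers/ids in ZooKeeper (a sketch using zkCli in one-shot mode; expect the new ids 4 and 5 alongside the original ones):

/usr/local/zookeeper-default/bin/zkCli.sh -server 192.168.2.187:2181 ls /brokers/ids
# expect output similar to: [1, 2, 3, 4, 5]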
5. Test that everything works:
Here we can use kafka-console-producer.sh and kafka-console-consumer.sh to check that the new brokers are working properly, and then look at kafka-manager to see whether any partition replicas need to be rebalanced; a test is sketched below.
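For example (a minimal sketch; the topic name test-expansion is made up, the addresses are the ones from this setup):

# Create a test topic whose replicas can land on the new brokers
/usr/local/kafka-default/bin/kafka-topics.sh --create --zookeeper 192.168.2.187:2181 \
  --replication-factor 2 --partitions 3 --topic test-expansion

# Produce a few messages against the new brokers (type some lines, then Ctrl-C)
/usr/local/kafka-default/bin/kafka-console-producer.sh \
  --broker-list 192.168.2.190:9094,192.168.2.191:9094 --topic test-expansion

# Consume them back to confirm they were stored
/usr/local/kafka-default/bin/kafka-console-consumer.sh \
  --zookeeper 192.168.2.187:2181 --topic test-expansion --from-beginning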
Part III: Data migration away from the risky broker nodes (required in my case; a plain expansion does not need this step):
Here we can use the kafka-manager web UI to reassign the topics away from the risky brokers; it is straightforward, so no screenshots are included. A command-line alternative is sketched below.
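The same reassignment can be done with kafka-reassign-partitions.sh (a sketch; the topic list, file names and target broker list are examples, not taken from the real migration):

# topics-to-move.json – list the topics whose replicas must leave brokers 2 and 3
cat > /tmp/topics-to-move.json <<EOF
{"topics": [{"topic": "test-expansion"}], "version": 1}
EOF

# Generate a candidate plan that only uses the safe brokers (1, 4, 5)
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 \
  --topics-to-move-json-file /tmp/topics-to-move.json --broker-list "1,4,5" --generate

# Save the proposed assignment to /tmp/reassignment.json, then execute and verify it
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 \
  --reassignment-json-file /tmp/reassignment.json --execute
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 \
  --reassignment-json-file /tmp/reassignment.json --verify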
Part IV: Taking node2 and node3 offline
1. Stop the zk process on node2 and node3 and let the remaining zk nodes elect a leader automatically.
2. Stop the kafka process on node2 and node3 and let the kafka controller be re-elected automatically (see the check sketched below before doing so).
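Before stopping the Kafka processes in step 2, it is worth confirming that no partition replicas are still hosted on broker ids 2 and 3 (a minimal sketch that just greps the describe output, so double-check any matches by eye):

# No output means no replicas are left on brokers 2 and 3
/usr/local/kafka-default/bin/kafka-topics.sh --describe --zookeeper 192.168.2.187:2181 \
  | grep -E 'Replicas: [0-9,]*\b(2|3)\b'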
Possible problems:
During the migration we ran into consumer group exceptions while topics were being moved; after the business side restarted their consumers, the errors disappeared.
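When this happens, the state of the affected group can be inspected with kafka-consumer-groups.sh (a sketch for the 0.10.1 tooling; my-group is a placeholder for the real consumer group name):

# List the groups known to the brokers, then describe the affected one to see lag and owners
/usr/local/kafka-default/bin/kafka-consumer-groups.sh --new-consumer \
  --bootstrap-server 192.168.2.190:9094 --list
/usr/local/kafka-default/bin/kafka-consumer-groups.sh --new-consumer \
  --bootstrap-server 192.168.2.190:9094 --describe --group my-group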