I. Versions
CentOS 7.5
zookeeper-3.4.12
kafka_2.12-1.1.0
II. zookeeper Installation
1. Download and extract the zookeeper package
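For example, the 3.4.12 tarball can be pulled from the Apache archive and extracted under /usr/local (the download mirror and the target path here are assumptions; adjust them to your environment):

wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.12/zookeeper-3.4.12.tar.gz
tar -zvxf zookeeper-3.4.12.tar.gz -C /usr/local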
2. Create the data and log folders
mkdir /usr/local/zookeeper-3.4.12/data
mkdir /usr/local/zookeeper-3.4.12/logs
3. Copy configuration files
Go to the conf directory and copy zoo_sample.cfg
cp zoo_sample.cfg zoo.cfg
4. Enter the data directory and execute the command
echo 1 > myid
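The number written to myid must match this host's server.N entry in zoo.cfg (see step 5 below). Assuming the server.2/server.3 numbering used in this article, the command on the other two nodes would be:

echo 2 > myid    # on the host listed as server.2
echo 3 > myid    # on the host listed as server.3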
5. Modify the configuration file
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/zookeeper-3.4.12/data
dataLogDir=/usr/local/zookeeper-3.4.12/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
# Cluster server addresses
server.1=IP1:2888:3888
server.2=IP2:2888:3888
server.3=IP3:2888:3888
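The same zoo.cfg can then be pushed to the other two nodes, for example with scp (the root user and the IP placeholders are illustrative):

scp zoo.cfg root@IP2:/usr/local/zookeeper-3.4.12/conf/
scp zoo.cfg root@IP3:/usr/local/zookeeper-3.4.12/conf/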
6. Start zookeeper
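ZooKeeper ships a control script under bin; a minimal sketch of starting each node and checking its role, run from /usr/local/zookeeper-3.4.12 on every server:

bin/zkServer.sh start
bin/zkServer.sh status    # reports Mode: leader or Mode: follower once the quorum is formed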
III. Installation of kafka
1. Download and extract the kafka package
tar -zvxf kafka_2.12-1.1.0.tgz
2. Modify the configuration file
Open the kafka configuration file
vim config/server.properties
Modify the configuration
# The broker id. Set to a unique number; the three servers use 1, 2 and 3 respectively.
broker.id=1
# Listener address advertised to clients
advertised.listeners=PLAINTEXT://IP1:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# Number of threads kafka uses for network communication
num.network.threads=3
# Number of threads kafka uses for IO operations
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
# Data storage path
log.dirs=/tmp/kafka-logs
# Default number of partitions per topic
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
# In a cluster, set these to greater than 1 to ensure availability; here they are set to 3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3

############################# Log Flush Policy #############################

# Log retention time in hours
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=IP1:2181,IP2:2181,IP3:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000

############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
#group.initial.rebalance.delay.ms=0
# Whether to create topics automatically (false = no, true = yes)
auto.create.topics.enable=false
# Allow topic deletion; the default is false
delete.topic.enable=true
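Only broker.id and advertised.listeners need to differ between the three brokers; on the other two servers the corresponding lines would look like this (IP2 and IP3 stand in for the real addresses):

# second broker
broker.id=2
advertised.listeners=PLAINTEXT://IP2:9092

# third broker
broker.id=3
advertised.listeners=PLAINTEXT://IP3:9092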
3. Start up kafka
bin/kafka-server-start.sh -daemon config/server.properties
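To verify the cluster, a test topic can be created and inspected from any broker (the topic name is illustrative; IP1 is one of the ZooKeeper hosts from zookeeper.connect):

bin/kafka-topics.sh --create --zookeeper IP1:2181 --replication-factor 3 --partitions 3 --topic test
bin/kafka-topics.sh --describe --zookeeper IP1:2181 --topic test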
IV. Relevant parameters
Broker configuration
message.max.bytes (default: ~1MB) - The maximum message size the broker will accept. It should be greater than or equal to the producer's max.request.size and less than or equal to the consumer's fetch.message.max.bytes; otherwise a consumer can get stuck on a message it is unable to fetch.
log.segment.bytes (default: 1GB) - The maximum size of a kafka log segment file. Make sure this value is larger than the largest single message; the default is usually fine, since a single message rarely approaches 1GB (kafka is a messaging system, not a file system).
replica.fetch.max.bytes (default: 1MB) - The maximum message size a broker can replicate. This should be at least as large as message.max.bytes; otherwise a broker may accept a message it cannot replicate, which can lead to data loss.
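As a sketch, these broker settings might be kept consistent in server.properties like this (the 10MB figure is illustrative, not a recommendation):

# largest message the broker will accept (bytes)
message.max.bytes=10485760
# keep this >= message.max.bytes so followers can replicate every accepted message
replica.fetch.max.bytes=10485760
# segment size stays well above the largest single message
log.segment.bytes=1073741824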
Consumer configuration
fetch.message.max.bytes (default: 1MB) - The largest message a consumer will read. This value should be greater than or equal to message.max.bytes. If you do choose kafka for large messages, there are still things to weigh: consider the impact of large messages on the cluster and its topics at design time, not after problems appear.
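This property belongs to the old (Scala) consumer; the newer Java consumer uses max.partition.fetch.bytes and fetch.max.bytes for the same purpose. A minimal consumer properties sketch matching the 10MB broker example above (value illustrative):

# must be >= the broker's message.max.bytes
fetch.message.max.bytes=10485760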
Producer configuration
buffer.memory (default: 32MB) - The size of the producer's buffer. A large enough buffer lets the producer keep writing, but a successful write into the buffer does not mean the message has actually been sent.
batch.size (default: 16384 bytes) - The size of each batch; a batch is sent once it reaches this size. The buffer can hold multiple batches.
linger.ms - The maximum time to wait for a batch that has not yet reached batch.size; once it elapses, the batch is sent anyway.
max.request.size (default: 1MB) - The maximum size of a single request sent by the producer; it should be larger than batch.size (see the sketch below).
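A sketch of how these four settings might appear together in a producer configuration (all values are illustrative, not tuned recommendations):

# 32MB of buffer for records waiting to be sent
buffer.memory=33554432
# a batch is sent once it reaches 16KB...
batch.size=16384
# ...or after 5ms, whichever comes first
linger.ms=5
# upper bound on a single produce request; keep it larger than batch.size
max.request.size=1048576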
Notes
1. To ensure that all partitions remain available, set offsets.topic.replication.factor to at least 3.
2. Turn off automatic topic creation, and make sure all brokers in the cluster are up before clients start consuming; otherwise partitions and their replicas may not be evenly distributed, which hurts high availability.
3. After the cluster is started, the distribution of partitions and their replicas can be viewed with the following command.
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets
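The describe action also accepts an --under-replicated-partitions filter, which lists only the partitions whose replicas are not fully in sync and makes replication problems easy to spot:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --under-replicated-partitions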