Introduction to Kafka configuration parameters

Posted by sun14php on Thu, 16 Dec 2021 08:26:05 +0100

Configuration and parameter description

Consumer configuration parameters

#If 'enable-auto-commit' is true, the frequency (in milliseconds) at which the consumer's offsets are auto-committed to Kafka. The default value is 5000.
spring.kafka.consumer.auto-commit-interval=5000

#What to do when there is no initial offset in Kafka, or when the current offset no longer exists on the server. The default value is latest, meaning the offset is automatically reset to the latest offset.
#The optional values are latest, earliest, and none.
# earliest: if a partition has a committed offset, consume from that offset; if there is no committed offset, consume from the beginning
# latest: if a partition has a committed offset, consume from that offset; if there is no committed offset, consume only data newly produced to the partition
# none: if every partition of the topic has a committed offset, consume from after that offset; if any partition has no committed offset, throw an exception
spring.kafka.consumer.auto-offset-reset=latest

#Comma-separated list of host:port pairs used to establish the initial connection to the Kafka cluster.
spring.kafka.consumer.bootstrap-servers=192.168.240.42:9092,192.168.240.43:9092,192.168.240.44:9092

#An ID string passed to the server with each request; used for server-side logging.
spring.kafka.consumer.client-id=

#If true, the consumer's offsets are committed periodically in the background. The default value is true.
spring.kafka.consumer.enable-auto-commit=true

#If there is not enough data to immediately satisfy fetch.min.bytes, the maximum time (in milliseconds) the server will block before answering a fetch request.
#The default value is 500.
spring.kafka.consumer.fetch-max-wait=500

#The minimum amount of data, in bytes, the server should return for a fetch request. The default value is 1; the corresponding Kafka parameter is fetch.min.bytes.
spring.kafka.consumer.fetch-min-size=1

#A unique string identifying the consumer group this consumer belongs to.
spring.kafka.consumer.group-id=allenGroup

#The expected time (in milliseconds) between heartbeats to the consumer coordinator. The default value is 3000.
spring.kafka.consumer.heartbeat-interval=3000

#The deserializer class for keys; implements the interface org.apache.kafka.common.serialization.Deserializer
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer

#The deserializer class for values; implements the interface org.apache.kafka.common.serialization.Deserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer

#The maximum number of records returned in a single call to poll(). The default value is 500.
spring.kafka.consumer.max-poll-records=
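
For context, here is a minimal sketch of a Spring Kafka consumer driven by the properties above. The listener class and the topic name demo-topic are hypothetical; the group id and the String deserializers come from the configuration shown.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Hypothetical listener; Spring Boot builds the underlying consumer
// from the spring.kafka.consumer.* properties above (group allenGroup,
// StringDeserializer for keys and values).
@Component
public class DemoConsumer {

    @KafkaListener(topics = "demo-topic") // hypothetical topic name
    public void listen(String message) {
        // The record value arrives already deserialized by StringDeserializer.
        System.out.println("Received: " + message);
    }
}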

Producer configuration parameters

#The number of acknowledgments the producer requires the leader to have received before considering a request complete; this controls the durability of records sent to the server. Its value can be:
#acks=0: the producer does not wait for any acknowledgment from the server; the record is immediately added to the socket buffer and considered sent.
#In this case there is no guarantee that the server has received the record, the retries configuration has no effect (the client generally does not learn of failures), and the offset returned for each record is always -1.
#acks=1: the leader writes the record to its local log and responds without waiting for full acknowledgment from all replicas.
#In this case, if the leader fails immediately after acknowledging the record but before the replicas have copied it, the record is lost.
#acks=all: the leader waits for the full set of in-sync replicas to acknowledge the record. This guarantees that the record is not lost as long as at least one in-sync replica remains alive. This is the strongest guarantee, and is equivalent to acks=-1.
#The values that can be set are: all, -1, 0, 1
spring.kafka.producer.acks=1

#Whenever multiple records are sent to the same partition, the producer attempts to batch them together into fewer requests.
#This helps performance on both the client and the server. This setting controls the default batch size in bytes; the default value is 16384.
spring.kafka.producer.batch-size=16384

#Comma-separated list of host:port pairs used to establish the initial connection to the Kafka cluster.
spring.kafka.producer.bootstrap-servers=192.168.240.42:9092,192.168.240.43:9092,192.168.240.44:9092

#The total bytes of memory the producer can use to buffer records waiting to be sent to the server. The default value is 33554432.
spring.kafka.producer.buffer-memory=33554432

#An ID string passed to the server with each request; used for server-side logging.
spring.kafka.producer.client-id=

#The compression type for all data generated by the producer. This configuration accepts the standard compression codecs ('gzip', 'snappy', 'lz4') as well as 'none' for no compression.
#(The values 'uncompressed' and 'producer', which mean no compression and retaining the original codec set by the producer, apply to the topic-level compression.type, whose default is 'producer'.)
#The default for the producer itself is none.
spring.kafka.producer.compression-type=none

#The serializer class for keys; implements the interface org.apache.kafka.common.serialization.Serializer
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer

#The serializer class for values; implements the interface org.apache.kafka.common.serialization.Serializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer

#A value greater than zero enables retries: the producer will resend any record whose send failed, up to this many attempts.
spring.kafka.producer.retries=3
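
As a sketch of how the producer side is typically used, the snippet below sends a message through the KafkaTemplate that Spring Boot auto-configures from these properties. The service class and the topic name demo-topic are hypothetical.

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// Hypothetical service; the KafkaTemplate is built from the
// spring.kafka.producer.* properties above (acks=1, retries=3, ...).
@Service
public class DemoProducer {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public DemoProducer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(String message) {
        // With acks=1 the send completes once the leader has written the
        // record; failed sends are retried up to the configured retries.
        kafkaTemplate.send("demo-topic", message); // hypothetical topic name
    }
}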

Listener configuration parameters

#Listener AckMode; see https://docs.spring.io/spring-kafka/reference/htmlsingle/#committing-offsets
#Takes effect only when enable-auto-commit is false; it has no effect when that value is true. The AckMode values are as follows:
# MANUAL: after each batch of poll() records is processed by the listener, commit when the listener calls Acknowledgment.acknowledge() (see the sketch at the end of this section)
# MANUAL_IMMEDIATE: commit immediately when the listener calls Acknowledgment.acknowledge()
# RECORD: commit after each record is processed by the listener
# BATCH: commit after each batch of poll() records is processed by the listener
# TIME: after each batch of poll() records is processed by the listener, commit if the time since the last commit is greater than ack-time
# COUNT: after each batch of poll() records is processed by the listener, commit if the number of records processed since the last commit is greater than or equal to ack-count
# COUNT_TIME: commit when either the TIME or the COUNT condition is met
spring.kafka.listener.ack-mode=

#The number of threads to run in the listener container.
spring.kafka.listener.concurrency=

#Timeout (in milliseconds) to use when polling the consumer.
spring.kafka.listener.poll-timeout=

#When ackMode is COUNT or COUNT_TIME, the number of records between offset commits.
spring.kafka.listener.ack-count=

#When ackMode is TIME or COUNT_TIME, the time (in milliseconds) between offset commits.
spring.kafka.listener.ack-time=
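
To illustrate the MANUAL ack modes described above, here is a sketch of a listener that commits offsets explicitly. It assumes spring.kafka.consumer.enable-auto-commit=false and spring.kafka.listener.ack-mode=MANUAL; the class and topic name are hypothetical.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

// Hypothetical listener; requires enable-auto-commit=false and
// ack-mode=MANUAL (or MANUAL_IMMEDIATE) so that Acknowledgment is injected.
@Component
public class AckingConsumer {

    @KafkaListener(topics = "demo-topic") // hypothetical topic name
    public void listen(String message, Acknowledgment ack) {
        System.out.println("Processing: " + message);
        // With MANUAL the commit is queued and performed after the batch;
        // with MANUAL_IMMEDIATE it is performed right away.
        ack.acknowledge();
    }
}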

Important configuration (defaults omitted)

# Cluster address
spring.kafka.bootstrap-servers=172.17.35.141:9092,172.17.41.159:9092,172.17.38.154:9092,172.17.40.60:9092
# Consumer configuration
spring.kafka.consumer.topic=test_topic
spring.kafka.consumer.group-id=streamProcesser
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
# Producer configuration
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
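
Note that spring.kafka.consumer.topic is not a standard Spring Boot property; it is a custom key that application code must resolve itself, for example through a property placeholder, as in this hypothetical sketch:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Hypothetical listener that resolves the custom spring.kafka.consumer.topic
// key at startup via a property placeholder.
@Component
public class TopicBoundConsumer {

    @KafkaListener(topics = "${spring.kafka.consumer.topic}")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}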

Other parameters

  • session.timeout.ms session timeout

The timeout used to detect consumer failure: the time the group coordinator needs to detect a consumer crash. If a consumer in the group dies, detecting it takes at most session.timeout.ms milliseconds.

Mechanism: the consumer periodically sends heartbeats to prove it is alive. If the broker receives no heartbeat within this interval, it removes the consumer from the group and triggers a rebalance.

Note: this value must lie within the range configured on the broker by group.min.session.timeout.ms and group.max.session.timeout.ms.

  • heartbeat.interval.ms heartbeat interval

When consumers use a group, this setting keeps the group coordinator aware of consumer liveness and drives rebalancing when consumers join or leave the group. Each consumer periodically sends a heartbeat to the group coordinator at the interval specified by heartbeat.interval.ms, and the coordinator responds to each one. If a rebalance occurs, the responses contain the REBALANCE_IN_PROGRESS flag, so every consumer learns that a rebalance has started, while the coordinator learns that each consumer is still alive.

Note: this must be less than session.timeout.ms, and usually no more than one third of it; the smaller the value, the faster a rebalance is detected.

  • max.poll.interval.ms maximum poll interval

Detects whether the consumer has stopped polling. If the interval between two calls to poll() exceeds this value, the consumer is considered unable to keep up: its commit is marked as failed, it is removed from the group, a rebalance is triggered, and its partitions are reassigned to other consumers.

Note: the higher the value, the longer a rebalance can take.

Together, the purpose of these three parameters is to ensure that only healthy consumers remain in the group:

1. Judging by heartbeat: the consumer sends a heartbeat to the broker every heartbeat.interval.ms. The broker tracks how long it has been since the consumer's last heartbeat; if that exceeds session.timeout.ms, the consumer is considered unavailable and is removed.

2. Judging by the poll() interval: if the broker finds that the consumer has not called poll() within max.poll.interval.ms, the consumer is removed.

Some readers may ask: if the consumer is still alive according to its heartbeats, but has exceeded the poll interval, will it be removed?

Yes. Since version 0.10.1, session.timeout.ms and max.poll.interval.ms have been decoupled: heartbeats are sent from a background thread while messages are being processed, so a consumer is not removed merely because processing is slow. However, when poll() is eventually called again and the time since the previous poll() has exceeded max.poll.interval.ms, the commit fails, the failure path is retried, the old thread is destroyed, and a new thread is taken from the pool. So the answer is: it is removed.

1. session.timeout.ms must be greater than heartbeat.interval.ms; otherwise the consumer group will be stuck rebalancing forever.

2. session.timeout.ms is best set to several times heartbeat.interval.ms, because if the coordinator misses a heartbeat request due to a transient network delay and the two values are close, the consumer group will rebalance too frequently, hurting consumption performance.
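
To make the recommended relationship between these three parameters concrete, here is a sketch using the raw Kafka client, with session.timeout.ms set to three times heartbeat.interval.ms. All addresses and timeout values are illustrative, not prescriptive.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TimeoutConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "allenGroup");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // session.timeout.ms is three times heartbeat.interval.ms, per the note above
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "9000");
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
        // max.poll.interval.ms bounds the time allowed between two poll() calls
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // consumer.subscribe(...) and the poll loop would follow here
        }
    }
}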

https://blog.csdn.net/fenglibing/article/details/82117166

https://kafka.apache.org/documentation/#introduction

https://docs.spring.io/spring-boot/docs/current/reference/html/application-properties.html#application-properties.integration.spring.kafka.admin.client-id

Topics: Java Big Data Kafka Middleware