Kafka (component setup and Spring Boot integration in practice)

Posted by Mercenary on Wed, 19 Jan 2022 00:45:41 +0100


1. Application scenario

1.1 kafka scenario

Kafka was originally developed by LinkedIn in Scala, using ZooKeeper for coordination, and has since been donated to the Apache Foundation. Today Kafka is positioned as a distributed stream processing platform. It is widely used for its high throughput, persistence, horizontal scalability, and support for stream processing.
Apache Kafka can handle massive data volumes and is widely used in both offline and real-time message processing systems.
(1) Log collection: collect logs from various services and expose them through Kafka as a unified interface to consumers such as Hadoop, HBase, Solr, etc.;
(2) Message system: decouple producers and consumers, buffer messages, etc.;
(3) User activity tracking: Kafka is often used to record user activity on web or mobile apps, such as browsing, searching, and clicking. Servers publish these events to Kafka topics; subscribers then consume them for real-time monitoring and analysis, or load them into Hadoop or a data warehouse for offline analysis and mining;
(4) Operational metrics: Kafka is also used to record operational monitoring data, collecting metrics from distributed applications and producing centralized feedback such as alarms and reports;
(5) Stream processing: feeding frameworks such as Spark Streaming and Storm;

1.2 kafka characteristics

kafka is famous for its high throughput and has the following characteristics:
(1) High throughput, low latency: kafka can process hundreds of thousands of messages per second with latency of only a few milliseconds;
(2) Scalability: a kafka cluster supports hot expansion;
(3) Persistence and reliability: messages are persisted to local disk, and replication prevents data loss;
(4) Fault tolerance: nodes in the cluster are allowed to fail (with a replication factor of n, n-1 node failures are tolerated);
(5) High concurrency: thousands of clients can read and write simultaneously;

1.3 message comparison

  • For ordinary business decoupling and message transfer, rabbitMq is the first choice: simple enough, easy to manage, and dependable.
  • For the high-throughput, real-time scenarios above (logs, metrics collection, access records), kafka is recommended: it is distributed by design and easy to scale.
  • For heavyweight business that needs very high reliability, consider rocketMq, but it is heavy and demands deep familiarity.

1.4 use at large companies

JD built a data platform on kafka to analyze user behavior such as purchases and browsing, and successfully withstood the traffic peak of the 6.18 shopping festival.
Alibaba drew on kafka's ideas when designing its own rocketmq, referencing kafka's architecture.

2. Basic components

2.1 roles

  • broker: a node, i.e. a machine in the cluster
  • producer: the client that sends messages
  • consumer: the client that reads messages
  • zookeeper: the information center, where kafka records all kinds of cluster metadata
  • controller: one of the brokers, acting as the leader that manages the whole cluster; if it goes down, a new one is elected with the help of zk

2.2 logic components

  • Topic: a message channel; producers send to a topic and consumers subscribe to it, so you always know where to send and receive messages
  • Partition: each topic can have multiple partitions to share the transmission load; multiple parallel paths mean higher throughput
  • Replicas: each partition can have multiple replicas with identical data, equivalent to backups; more reliable with a standby
  • Leader & follower: among the replicas above, one is designated leader and the rest are followers; the leader handles all read and write requests for the partition
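
These logical components map directly onto Kafka's admin API. As a minimal sketch (the broker address is an assumption; adjust to your cluster), here is how the same kind of topic we later create on the command line (2 partitions, replication factor 1) can be created from Java:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class TopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        try (AdminClient admin = AdminClient.create(props)) {
            // topic "test" with 2 partitions, replication factor 1 (leader only, no followers)
            NewTopic topic = new NewTopic("test", 2, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}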

2.3 replica set

  • AR (assigned replicas): all replicas collectively; AR = ISR + OSR
  • ISR (in-sync replicas): replicas that are in sync and eligible for leader election. Once one lags too far behind (measured in both message count and time), it is kicked into the OSR.
  • OSR (out-of-sync replicas): replicas kicked out for lagging; they keep chasing the leader and rejoin the ISR once they catch up

2.4 message marking


Offset: the consumption position, i.e. which message has been consumed so far. Each consumer maintains its own offset


HW (high watermark): the high watermark of the partition, marking the highest position a client may consume. An HW of 8 means all 9 messages with offsets [0,8] can be consumed and are visible to the consumer, while the 4 messages [9,12] are invisible because they have not yet been committed.
LEO (log end offset): the offset at which the next message will be written in the log file; there is no message at this offset yet. Both the leader replica and the follower replicas maintain this value.

So how do the three relate?
For example, with a replication factor of 3: after a message is written to leader A, A's LEO is updated, and followers B and C continually pull from A to update their own. HW is the log position that A, B, and C have all reached, i.e. the smallest LEO among A, B, and C. Because B and C pull from A with some delay, HW is generally less than LEO, that is, LEO >= HW.

LEO>=HW>=OFFSET
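
In other words, the high watermark is simply the minimum LEO across the replica set; a one-line illustrative sketch:

// illustrative only: HW is the smallest LEO among the in-sync replicas
long highWatermark(long... replicaLeos) {
    return java.util.Arrays.stream(replicaLeos).min().orElse(0L);
}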

3. Architecture exploration

3.1 versions and downloads

http://kafka.apache.org/downloads

3.1.1 version naming

Before 1.0.0, Kafka used 4-digit version numbers, e.g. 0.8.2.1: 0.8 is the major version, 2 the minor version, and 1 the patch number.
The current naming uses 3 digits, in the format "major" + "minor" + "patch count", e.g. 2.5.0: the leading 2 is the major version, 5 the minor version, and 0 means no patches.
In the download package name, the first number is the Scala compiler version and the second is the actual Kafka version (e.g. kafka_2.12-2.5.0).

3.1.2 evolution history

Version 0.7 provided only the most basic message queue functionality.
Version 0.8 introduced the replica mechanism; from this point Kafka became a truly complete, distributed, highly reliable message queue solution.
Version 0.9 added permissions and authentication, rewrote the consumer API in Java, and introduced Kafka Connect; the old consumer API is no longer recommended.
Version 0.10 introduced Kafka Streams, officially upgrading Kafka into a distributed stream processing platform; recommended release 0.10.2.2; the new consumer API is recommended.
Version 0.11 brought producer idempotence, the transaction API, and a message format overhaul; recommended release 0.11.0.3; be cautious about the message format change.
Versions 1.0 and 2.0 mainly improved Kafka Streams; release 2.0 is recommended.

3.2 cluster construction

1) Native startup
Kafka requires zookeeper at startup. The first step is to start zk:

docker run --name zookeeper-1 -d -p 2181:2181 zookeeper:3.4.13

Native installation: download from http://kafka.apache.org/downloads, unzip, and start

bin/kafka-server-start.sh config/server.properties
#server.properties configuration notes
#Unique id of this broker; each broker in a cluster must be set to a different number
broker.id=0
#Service entry address exposed by the broker; port 9092 by default
listeners=PLAINTEXT://:9092
#Directory where the message log files are stored
log.dirs=/tmp/kafka/log
#The key setting: the Zookeeper cluster address Kafka uses. Brokers joining the same zk form the same cluster
zookeeper.connect=zookeeper:2181

2) One-click startup with docker-compose (recommended)

#docker-compose.yml
#Mind the host IP: replace 192.168.31.236 below with your own server's address
#Start with: docker-compose up -d
version: '3'
services:
    zookeeper:
        image: zookeeper:3.6.3

    kafka-1:
        container_name: kafka-1
        image: wurstmeister/kafka
        ports:
            - 10903:9092
        environment:
            KAFKA_BROKER_ID: 1 
            HOST_IP: 192.168.31.236
            KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
            #For docker deployment, the externally reachable ip and port must be set; otherwise the address registered in zk is unreachable and external connections fail
            KAFKA_ADVERTISED_HOST_NAME: 192.168.31.236
            KAFKA_ADVERTISED_PORT: 10903 
        volumes:
            - /etc/localtime:/etc/localtime
        depends_on:
            - zookeeper           
    kafka-2:
        container_name: kafka-2
        image: wurstmeister/kafka
        ports:
            - 10904:9092
        environment:
            KAFKA_BROKER_ID: 2 
            HOST_IP: 192.168.31.236
            KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
            KAFKA_ADVERTISED_HOST_NAME: 192.168.31.236
            KAFKA_ADVERTISED_PORT: 10904 
        volumes:
            - /etc/localtime:/etc/localtime
        depends_on:
            - zookeeper 


3.3 component exploration

Command line tools are the most direct way to manage a kafka cluster. They ship with the official distribution; no extra installation is needed.

3.3.1 topic creation

#Enter the container
docker exec -it kafka-1 sh
#Enter the bin directory
cd /opt/kafka/bin
#Create
kafka-topics.sh --zookeeper zookeeper:2181 --create --topic test --partitions 2 --replication-factor 1

3.3.2 viewing topics

kafka-topics.sh --zookeeper zookeeper:2181 --list

3.3.3 topic details

kafka-topics.sh --zookeeper zookeeper:2181 --describe --topic test
#Reading the output:
#Note the format: Partition (partition number, starting from 0), Leader, Replicas, Isr (broker numbers, i.e. the broker.id configured at startup)


Topic: test	PartitionCount: 2	ReplicationFactor: 1	Configs:
	Topic: test	Partition: 0	Leader: 2	Replicas: 2	Isr: 2
	Topic: test	Partition: 1	Leader: 1	Replicas: 1	Isr: 1

3.3.4 sending and receiving

#Use docker to attach to any container in the cluster
docker exec -it kafka-1 sh
#Enter kafka's bin directory inside the container
cd /opt/kafka/bin
#Listen as a consumer client
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
#Open another terminal and verify sending
./kafka-console-producer.sh --broker-list localhost:9092 --topic test

If both consumers listen (without a group), you will find every message is broadcast to both
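
The console scripts above are thin wrappers over the Java client. For comparison, a minimal plain-Java producer that does the same as kafka-console-producer (the broker address is an assumption; adjust to your advertised listener, e.g. 192.168.31.236:10903):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ConsoleLikeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // no key given, so messages are spread across partitions
            producer.send(new ProducerRecord<>("test", "hello kafka"));
        }
    }
}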

3.3.5 group consumption

#If two consumers are started without group information, messages are broadcast to both
#Specifying the same group lets multiple consumers share the consumption (see: group principle)
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --group aaa
#Result: at the sender, send messages 1-4 in a row; the two consumers in the same group consume them alternately, in parallel

Be careful!!!
This holds when the number of consumers equals the number of partitions (both 2 here). If, within one group, (number of consumers > number of partitions), some consumers will sit idle.

How to verify:
Start a few more consumers and you will find that beyond two, some never receive messages. Stop one that is consuming, and an idle one will be activated and start working

3.3.6 specifying partitions

#Specify the partition with the --partition parameter. Note! Remove the group used above
#Specifying partitions is what guarantees message ordering (see: kafka ordering principle)
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --partition 0
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --partition 1
#Result: send messages 1-4 and they appear alternately, showing that messages are distributed evenly across the partitions
#By default no key is specified when sending
#To target a partition when sending, define a key: the same key is routed to the same partition
./kafka-console-producer.sh --broker-list kafka-1:9092 --topic test --property parse.key=true
#Send with a key; note that key and value are separated by a tab
>1 1111
>1 2222
>2 3333
>2 4444

3.3.7 offset

#The offset determines where consumption starts: from the beginning or from the end
# earliest: if the partition has a committed offset, consume from it; if not, consume from the beginning
# latest: if the partition has a committed offset, consume from it; if not, consume only newly produced data
# none: if every partition has a committed offset, consume from after it; if any partition lacks one, throw an exception
#--offset [earliest|latest (default)], or --from-beginning
#Start a new terminal and specify the offset position
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --partition 0 --offset earliest
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --partition 0 --from-beginning
#Result: the messages sent earlier are consumed again from the beginning!

# Be careful!!! A committed offset takes precedence: even with earliest, messages before the commit point are not re-read
# The command below, under group begining-group, starts from the beginning only on its first run
# Even if you specify from-beginning later, it will not start at the earliest, because the first consumption left a committed offset, and that takes precedence!
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --group begining-group --from-beginning
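
The same reset semantics are available in the plain Java client via auto.offset.reset. A minimal sketch (the broker address is an assumption; the group id matches the console example above):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class EarliestConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "begining-group");
        // same semantics as the console flag: only applies when the group has no committed offset
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
            }
        }
    }
}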

3.4 exploring zk

As mentioned earlier, zk stores the Kafka cluster's metadata. This section explores what's inside.
Kafka's information is recorded in zk; enter the zk container to view the relevant nodes and data

docker exec -it kafka_zookeeper_1 sh 
>./bin/zkCli.sh 
>ls / 
#Result: the following nodes are listed
[admin, brokers, cluster, config, consumers, controller, controller_epoch, feature, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]

3.4.1 broker information

[zk: localhost:2181(CONNECTED) 2] ls /brokers
[ids, seqid, topics]
[zk: localhost:2181(CONNECTED) 3] ls /brokers/ids
[1, 2, 3]
[zk: localhost:2181(CONNECTED) 4] get /brokers/ids/1
{"features":{},"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://192.168.31.236:10903"],"jmx_port":-1,"port":10903,"host":"192.168.31.236","version":5,"timestamp":"1625978230130"}
cZxid = 0x27 
ctime = Tue Jan 05 05:40:45 GMT 2021 
mZxid = 0x27 
mtime = Tue Jan 05 05:40:45 GMT 2021 
pZxid = 0x27 
cversion = 0 
dataVersion = 1 
aclVersion = 0 
ephemeralOwner = 0x105a2db626b0000 
dataLength = 196 
numChildren = 0

3.4.2 topics and partitions

[zk: localhost:2181(CONNECTED) 8] ls /brokers/topics
[__consumer_offsets, test]
[zk: localhost:2181(CONNECTED) 9]  ls /brokers/topics/test
[partitions]
[zk: localhost:2181(CONNECTED) 10] ls /brokers/topics/test/partitions
[0, 1]
[zk: localhost:2181(CONNECTED) 11]  ls /brokers/topics/test/partitions/0
[state]
[zk: localhost:2181(CONNECTED) 12] get /brokers/topics/test/partitions/0/state
{"controller_epoch":2,"leader":1,"version":1,"leader_epoch":2,"isr":[1]}
[zk: localhost:2181(CONNECTED) 13]
cZxid = 0xb0 
ctime = Tue Jan 05 05:56:06 GMT 2021 
mZxid = 0xb0 
mtime = Tue Jan 05 05:56:06 GMT 2021 
pZxid = 0xb0 
cversion = 0 
dataVersion = 0 
aclVersion = 0 
ephemeralOwner = 0x0 
dataLength = 72 
numChildren = 0

3.4.3 consumers and offsets

[zk: localhost:2181(CONNECTED) 15] ls /consumers 
[]
#Empty??? 
#So, where are consumers and their offsets???

Kafka consumer group offsets can be recorded in two ways:
1) maintained by kafka itself (new)
2) maintained by zookeeper (old, gradually abandoned)

Why /consumers is empty:
The console consumption above used --bootstrap-server; zk is not involved, so nothing is recorded under /consumers.
Instead, consumer offsets are written to Kafka's own topic [__consumer_offsets]

#Start a consumer and specify group 
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --group aaa 
#Use the console tool to view the consumer and offset 
./kafka-consumer-groups.sh --bootstrap-server kafka-1:9092 --list 
KMOffsetCache-44acff134cad 
aaa 
#View offset details 
./kafka-consumer-groups.sh --bootstrap-server kafka-1:9092 --describe --group aaa

At this point CURRENT-OFFSET matches LEO, which means all messages have been consumed

After stopping the consumer, send a few more records with the producer, and the offset starts to lag:

Restart the consumer and it consumes the latest messages; check again and the offset has caught up
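
The same information the console tool prints can also be fetched programmatically with the admin client; a minimal sketch (broker address assumed; AdminClient.listConsumerGroupOffsets exists in recent Kafka clients):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;

public class GroupOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        try (AdminClient admin = AdminClient.create(props)) {
            // committed offsets of group "aaa", the group used in the console example
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("aaa").partitionsToOffsetAndMetadata().get();
            offsets.forEach((tp, om) ->
                    System.out.printf("%s -> committed offset %d%n", tp, om.offset()));
        }
    }
}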

3.4.4 controller

#Who is the master node in the current cluster
[zk: localhost:2181(CONNECTED) 17] get /controller 
{"version":1,"brokerid":1,"timestamp":"1609825245694"} 
cZxid = 0x2a 
ctime = Tue Jan 05 05:40:45 GMT 2021 
mZxid = 0x2a 
mtime = Tue Jan 05 05:40:45 GMT 2021 
pZxid = 0x2a 
cversion = 0 
dataVersion = 0 
aclVersion = 0 
ephemeralOwner = 0x105a2db626b0000 
dataLength = 54 
numChildren = 0 

3.5 kafka-manager

3.5.1 startup

kafka-manager is the most popular kafka cluster management tool, first open-sourced by Yahoo. It provides visual operations on a kafka cluster.
Official site: https://github.com/yahoo/kafka-manager/releases
Mind the version: the docker community images lag behind kafka, so we build the image ourselves

#Dockerfile
FROM daocloud.io/library/java:openjdk-8u40-jdk
ADD kafka-manager-2.0.0.2/ /opt/km2002/
CMD ["/opt/km2002/bin/kafka-manager","-Dconfig.file=/opt/km2002/conf/application.conf"]
#Build with kafka-manager-2.0.0.2 in the same directory
docker build -t km:2002 .
#Start: add the following under the services node in the yml above
#Reference: km.yml
#Run: docker-compose -f km.yml up -d
    km:
        image: km:2002
        ports:
            - 10906:9000
        depends_on:
            - zookeeper

3.5.2 use

Using km, you can easily view the following information:

  • cluster: create a cluster by filling in the zk address and enabling options such as jmx and consumer information
  • brokers: broker list and machine information
  • topics: topic information and the partitions within each topic; create new topics and add partitions
  • consumers: consumer information, offsets, etc.

4. Advanced application

4.1 springboot-kafka

1) Configuration file

  kafka:
    bootstrap-servers: 52.82.98.209:10903,52.82.98.209:10904
    producer: # producer
      retries: 0 # retry count
      acks: 1 # Response level: how many partition replicas send ack confirmation to the producer when the backup is completed (optional: 0, 1, all/-1)
      batch-size: 16384 # Batch size
      buffer-memory: 33554432 # Production side buffer size
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
#      value-serializer: com.itheima.demo.config.MySerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer

    consumer: # consumer
      group-id: javagroup # Default consumer group ID
      enable-auto-commit: true # Automatically submit offset
      auto-commit-interval: 100  # Submit offset delay (how long to submit offset after receiving the message)

      # earliest: if the partition has a committed offset, consume from it; otherwise consume from the beginning
      # latest: if the partition has a committed offset, consume from it; otherwise consume only newly produced data
      # none: if every partition has a committed offset, consume from after it; if any partition lacks one, throw an exception
      auto-offset-reset: latest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
#      value-deserializer: com.itheima.demo.config.MyDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer

pom.xml

    <name>kafka</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.kafka</groupId>
            <artifactId>spring-kafka</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.62</version>
        </dependency>

        <!-- swagger2 enhanced UI (the official UI is dated); access at: /doc.html -->
        <dependency>
            <groupId>com.github.xiaoymin</groupId>
            <artifactId>swagger-bootstrap-ui</artifactId>
            <version>1.9.6</version>
        </dependency>
        <dependency>
            <groupId>io.springfox</groupId>
            <artifactId>springfox-swagger2</artifactId>
            <version>2.9.2</version>
        </dependency>

    </dependencies>
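
The controllers below use a simple Message POJO. A minimal sketch; the field names are inferred from the JSON that appears in the later logs ({"message":...,"sendTime":...}), and setting sendTime at construction is an assumption:

public class Message {
    private String message; // payload text
    private Long sendTime = System.currentTimeMillis(); // send timestamp, as seen in the logs

    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }
    public Long getSendTime() { return sendTime; }
    public void setSendTime(Long sendTime) { this.sendTime = sendTime; }
}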

2) Startup information

4.2 message sending

4.2.1 sending type

KafkaTemplate.send is asynchronous by default. To obtain the send result synchronously, call get on the returned future
Detailed code reference: AsyncProducer.java

@RestController
public class AsyncProducer {
    private final static Logger logger = LoggerFactory.getLogger(AsyncProducer.class);
    @Resource
    private KafkaTemplate<String, Object> kafkaTemplate;

    //Synchronous transmission
    @GetMapping("/kafka/sync/{msg}")
    public void sync(@PathVariable("msg") String msg) throws Exception {
        Message message = new Message();
        message.setMessage(msg);
        ListenableFuture<SendResult<String, Object>> future = kafkaTemplate.send("test", JSON.toJSONString(message));
        SendResult<String, Object> result = future.get(3,TimeUnit.SECONDS);
        logger.info("send result:{}",result.getProducerRecord().value());
    }

}

Consumer: KafkaConsumer.java

//Default consumption group consumption
@Component
public class KafkaConsumer {
    private final Logger logger = LoggerFactory.getLogger(KafkaConsumer.class);

    //If no group is specified, the group configured in yml will be used by default
    @KafkaListener(topics = {"test"})
    public void onMessage1(ConsumerRecord<?, ?> consumerRecord) {
        Optional<?> optional = Optional.ofNullable(consumerRecord.value());
        if (optional.isPresent()) {
            Object msg = optional.get();
            logger.info("message:{}", msg);
        }
    }
}

1) Synchronous sending

ListenableFuture<SendResult<String, Object>> future = kafkaTemplate.send("test", JSON.toJSONString(message));
//Note that you can set a wait timeout; once exceeded, we no longer wait for the result
SendResult<String, Object> result = future.get(3, TimeUnit.SECONDS);
logger.info("send result:{}", result.getProducerRecord().value());

Send through swagger; the console prints the send result normally

2) Blocking
On the server, pause the kafka service (e.g. docker-compose pause kafka-1 kafka-2)

Send a message in swagger:
Synchronous sending: the request blocks and keeps waiting, returning an error on timeout

Asynchronous sending (the default send interface): the request returns immediately.

Then how do we confirm delivery of asynchronous messages??? Read on!
3) Registering a listener
Code reference: KafkaListener.java
You can set a ProducerListener on the KafkaTemplate to observe message sends by implementing its callbacks

//@Configuration
public class KafkaListener {
    private final static Logger logger = LoggerFactory.getLogger(KafkaListener.class);

    @Autowired
    KafkaTemplate kafkaTemplate;

    //Configure listening
    @PostConstruct
    private void listener(){
        kafkaTemplate.setProducerListener(new ProducerListener<String, Object>() {
            @Override
            public void onSuccess(ProducerRecord<String, Object> producerRecord, RecordMetadata recordMetadata) {
                logger.info("ok,message={}",producerRecord.value());
            }

            @Override
            public void onError(ProducerRecord<String, Object> producerRecord, Exception exception) {
                logger.error("error!message={}",producerRecord.value());
            }
        });
    }
}
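
As an alternative to this global ProducerListener, a per-send callback can be attached to the ListenableFuture returned by send(); a minimal sketch (assumes a kafkaTemplate and logger as in AsyncProducer above):

public void sendWithCallback(String msg) {
    kafkaTemplate.send("test", msg).addCallback(
            result -> logger.info("ok, offset={}", result.getRecordMetadata().offset()),
            ex -> logger.error("send failed", ex));
}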

Check the console: after a while, the failed asynchronous send is called back to the registered listener

com.itheima.demo.config.KafkaListener:error!message= {"message":"1","sendTime":1609920296374}

Resume kafka

docker-compose unpause kafka-1 kafka-2

When sending again, both synchronous and asynchronous messages are sent and received normally, and the listener enters the success callback:

com.itheima.demo.config.KafkaListener$1:ok,message={"message":"1","sendTime":1610089315395}
com.itheima.demo.controller.PartitionConsumer:patition=1,message:[{"message":"1","sendTime":1610089315395}]

Note the inner class KafkaListener$1 in the log: that is the anonymous listener we registered

4.2.2 serialization

Consumer: KafkaConsumer.java, the same class shown above.

1) Serialization details

  • So far we have used Kafka's own string serializer (org.apache.kafka.common.serialization.StringSerializer)
  • Others are available: ByteArray, ByteBuffer, Bytes, Double, Integer, Long, etc.
  • These serializers all implement the interface org.apache.kafka.common.serialization.Serializer
  • They basically cover most scenarios

2) Custom serialization
You can implement the interface yourself. It declares the following methods:

public interface Serializer<T> extends Closeable {
    default void configure(Map<String, ?> configs, boolean isKey) {
    }
    //The only method you must implement
    byte[] serialize(String topic, T data);
    //Delegates to the method above by default
    default byte[] serialize(String topic, Headers headers, T data) {
        return this.serialize(topic, data);
    }

    default void close() {
    }
}

Example code, reference: MySerializer.java

public class MySerializer implements Serializer {

    @Override
    public byte[] serialize(String topic, Object data) {
        // serialize the object as JSON text; use an explicit charset to match the deserializer
        String json = JSON.toJSONString(data);
        return json.getBytes(StandardCharsets.UTF_8);
    }

}

Configure your own encoder in yaml

value-serializer: com.itheima.demo.config.MySerializer

After resending, the send-side callback reports success, but the content received by the consumer is wrong!

com.itheima.demo.controller.KafkaListener$1:ok,message={"message":"1","sendTime":1609923570477}
com.itheima.demo.controller.KafkaConsumer:message:"{\"message\":\"1\",\"sendTime\":1609923570477}"

What should we do?
3) Decoding
If the sender uses a custom encoder, the receiver must be equipped with the corresponding decoding strategy
Code reference: MyDeserializer.java; the implementation mirrors the encoder!

Configure your own decoder in yaml:

value-deserializer: com.itheima.demo.config.MyDeserializer

public class MyDeserializer implements Deserializer {
    private final static Logger logger = LoggerFactory.getLogger(MyDeserializer.class);

    @Override
    public Object deserialize(String topic, byte[] bytes) {
        // decode the UTF-8 JSON text produced by MySerializer back into an object
        String json = new String(bytes, StandardCharsets.UTF_8);
        return JSON.parse(json);
    }

}

Send and receive again, and the message is normal

com.itheima.demo.controller.AsyncProducer$1:ok,message={"message":"1","sendTime":1609924855896}
com.itheima.demo.controller.KafkaConsumer:message:{"message":"1","sendTime":1609924855896}

4.2.3 partition strategy

The partition strategy decides which partition a message is delivered to based on its key, and it is the cornerstone of ordered consumption.

  • If a partition number is given, send the data directly to that partition
  • If no partition is given but a key is, hash the key to choose the partition (see the sketch after this list)
  • If neither a partition nor a key is given, choose partitions by round-robin
  • Or customize the partitioner and do whatever you want
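
For intuition, the key-based rule boils down to hashing the key bytes and taking the result modulo the partition count. A simplified sketch (Kafka's actual DefaultPartitioner hashes with murmur2; plain hashCode below is a stand-in purely for illustration):

// illustrative stand-in for the default key -> partition mapping
int choosePartition(String key, int numPartitions) {
    // mask the sign bit so the result is non-negative, then take the modulo
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
}

This is why the same key always lands in the same partition, which in turn preserves per-key ordering.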

1) Validate default partition rules

Sender code reference: PartitionProducer.java

@RestController
public class PartitionProducer {
    @Resource
    private KafkaTemplate<String, Object> kafkaTemplate;

//    Send to a specified partition
//    Whatever the key is, the message goes to that partition
    @GetMapping("/kafka/partitionSend/{key}")
    public void setPartition(@PathVariable("key") String key) {
        kafkaTemplate.send("test", 0,key,"key="+key+",msg=Specify partition 0");
    }

//    Specify a key without specifying the partition
//    The key is hashed, and the same key goes to the same partition
    @GetMapping("/kafka/keysend/{key}")
    public void setKey(@PathVariable("key") String key) {
        kafkaTemplate.send("test", key,"key="+key+",msg=No partition specified");
    }


}

Consumer code reference: PartitionConsumer.java

//@Component
public class PartitionConsumer {
    private final Logger logger = LoggerFactory.getLogger(PartitionConsumer.class);

    //Consume only from partition 0 of topic test
    @KafkaListener(topicPartitions = @TopicPartition(topic = "test", partitions = "0"))
    public void onMessage(ConsumerRecord<?, ?> consumerRecord) {
        Optional<?> optional = Optional.ofNullable(consumerRecord.value());
        if (optional.isPresent()) {
            Object msg = optional.get();
            logger.info("partition=0,message:[{}]", msg);
        }
    }
    //Consume only from partition 1 of topic test
    @KafkaListener(topicPartitions = @TopicPartition(topic = "test", partitions = "1"))
    public void onMessage1(ConsumerRecord<?, ?> consumerRecord) {
        Optional<?> optional = Optional.ofNullable(consumerRecord.value());
        if (optional.isPresent()) {
            Object msg = optional.get();
            logger.info("partition=1,message:[{}]", msg);
        }
    }
}

Access setKey through swagger:

Look at the console:

Then access setPartition, which sends to the specified partition 0

Look at the console:

2) Custom partitioner
Want to define your own rules and route messages to partitions however you like? Sure!
Reference code: MyPartitioner.java

public class MyPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
//        Define your own partition strategy:
//        if the key starts with 0, send to partition 0;
//        send everything else to partition 1
        String keyStr = key+"";
        if (keyStr.startsWith("0")){
            return 0;
        }else {
            return 1;
        }
    }

    @Override
    public void close() {

    }

    @Override
    public void configure(Map<String, ?> map) {

    }
}

MyPartitionTemplate.java:

@Configuration
public class MyPartitionTemplate {
 
    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;

    KafkaTemplate kafkaTemplate;

    @PostConstruct
    public void setKafkaTemplate() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        //Note that the partition is here!!!
        props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, MyPartitioner.class);
        this.kafkaTemplate = new KafkaTemplate<String, String>(new DefaultKafkaProducerFactory<>(props));
    }

    public KafkaTemplate getKafkaTemplate(){
        return kafkaTemplate;
    }
 

}

Sender: MyPartitionProducer.java

@RestController
public class MyPartitionProducer {

    @Autowired
    MyPartitionTemplate template;

//    Send messages using keys that start with 0 and keys that start with any other letter
//    Watch the console output: which partition does each land in?
    @GetMapping("/kafka/myPartitionSend/{key}")
    public void setPartition(@PathVariable("key") String key) {
        template.getKafkaTemplate().send("test", key,"key="+key+",msg=Custom partition policy");
    }


}

Use swagger to send both kinds of keys, beginning with 0 and not. Try it!
Remarks:
Defining the config parameters yourself is cumbersome, since it bypasses the default KafkaTemplate settings.
Alternatively, the getTemplate() in a KafkaConfiguration.java could be annotated with @Bean to override the system's default template bean.
Here we inject our own template with @Autowired to avoid confusion.

4.3 message consumption

4.3.1 message group

Sender: KafkaProducer.java

@RestController
public class KafkaProducer {
    @Resource
    private KafkaTemplate<String, Object> kafkaTemplate;

    @GetMapping("/kafka/test/{msg}")
    public void sendMessage(@PathVariable("msg") String msg) {
        Message message = new Message();
        message.setMessage(msg);
        kafkaTemplate.send("test", JSON.toJSONString(message));
    }
}

1) Code reference: GroupConsumer.java. Three listeners are split between two groups to verify group consumption:

@Component
public class GroupConsumer {
    private final Logger logger = LoggerFactory.getLogger(GroupConsumer.class);

    //Group 1, consumer 1
    @KafkaListener(topics = {"test"},groupId = "group1")
    public void onMessage1(ConsumerRecord<?, ?> consumerRecord) {
        Optional<?> optional = Optional.ofNullable(consumerRecord.value());
        if (optional.isPresent()) {
            Object msg = optional.get();
            logger.info("group:group1-1 , message:{}", msg);
        }
    }

    //Group 1, consumer 2
    @KafkaListener(topics = {"test"},groupId = "group1")
    public void onMessage2(ConsumerRecord<?, ?> consumerRecord) {
        Optional<?> optional = Optional.ofNullable(consumerRecord.value());
        if (optional.isPresent()) {
            Object msg = optional.get();
            logger.info("group:group1-2 , message:{}", msg);
        }
    }

    //Group 2, only one consumer
    @KafkaListener(topics = {"test"},groupId = "group2")
    public void onMessage3(ConsumerRecord<?, ?> consumerRecord) {
        Optional<?> optional = Optional.ofNullable(consumerRecord.value());
        if (optional.isPresent()) {
            Object msg = optional.get();
            logger.info("group:group2 , message:{}", msg);
        }
    }
}

2) Start the application

3) Send 2 messages through swagger

The two consumers in group1 share the messages between them
group2 has only one consumer, which receives all the messages

4) Idle consumers
Pay attention to the match between partition count and consumer count: if (number of consumers > number of partitions), some consumers will be idle and waste resources!

How to verify:
Stop the project, delete the test topic, and create it anew, this time with only one partition.
Resend two messages and observe.

Result:
group2 consumes messages 1 and 2.
group1 has two consumers, but only one of them (group1-1 or group1-2) is assigned the partition; the other stays idle

4.3.2 offset commits

1) Auto commit
In the previous cases, we set the following two options, so kafka commits the offset automatically according to the configured delay:

enable-auto-commit: true # Automatically commit offsets
auto-commit-interval: 100 # Commit delay (how long after receiving a message the offset is committed)

2) Manual commit
Sometimes we need to control the commit timing ourselves, for example committing only after a message has been fully processed, to prevent loss or duplication.
We define our own configuration and override the parameters above:

@Configuration
public class MyOffsetConfig {

    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;

    @Bean
    public KafkaListenerContainerFactory<?> manualKafkaListenerContainerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        configProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        configProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Pay attention here!!! Set manual submission
        configProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(configProps));

        // ack modes:
        //          AckMode takes effect when ENABLE_AUTO_COMMIT_CONFIG = false. The options are:
        //
        //          RECORD
        //          commit once for every record
        //
        //          BATCH (default)
        //          commit per poll batch; the frequency depends on how often poll is called
        //
        //          TIME
        //          commit at the configured ackTime interval (the container-side analogue of auto-commit-interval)
        //
        //          COUNT
        //          commit once ackCount records have accumulated
        //
        //          COUNT_TIME
        //          commit when whichever of the ackTime or ackCount conditions is met first
        //
        //          MANUAL
        //          the listener is responsible for ack-ing, but commits are still batched behind the scenes
        //
        //          MANUAL_IMMEDIATE
        //          the listener is responsible for ack-ing, and every ack commits immediately

        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);

        return factory;
    }


}

On the consumer side, there are several ways to commit the offset via the Consumer:

//@Component
public class MyOffsetConsumer {
    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    @KafkaListener(topics = "test",groupId = "myoffset-group-1",containerFactory = "manualKafkaListenerContainerFactory")
    public void manualCommit(@Payload String message,
                        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
                        @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
                        Consumer consumer,
                        Acknowledgment ack) {
        logger.info("Manually submit offset , partition={}, msg={}", partition, message);
        // Synchronous submission
        consumer.commitSync();
        //Asynchronous commit
        //consumer.commitAsync();

        // ACK submission is also OK. It will follow the set ack policy (refer to the ACK mode in MyOffsetConfig.java)
        // ack.acknowledge();
    }

    @KafkaListener(topics = "test",groupId = "myoffset-group-2",containerFactory = "manualKafkaListenerContainerFactory")
    public void noCommit(@Payload String message,
                        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
                        @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
                        Consumer consumer,
                        Acknowledgment ack) {
        logger.info("Forget to submit offset, partition={}, msg={}", partition, message);
        // No commit!
    }

    /**
     * In practice:
     * combine commitSync with commitAsync.
     *
     * commitSync() commits the latest offset and keeps retrying until it
     * succeeds or hits an unrecoverable error; commitAsync() does not retry.
     *
     * Usually an occasional failed async commit without a retry is not a big
     * problem: if the failure was transient, a later commit will succeed anyway.
     * But if it is the last commit before the consumer closes or a rebalance
     * happens, it must succeed, or messages will be consumed again.
     * Hence commitAsync() and commitSync() are typically used together before
     * the consumer shuts down.
     */
//   @KafkaListener(topics = "test", groupId = "myoffset-group-3",containerFactory = "manualKafkaListenerContainerFactory")
    public void manualOffset(@Payload String message,
                       @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
                       @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
                       Consumer consumer,
                       Acknowledgment ack) {
        try {
            logger.info("Synchronous asynchronous collocation , partition={}, msg={}", partition, message);
            //Asynchronous commit first
            consumer.commitAsync();
            //Keep doing something else
        } catch (Exception e) {
            System.out.println("commit failed");
        } finally {
            try {
                consumer.commitSync();
            } finally {
                consumer.close();
            }
        }

    }


    /**
     * You can even submit manually, specifying an offset anywhere
     * Not recommended for daily use!!!
     */
//    @KafkaListener(topics = "test", groupId = "myoffset-group-4",containerFactory = "manualKafkaListenerContainerFactory")
    public void offset(ConsumerRecord record, Consumer consumer) {
        logger.info("Manually specify any offset, partition={}, msg={}",record.partition(),record);
        Map<TopicPartition, OffsetAndMetadata> currentOffset = new HashMap<>();
        currentOffset.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
        consumer.commitSync(currentOffset);
    }

}

Synchronous vs asynchronous commit: see manualCommit(); the difference between the two is discussed in detail below.
Committing a specified offset: offset()
3) The repeated-consumption problem
If manual commit mode is enabled, be sure not to forget to commit the offset; otherwise messages will be consumed repeatedly!
Code reference and comparison: manualCommit() vs noCommit()

Verification process:
Delete the test topic with km and create an empty test topic so the message offsets are easy to observe. Comment out the @Component annotation on the other consumers, keeping only MyOffsetConsumer.java. Start the project, send several messages in a row from swagger's KafkaProducer, and watch the console: they are consumed, no problem:

But! Try restarting:

No matter how many times you restart, the consumer group that did not commit its offset will consume everything again!!!

Try to query the offset through the command line:

4) Lessons and summary

commitSync(), the synchronous commit, commits the latest offset and keeps retrying until it succeeds or hits an unrecoverable error; commitAsync() does not.

This creates a trap:
With asynchronous commits, not retrying an occasional failure is usually fine: if the failure was transient, a subsequent commit will succeed, and once it does the offset is in place.
But! If it is the last commit before the consumer closes, you must make sure it succeeds before stopping the process, or you will get repeated consumption!
Therefore commitAsync() and commitSync() are generally used together before the consumer closes.
Detailed code reference: MyOffsetConsumer.manualOffset()

Topics: kafka