Kafka sequential consumption scheme

Posted by Satria Ox41464b on Tue, 08 Mar 2022 08:40:14 +0100

Source: blog.csdn.net/qq_38245668/



This article addresses ordered consumption in Kafka when the data in different topics is related. Suppose there are two topics, an insert topic and an update topic, carrying data insertions and data updates respectively. When an insert and an update refer to the same record, the insert must be processed before the update.

1. Problem introduction

Ordered consumption has always been a hard problem in Kafka. Kafka guarantees consumption order only for messages within the same topic and partition; nothing else is guaranteed. If a topic has exactly one partition, the consumer of that topic receives its messages in order. Across different topics, however, there is never a guarantee that the order in which consumers process messages matches the order in which producers sent them.

How should we handle the case where data in different topics is related and the consumption order matters? That is the problem this article addresses.

2. Solution ideas

Take the insert topic and the update topic, and let each record be identified by a unique id. For the record with id=1, the message from the insert topic must be processed first and the message from the update topic afterwards.

The two topics are consumed by different threads. To ensure that at any moment only one piece of business logic is processing messages for a given data ID, the business code must take a lock. Locking with synchronized would also serialize unrelated operations, such as the insert for id=1 and the update for id=2, and so reduce consumption throughput even though those two could safely run concurrently. What we actually need is mutual exclusion only between the insert for id=1 and the update for id=1, so a fine-grained lock keyed by data ID is used.

Fine grained lock implementation: https://blog.csdn.net/qq_38245668/article/details/105891161

PS: in a distributed system, the fine-grained lock must be replaced with a corresponding distributed lock implementation.
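
As a rough illustration of the per-ID locking idea, the sketch below keeps one ReentrantLock per data ID in a ConcurrentHashMap. It is not the WeakRefHashLock implementation from the linked article: the class name KeyedLockSketch and the method lockFor are made up for this example, and this simplified version never evicts unused locks, which a production implementation would probably want to do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified per-ID lock registry (single JVM only).
public class KeyedLockSketch {

    // One lock per data ID; entries are never removed in this sketch.
    private final Map<String, Lock> locks = new ConcurrentHashMap<>();

    /** Returns the lock dedicated to the given ID, creating it on first use. Does not acquire it. */
    public Lock lockFor(String id) {
        return locks.computeIfAbsent(id, key -> new ReentrantLock());
    }

    public static void main(String[] args) throws InterruptedException {
        KeyedLockSketch keyedLock = new KeyedLockSketch();

        Lock lockForId1 = keyedLock.lockFor("1");
        lockForId1.lock();
        try {
            Thread other = new Thread(() -> {
                // A different ID is not blocked by the lock held for id=1.
                Lock lockForId2 = keyedLock.lockFor("2");
                boolean acquiredId2 = lockForId2.tryLock();
                System.out.println("id=2 acquired while id=1 is held: " + acquiredId2);
                if (acquiredId2) {
                    lockForId2.unlock();
                }
                // The same ID is blocked, because the main thread holds its lock.
                boolean acquiredId1 = keyedLock.lockFor("1").tryLock();
                System.out.println("id=1 acquired by second thread: " + acquiredId1);
            });
            other.start();
            other.join();
        } finally {
            lockForId1.unlock();
        }
    }
}
```

With a single synchronized block, the second thread would have had to wait in both cases; here only work on the same ID is serialized.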

Locking the insert and the update does not by itself solve the ordering problem; it only guarantees that one operation per ID is processed at a time. The out-of-order case, in which the update is consumed before the insert, still has to be handled.

Processing method: when an update message is consumed, check whether the record already exists in the database (that is, whether the insert has been executed). If it does not, store the update in a cache keyed by the data ID. When an insert message is consumed, check whether the cache holds an update for that ID. If it does, the messages for this record arrived out of order: execute the cached update once the insert has completed, then remove it from the cache.
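
The procedure can be tried without a broker. The following is a minimal sketch that assumes in-memory maps stand in for the database and the cache; the class OutOfOrderSketch and its methods onInsert, onUpdate, get and events are hypothetical names chosen for this illustration, and plain method calls take the place of Kafka listeners.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical broker-free simulation of the cache-and-replay procedure.
public class OutOfOrderSketch {

    private final Map<String, String> store = new ConcurrentHashMap<>();          // stands in for the database
    private final Map<String, String> pendingUpdates = new ConcurrentHashMap<>(); // updates that arrived before their insert
    private final Map<String, Lock> locks = new ConcurrentHashMap<>();            // one lock per data ID
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    public void onInsert(String id, String value) {
        Lock lock = locks.computeIfAbsent(id, key -> new ReentrantLock());
        lock.lock();
        try {
            store.put(id, value);
            events.add("insert " + id);
            // If an update for this ID arrived early, replay it now and drop it from the cache.
            String early = pendingUpdates.remove(id);
            if (early != null) {
                applyUpdate(id, early);
            }
        } finally {
            lock.unlock();
        }
    }

    public void onUpdate(String id, String value) {
        Lock lock = locks.computeIfAbsent(id, key -> new ReentrantLock());
        lock.lock();
        try {
            if (!store.containsKey(id)) {
                // The insert has not been processed yet: park the update.
                pendingUpdates.put(id, value);
                events.add("cache update " + id);
            } else {
                applyUpdate(id, value);
            }
        } finally {
            lock.unlock();
        }
    }

    private void applyUpdate(String id, String value) {
        store.put(id, value);
        events.add("update " + id);
    }

    public String get(String id) {
        return store.get(id);
    }

    public List<String> events() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        OutOfOrderSketch sketch = new OutOfOrderSketch();
        sketch.onUpdate("1", "v2"); // the update arrives first
        sketch.onInsert("1", "v1"); // the insert arrives late
        System.out.println(sketch.events()); // [cache update 1, insert 1, update 1]
        System.out.println(sketch.get("1")); // v2
    }
}
```

Like the scheme described above, this sketch keeps at most one pending update per ID, so a second early update for the same ID would overwrite the first.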

3. Implementation scheme

Message sending:

kafkaTemplate.send("TOPIC_INSERT", "1");
kafkaTemplate.send("TOPIC_UPDATE", "1");

Listening code example:


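import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

// WeakRefHashLock is the fine-grained lock from the article linked above; import it from its package in your project.
@Component
public class KafkaListenerDemo {

    private static final Logger log = LoggerFactory.getLogger(KafkaListenerDemo.class);

    // Cache of update messages that were consumed before their insert
    private final Map<String, String> UPDATE_DATA_MAP = new ConcurrentHashMap<>();
    // Data storage (stands in for the database)
    private final Map<String, String> DATA_MAP = new ConcurrentHashMap<>();
    private final WeakRefHashLock weakRefHashLock;

    public KafkaListenerDemo(WeakRefHashLock weakRefHashLock) {
        this.weakRefHashLock = weakRefHashLock;
    }

    @KafkaListener(topics = "TOPIC_INSERT")
    public void insert(ConsumerRecord<String, String> record, Acknowledgment acknowledgment) throws InterruptedException {
        // Simulate out-of-order consumption: sleep so that the insert is processed after the update (duration is arbitrary)
        Thread.sleep(1000);

        String id = record.value();
        log.info("Received insert : :  {}", id);
        // Assumption: lock(id) returns the per-ID Lock without acquiring it
        Lock lock = weakRefHashLock.lock(id);
        lock.lock();
        try {
            log.info("Start processing {} of insert", id);
            // Simulate the insert business processing
            Thread.sleep(1000);
            DATA_MAP.put(id, id);
            // Check the cache for an update that arrived early
            if (UPDATE_DATA_MAP.containsKey(id)) {
                // Cached update exists: execute it, then remove it from the cache
                doUpdate(id);
                UPDATE_DATA_MAP.remove(id);
            }
            log.info("handle {} of insert end", id);
        } finally {
            lock.unlock();
        }
        acknowledgment.acknowledge(); // requires the listener container to use manual ack mode
    }

    @KafkaListener(topics = "TOPIC_UPDATE")
    public void update(ConsumerRecord<String, String> record, Acknowledgment acknowledgment) throws InterruptedException {
        String id = record.value();
        log.info("Received update : :  {}", id);
        Lock lock = weakRefHashLock.lock(id);
        lock.lock();
        try {
            // For testing only: an in-memory map is checked instead of the database
            if (!DATA_MAP.containsKey(id)) {
                // Record not found, so the consumption order is abnormal: cache the update
                log.info("The consumption order is abnormal, and update data {} Add cache", id);
                UPDATE_DATA_MAP.put(id, id);
            } else {
                doUpdate(id);
            }
        } finally {
            lock.unlock();
        }
        acknowledgment.acknowledge();
    }

    void doUpdate(String id) throws InterruptedException {
        // Simulate the update business processing
        log.info("Start processing update: : {}", id);
        Thread.sleep(1000);
        log.info("handle update: : {} end", id);
    }
}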

Log output (the code simulates the out-of-order scenario):

Received update : : 1
The consumption order is abnormal, and update data 1 Add cache
Received insert : : 1
Start processing 1 of insert
Start processing update: : 1
handle update: : 1 end
handle 1 of insert end

The log shows that this scheme correctly handles out-of-order consumption of related data across different topics: the early update is cached, and it is executed as part of processing the insert.

