Kafka terminology and an analysis of its principles

Posted by dudeddy on Wed, 26 Jun 2019 18:45:47 +0200

1. Kafka Terminology

1.topic

A topic is comparable to a queue in a traditional message queue (MQ) system. When a producer sends a message, it must specify which topic the message goes to. In a large application, topics can be separated by function (an order topic, a login topic, an amount topic, and so on).

2. partition

A topic can have more than one partition. Messages sent to a topic are load-balanced across its partitions; for a message with a key, the partition is chosen as hash(key) % [partition_num], which spreads the messages evenly over the partitions.

The number of partitions is generally configured to match the number of brokers in the Kafka cluster.

3.partition replica (partition copy)

 

A partition replica is a copy of a partition's data, kept to prevent data loss. A partition and its replica are placed on different brokers. High availability is achieved when the number of replicas is consistent with the number of partitions.
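To make the "different brokers" rule concrete, here is a minimal sketch of one way replicas could be spread over brokers. The class name ReplicaPlacementSketch and the simple (partition + i) % brokerCount rule are assumptions made for illustration; Kafka's own assignment logic is more elaborate. The sketch only shows that each replica of a partition lands on a distinct broker, which is also why the replication factor cannot exceed the number of brokers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Kafka API.
class ReplicaPlacementSketch {

    // Returns the broker indexes that hold the replicas of one partition.
    // Simplified rule (an assumption): replica i goes to broker (partition + i) % brokerCount.
    static List<Integer> brokersFor(int partition, int replicationFactor, int brokerCount) {
        if (replicationFactor > brokerCount) {
            // Two replicas of one partition would otherwise share a broker
            throw new IllegalArgumentException("replication factor exceeds broker count");
        }
        List<Integer> brokers = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            brokers.add((partition + i) % brokerCount);
        }
        return brokers;
    }

    public static void main(String[] args) {
        // Three partitions, two replicas each, three brokers
        for (int p = 0; p < 3; p++) {
            System.out.println("partition " + p + " -> brokers " + brokersFor(p, 2, 3));
        }
    }
}
```

If the broker holding one replica fails, the other replica of that partition is still available on a different broker.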

4.broker

A broker is a single Kafka node, and multiple brokers form a Kafka cluster. The broker id is usually taken from the last three digits of the node's IP address.
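As a small illustration of that numbering convention, the sketch below derives an id from the last group of an IPv4 address. The class and method names are hypothetical, and the addresses are the ones used in the producer configuration later in this post; in a real cluster the value would simply be written into each broker's broker.id setting.

```java
// Hypothetical helper, not part of the Kafka API.
class BrokerIdSketch {

    // Derives a numeric id from the last dot-separated group of an IPv4 address.
    static int brokerIdFromIp(String ip) {
        String lastGroup = ip.substring(ip.lastIndexOf('.') + 1);
        return Integer.parseInt(lastGroup);
    }

    public static void main(String[] args) {
        String[] ips = {"192.168.199.11", "192.168.199.12", "192.168.199.13"};
        for (String ip : ips) {
            System.out.println(ip + " -> broker.id=" + brokerIdFromIp(ip));
        }
    }
}
```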

5. Segment

A partition is physically divided into multiple segments, each of which holds a range of messages.
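Kafka names each segment file after the offset of the first message it contains, so finding the segment for a given offset is a floor lookup. The sketch below models that idea with a TreeMap; the class SegmentLookupSketch is hypothetical and ignores index files and everything else a real log does.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a partition's segment list, not Kafka's implementation.
class SegmentLookupSketch {

    // Base offset of each segment -> segment file name
    private final TreeMap<Long, String> segments = new TreeMap<>();

    // Registers a segment whose first message has the given offset.
    void addSegment(long baseOffset) {
        segments.put(baseOffset, String.format("%020d.log", baseOffset));
    }

    // Returns the file of the segment with the largest base offset <= offset,
    // or null when the offset is below every known segment.
    String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        SegmentLookupSketch log = new SegmentLookupSketch();
        log.addSegment(0L);
        log.addSegment(1000L);
        log.addSegment(2000L);
        System.out.println("offset 1500 is in " + log.segmentFor(1500L));
    }
}
```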

6.producer

Produces messages and sends them to a topic

7.consumer

Subscribes to a specified topic and consumes the messages on that topic

8.Consumer group

Multiple consumers can form a consumer group
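Within a group, each partition of a subscribed topic is handed to exactly one member. The sketch below shows one simple way to divide partitions among the consumers of a group; the class GroupAssignmentSketch and its round-robin rule are assumptions for illustration and do not reproduce any of Kafka's built-in assignors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Kafka API.
class GroupAssignmentSketch {

    // Gives every partition to exactly one consumer, rotating through the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(Arrays.asList("consumer-a", "consumer-b"), 3));
    }
}
```

When a group has more consumers than the topic has partitions, the extra consumers in this sketch end up with an empty list and stay idle.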

2. Explanation and Principles of the Terms

1.partition

A Kafka message is either a key-value pair or a value alone; the topic is always required. When no key is given, the key defaults to null. In most cases a key is assigned, and it serves two purposes:

1. It carries metadata about the message

2. It helps the partitioner route the record, so that data with the same key is written to the same partition

A message is a ProducerRecord object. It must contain a topic and a value; the key is optional.

All messages with the same key are assigned to the same partition.

When the key is null, the default partitioner chooses the partition itself (the exact strategy depends on the client version), trying to distribute the records evenly across the topic's partitions to prevent data skew.

If a key is specified explicitly, the partitioner decides which partition of the topic stores the message based on the hash of the key and the number of partitions.
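The two cases can be sketched without a running cluster. The class PartitionerSketch below is a hypothetical stand-in, not Kafka's partitioner: it uses Arrays.hashCode as a placeholder where the Kafka client applies a murmur2 hash to the key bytes, and it assumes a plain rotation for records without a key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a producer-side partitioner, not Kafka's class.
class PartitionerSketch {

    // Rotation counter used only for records without a key
    private final AtomicInteger counter = new AtomicInteger();

    // Returns the partition index for a record, given the topic's partition count.
    int partitionFor(String key, int numPartitions) {
        if (key == null) {
            // No key: rotate through the partitions to spread records evenly
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        }
        // Key present: hash the key bytes and take the remainder.
        // Arrays.hashCode is a placeholder; the Kafka client uses murmur2 here.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionerSketch partitioner = new PartitionerSketch();
        // Same key twice: both calls return the same partition
        System.out.println(partitioner.partitionFor("appointKey", 3));
        System.out.println(partitioner.partitionFor("appointKey", 3));
        // No key: consecutive calls move on to the next partition
        System.out.println(partitioner.partitionFor(null, 3));
        System.out.println(partitioner.partitionFor(null, 3));
    }
}
```

Masking with 0x7fffffff keeps the hash non-negative before the remainder is taken, so the result is always a valid partition index.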

Let's test this: which partition does a message go to when it has a key, and when it has no key?

When the message has a key

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @des     Tests which partition Kafka assigns to a record
 * @author  zhao
 * @date    June 27, 2019, 12:17:55 a.m.
 */
public class PartitionExample {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionExample.class);

    public static void main(String[] args) throws InterruptedException, ExecutionException {

        Properties properties = initProp();
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // Both records are sent with the same key, "appointKey"
        ProducerRecord<String, String> record = new ProducerRecord<>("test_partition", "appointKey", "hello");
        Future<RecordMetadata> future = producer.send(record);
        // Wait for the send to complete, then log the partition the record landed on
        RecordMetadata recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        record = new ProducerRecord<>("test_partition", "appointKey", "world");
        future = producer.send(record);
        recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        producer.flush();
        producer.close();
        System.out.println("====================================");
    }

    // Connection and serializer settings for the producer
    private static Properties initProp() {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        return prop;
    }
}

// As the log shows, both records were sent to the same partition

22:29:29.963 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 2

22:29:29.969 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 2

When the message has no key

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @des     Tests which partition Kafka assigns to a record
 * @author  zhao
 * @date    June 27, 2019, 12:17:55 a.m.
 */
public class PartitionExample {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionExample.class);

    public static void main(String[] args) throws InterruptedException, ExecutionException {

        Properties properties = initProp();
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // Only topic and value are given, so the key is null
        ProducerRecord<String, String> record = new ProducerRecord<>("test_partition", "hello");
        Future<RecordMetadata> future = producer.send(record);
        // Wait for the send to complete, then log the partition the record landed on
        RecordMetadata recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        record = new ProducerRecord<>("test_partition", "world");
        future = producer.send(record);
        recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        producer.flush();
        producer.close();
        System.out.println("====================================");
    }

    // Connection and serializer settings for the producer
    private static Properties initProp() {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        return prop;
    }
}

// As the log shows, the records were spread across different partitions

22:21:06.231 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 1

22:21:06.258 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 0

From the above tests:
The mapping from a key (or a batch of keys) to a partition is computed over all of the topic's partitions, not only the available ones. When one partition goes down it still takes part in the calculation, so a record that maps to the unavailable partition fails to send. Within a consumer group, each partition is read by only one consumer client; two consumers in the same group cannot read the same partition.
