1. Kafka Terminology Explained
1. topic
A topic is the equivalent of a queue in a traditional message queue (MQ) system. Every message sent by a producer must specify which topic it is sent to. In a large application system, topics can be separated by function (an order topic, a login topic, a payment-amount topic, and so on).
2. partition
A topic can have more than one partition. After receiving a message, Kafka load-balances it, spreading messages across the partitions roughly by hash(key) % [number of partitions]. The number of partitions is usually configured to match the size of the Kafka cluster (that is, the number of brokers).
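As a sketch of that routing rule, a custom partitioner along the following lines reproduces the hash-of-key modulo partition-count behavior. The class name HashKeyPartitioner and the use of murmur2 are assumptions made here for illustration; Kafka's built-in partitioner does something similar for keyed messages.

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Illustrative partitioner: routes keyed messages by hash(key) % partitionCount.
public class HashKeyPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            // No key: fall back to partition 0 in this sketch; Kafka's default partitioner spreads such records instead.
            return 0;
        }
        // Same key -> same hash -> same partition, as long as the partition count does not change.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

A producer would opt into a custom partitioner like this via the partitioner.class property, e.g. prop.put("partitioner.class", "com.zpb.kafka.HashKeyPartitioner"); without it, Kafka already applies this kind of key hashing by default.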
3. partition replica
A partition replica is a copy of a partition's data, kept as a safeguard against data loss. A partition and its replica are not placed on the same broker. High availability is achieved when the number of replicas matches the number of partitions.
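To tie the first three terms together, here is a minimal sketch that creates a topic with Kafka's AdminClient. The topic name order, the partition count of 3, and the replication factor of 2 are illustrative choices; the broker addresses are the ones used in the tests later in this article.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

// Sketch: create an "order" topic with 3 partitions, each replicated on 2 brokers.
public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        AdminClient admin = AdminClient.create(props);
        // 3 partitions spread the load; replication factor 2 keeps a copy on another broker.
        NewTopic order = new NewTopic("order", 3, (short) 2);
        admin.createTopics(Collections.singleton(order)).all().get();
        admin.close();
    }
}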
4. broker
A Kafka node is a broker; multiple brokers can form a Kafka cluster. The broker.id is usually taken from the last octet of the broker's IP address.
5. segment
Physically, a partition is divided into multiple segments, each of which holds a portion of the partition's messages.
6. producer
Produces messages and sends them to a topic.
7. consumer
Subscribes to a specified topic and consumes the messages on that topic.
8. consumer group
Multiple consumers can form a consumer group
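Since this article only shows producer code, a minimal consumer sketch may help. The group id demo-group is an illustrative assumption; the topic name and broker addresses match the producer tests below. All consumers started with the same group.id form one consumer group and divide the topic's partitions among themselves.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch: a consumer in group "demo-group" subscribed to "test_partition".
public class ConsumerExample {
    public static void main(String[] args) {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        prop.put("group.id", "demo-group"); // consumers sharing this id form one consumer group
        prop.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop)) {
            consumer.subscribe(Collections.singleton("test_partition"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            r.partition(), r.offset(), r.key(), r.value());
                }
            }
        }
    }
}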
2. Terms Explained in Detail, with Principles
1. partition
A Kafka message is a key-value pair; it can also carry only a topic and a value, in which case the key defaults to null. In most cases a key is assigned, and it plays two roles: 1. it carries metadata about the message; 2. it helps the partitioner route the message, so that a batch of records with the same key is written to the same partition. A message is a ProducerRecord object; the topic and value are required, while the key is optional. All messages with the same key are assigned to the same partition. When the key is null, the default partitioner is used, which places the record on one of the partitions more or less at random, trying to spread data evenly across the topic and avoid skew. If a key is specified, the partitioner decides which of the topic's partitions the message is stored in based on the hash of the key and the number of partitions. Let's test this: which partition does the data go to when the stored message has a key, and when it has no key?
When the message has a key:
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @des Tests which kafka partition messages are sent to
 * @author zhao
 * @date June 27, 2019, 12:17:55 a.m.
 */
public class PartitionExample {

    private final static Logger LOG = LoggerFactory.getLogger(PartitionExample.class);

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        Properties properties = initProp();
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        // The key "appointKey" is specified explicitly
        ProducerRecord<String, String> record = new ProducerRecord<String, String>("test_partition", "appointKey", "hello");
        Future<RecordMetadata> future = producer.send(record);
        RecordMetadata recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        record = new ProducerRecord<String, String>("test_partition", "appointKey", "world");
        future = producer.send(record);
        recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        producer.flush();
        producer.close();
        System.out.println("====================================");
    }

    private static Properties initProp() {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return prop;
    }
}
// As you can see from the log, the messages were sent to random partitions
22:21:06.231 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 1
22:21:06.258 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 0
When the message has no key:
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @des Tests which kafka partition messages are sent to
 * @author zhao
 * @date June 27, 2019, 12:17:55 a.m.
 */
public class PartitionExample {

    private final static Logger LOG = LoggerFactory.getLogger(PartitionExample.class);

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        Properties properties = initProp();
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        // No key is specified, only topic and value
        ProducerRecord<String, String> record = new ProducerRecord<String, String>("test_partition", "hello");
        Future<RecordMetadata> future = producer.send(record);
        RecordMetadata recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        record = new ProducerRecord<String, String>("test_partition", "world");
        future = producer.send(record);
        recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        producer.flush();
        producer.close();
        System.out.println("====================================");
    }

    private static Properties initProp() {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return prop;
    }
}
// As you can see from the log, the messages were sent to the same partition
22:29:29.963 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 2
22:29:29.969 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 2
From the above tests:
When a key (or a batch of keys) is mapped to a partition, the mapping is computed over all of the topic's partitions, not only the available ones. A partition that is down still takes part in the calculation, so if the data you write maps to that unavailable partition, the send fails. Within a consumer group, each partition is read by only one consumer client; it is not possible for multiple consumers in the same group to read the same partition.
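To make the "not necessarily available" point concrete, a producer can inspect a topic's partitions before writing. The sketch below is illustrative (the class name PartitionAvailabilityExample is made up here; partitionsFor is a standard producer method): it prints each partition of test_partition and whether it currently has a leader. A partition without a leader still counts in the hash(key) % partitionCount calculation, which is why a send routed to it can fail.

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.PartitionInfo;

// Sketch: list the partitions of a topic and flag any that currently have no leader.
public class PartitionAvailabilityExample {
    public static void main(String[] args) {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prop)) {
            List<PartitionInfo> partitions = producer.partitionsFor("test_partition");
            for (PartitionInfo p : partitions) {
                // A null leader means the partition is currently unavailable,
                // yet it still participates in the key-to-partition mapping.
                System.out.printf("partition=%d leaderAvailable=%b%n",
                        p.partition(), p.leader() != null);
            }
        }
    }
}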