Kafka (V). Kafka & Java advanced API
1. Automatic offset control
When a consumer group subscribes to a topic for which Kafka has no committed offset (that is, Kafka has no record of this consumer), the consumer falls back to the strategy configured by auto.offset.reset:
auto.offset.reset = latest
- latest: start consuming from the latest offset (default)
- earliest: start consuming from the earliest offset available in the currently assigned partitions
- none: report an error to the consumer if no previous offset exists for the consumer's group
```java
// When the broker has no offset information for this consumer group,
// start pulling from the oldest record available in the assigned partitions
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
```
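For context, a minimal consumer sketch showing where this property fits (the broker addresses, group id and topic name are simply the ones used elsewhere in this article, not anything mandated by Kafka):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetResetDemo {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "CentOSA:9092,CentOSB:9092,CentOSC:9092");
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
        // only takes effect when the group has no committed offset for the partition
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
            consumer.subscribe(Collections.singletonList("topic01"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.key() + " -> " + record.value());
                }
            }
        }
    }
}
```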
Automatic commit
By default, the Kafka consumer periodically commits the offsets of the records it has consumed, which guarantees that every message is consumed at least once. The related parameters are:
enable.auto.commit = true // default; enables automatic commit
auto.commit.interval.ms = 5000 // default; automatic commit interval
The corresponding Java configuration:
```java
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 10000);
```
Test: if the consumer is shut down within 10000 ms of consuming a record, i.e. before the offset has been committed automatically, then after the consumer restarts it receives the same record again and re-consumes it. Only once the auto-commit interval has elapsed and the offset has been committed does the consumer stop receiving that message.
Disabling automatic commit
```java
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
```
With this configuration the consumer starts from the beginning every time it is launched: since no offsets are ever committed (auto-commit is disabled and nothing is committed manually), Kafka redelivers the records to the consumer to guarantee that they are consumed at least once.
Manual commit
```java
while (true) {
    ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofSeconds(1));
    Map<TopicPartition, OffsetAndMetadata> offsetInfo = new HashMap<>();
    if (!records.isEmpty()) {
        Iterator<ConsumerRecord<String, String>> iterator = records.iterator();
        while (iterator.hasNext()) {
            ConsumerRecord<String, String> next = iterator.next();
            // Note: commit next.offset() + 1, i.e. the offset of the next record to be consumed
            offsetInfo.put(new TopicPartition(next.topic(), next.partition()),
                    new OffsetAndMetadata(next.offset() + 1));
            kafkaConsumer.commitAsync(offsetInfo, new OffsetCommitCallback() {
                @Override
                public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                    System.out.println("offset:" + next.offset() + "|||||||||||" + "committed:" + offsets + " key:" + next.key());
                    System.out.println("exception:" + exception);
                }
            });
        }
    }
}
```
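The example above commits asynchronously after every record. If a simpler, blocking commit once per poll batch is enough, a minimal alternative sketch (reusing the same kafkaConsumer variable as above) could look like this:

```java
while (true) {
    ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofSeconds(1));
    if (!records.isEmpty()) {
        for (ConsumerRecord<String, String> record : records) {
            // process the record here
            System.out.println(record.key() + " -> " + record.value());
        }
        // blocks until the offsets returned by the last poll are committed
        kafkaConsumer.commitSync();
    }
}
```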
2. ACK & Retries
After sending a record, the producer expects the broker to reply with an ACK within a specified time. If no response arrives within that time, the Kafka producer retries sending the record up to n times. The default is acks = 1.
- acks = 1: the leader writes the record to its local log but does not wait for all followers to acknowledge it. In this case, if the leader fails immediately after acknowledging the record but before the followers have replicated it, the record is lost.
- acks = 0: the producer does not wait for any response from the server; the record is considered sent as soon as it is added to the socket buffer (i.e. handed to the local network card). You cannot assume that the server has received the data.
- acks = all: the leader waits for the full set of in-sync replicas to acknowledge the record. This guarantees that the record is not lost as long as at least one in-sync replica stays alive. It is the strongest guarantee and is equivalent to acks = -1.
If the producer does not receive the ACK from the Kafka leader within the configured time, Kafka's retry mechanism can resend the request:
request.timeout.ms = 30000 default
retries = 2147483647 default
Under these circumstances a record may be written to the partition log more than once (for example when the broker persisted the first attempt but the ACK was lost). The idempotent write described below deals with this problem.
```java
// Guarantee the ack mechanism
properties.put(ProducerConfig.ACKS_CONFIG, "all");
// Number of retries
properties.put(ProducerConfig.RETRIES_CONFIG, 5);
// Timeout (ms) for receiving the leader's ack
properties.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 10);
```
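Putting these three properties into a complete producer, a minimal sketch might look as follows (the broker addresses and topic name follow the ones used elsewhere in this article; the very short request timeout is only there to make retries easy to observe):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AckRetriesDemo {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "CentOSA:9092,CentOSB:9092,CentOSC:9092");
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.ACKS_CONFIG, "all");            // wait for all in-sync replicas
        properties.put(ProducerConfig.RETRIES_CONFIG, 5);             // resend up to 5 times
        properties.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 10); // deliberately short, to provoke retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(properties)) {
            producer.send(new ProducerRecord<>("topic01", "key1", "value1"));
            producer.flush();
        }
    }
}
```

Without idempotence, a setup like this can write the same record to the partition more than once, which is exactly the situation the next section addresses.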
3. Idempotent write
HTTP/1.1 defines idempotency as: one or more requests for a resource should have the same effect on the resource itself; that is, executing the operation multiple times affects the resource in the same way as executing it once.
Kafka supports idempotency since version 0.11.0.0. Idempotency is a property of the producer side: it guarantees that the data sent by the producer is neither lost nor duplicated. The key to idempotent writes in Kafka is identifying whether a request is a duplicate and filtering duplicate requests out.
This requires two things: a unique identifier carried in the request, so that one request can be distinguished from another; and a record of which requests have already been processed, so that a new request can be compared against it. If it matches an already-processed request it is a duplicate and is rejected.
Idempotence gives exactly-once persistence: a message is written to the Kafka topic only once. During initialization Kafka assigns the producer a PID (producer ID), and each message sent under that PID carries a monotonically increasing sequence number starting from 0. When a new message arrives, the broker compares its sequence number with that of the last persisted message: if it is exactly one larger, the message is new; otherwise the broker judges that the producer is resending a message.
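The broker-side check described above can be illustrated with a simplified sketch. This is only an illustration of the idea (an assumed, hypothetical class), not Kafka's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: tracks the last persisted sequence number per (producerId, partition)
class SequenceDeduplicator {
    private final Map<String, Long> lastSequence = new HashMap<>();

    /** Returns true if the message should be persisted, false if it looks like a resend. */
    boolean accept(long producerId, int partition, long sequence) {
        String key = producerId + "-" + partition;
        Long last = lastSequence.get(key);
        if (last == null ? sequence == 0 : sequence == last + 1) {
            lastSequence.put(key, sequence);   // new message: persist it and remember its sequence
            return true;
        }
        return false;                          // duplicate or out-of-order resend: reject
    }
}
```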
enable.idempotence = false by default. Note: to enable it, retries must be enabled (greater than 0) and acks must be set to all.
```java
// Enable idempotent (exactly-once per partition) writes
properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
```
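For completeness, a small sketch of the producer properties that go together when turning idempotence on, combining the requirements from the note above (a fragment to drop into the producer configuration shown in section 2):

```java
// idempotence requires retries > 0 and acks = all
properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
properties.put(ProducerConfig.RETRIES_CONFIG, 5);
properties.put(ProducerConfig.ACKS_CONFIG, "all");
```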
4. Transaction control
Kafka's idempotent writes guarantee atomicity for a single record sent to a single partition; for the integrity of multiple records (possibly spanning multiple partitions), Kafka's transaction support is required.
Kafka introduced idempotence in 0.11.0.0, and with it the concept of transactions. Kafka transactions fall into two categories:
- Producer-only transactions (if the producer fails while producing a batch of records, the transaction is rolled back; the already-written data is not deleted, so consumers need read_committed to skip it)
- Consumer-producer transactions (a consume-transform-produce pipeline, e.g. a microservice, where the consumer and the producer share one transaction)
By default the consumer's isolation level is read_uncommitted, which may read data from failed (aborted) transactions. Once producer transactions are enabled, the consumer's transaction isolation level must therefore be set explicitly:
isolation.level = read_uncommitted
This configuration has two possible values; the other one is read_committed. If transaction control is enabled on the producer side, the consumer must set its isolation level to read_committed.
For producer-only transactions you only need to specify the transactional.id property; once transactions are enabled, idempotent writes are enabled automatically. The transactional.id must be unique: only one producer with a given transactional.id can be active at a time, and any other producer using the same id is fenced off.
**Example**: producer transaction
```java
public static void main(String[] args) {
    // 1. Create the KafkaProducer with the usual configuration
    Properties properties = new Properties();
    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "CentOSA:9092,CentOSB:9092,CentOSC:9092");
    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    // Opening a transaction requires a transactional.id on the producer
    properties.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "tx_id" + UUID.randomUUID().toString());
    // Kafka batch size
    properties.put(ProducerConfig.BATCH_SIZE_CONFIG, 1024);
    // If BATCH_SIZE_CONFIG is not reached within this time, the batch is sent anyway
    properties.put(ProducerConfig.LINGER_MS_CONFIG, 5);
    // Idempotence, retries, acks
    properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    properties.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 10);
    properties.put(ProducerConfig.RETRIES_CONFIG, 5);
    properties.put(ProducerConfig.ACKS_CONFIG, "all");

    KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
    kafkaProducer.initTransactions();
    try {
        kafkaProducer.beginTransaction();
        for (int i = 0; i < 10; i++) {
            ProducerRecord<String, String> record = new ProducerRecord<>("topic02", "key" + i, "value" + i);
            // send
            kafkaProducer.send(record);
            // simulate an error so that the producer transaction rolls back
            if (i == 5) {
                int b = 1 / 0;
                // a read_uncommitted consumer still sees the records already sent
                // a read_committed consumer sees nothing from the aborted transaction
            }
        }
        // flush any buffered records to Kafka, then commit the transaction
        kafkaProducer.flush();
        kafkaProducer.commitTransaction();
    } catch (Exception e) {
        System.out.println("Transaction error");
        kafkaProducer.abortTransaction();
    } finally {
        kafkaProducer.close();
    }
}
```
```java
public static void main(String[] args) {
    // 1. Create the KafkaConsumer with the usual configuration
    Properties properties = new Properties();
    properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "CentOSA:9092,CentOSB:9092,CentOSC:9092");
    properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    properties.put(ConsumerConfig.GROUP_ID_CONFIG, "g2");
    // Set the consumer isolation level; this is the key point, the default is read_uncommitted
    properties.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

    KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
    kafkaConsumer.subscribe(Pattern.compile("^topic02.*"));
    while (true) {
        ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofSeconds(1));
        if (!records.isEmpty()) {
            Iterator<ConsumerRecord<String, String>> iterator = records.iterator();
            while (iterator.hasNext()) {
                ConsumerRecord<String, String> next = iterator.next();
                System.out.println(next.key());
            }
        }
    }
}
```
**Example**: consumer & producer transaction
topic01 producer: produces the data
```java
kafkaProducer.initTransactions();
try {
    kafkaProducer.beginTransaction();
    for (int i = 0; i < 10; i++) {
        ProducerRecord<String, String> record = new ProducerRecord<>("topic01", "key" + i, "value" + i);
        // send
        kafkaProducer.send(record);
        // simulate an error so that the producer transaction rolls back
        if (i == 5) {
            int b = 1 / 0;
            // a read_uncommitted consumer still sees the records already sent
            // a read_committed consumer sees nothing from the aborted transaction
        }
    }
    // flush any buffered records to Kafka, then commit the transaction
    kafkaProducer.flush();
    kafkaProducer.commitTransaction();
} catch (Exception e) {
    System.out.println("Transaction error");
    kafkaProducer.abortTransaction();
} finally {
    kafkaProducer.close();
}
```
topic01 consumer & topic02 producer ==> when the topic01 producer transaction above fails and rolls back ==> topic02 will not receive any data
```java
public static void main(String[] args) {
    KafkaConsumer<String, String> kafkaTopic01Consumer = buildConsumer("g2");
    kafkaTopic01Consumer.subscribe(Arrays.asList("topic01"));
    KafkaProducer<String, String> kafkaTopic02Producer = buildProducer();
    // 1. Initialize the transaction
    kafkaTopic02Producer.initTransactions();
    while (true) {
        ConsumerRecords<String, String> records = kafkaTopic01Consumer.poll(Duration.ofSeconds(1));
        if (!records.isEmpty()) {
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            Iterator<ConsumerRecord<String, String>> iterator = records.iterator();
            // Open a transaction
            kafkaTopic02Producer.beginTransaction();
            try {
                // Business processing
                while (iterator.hasNext()) {
                    ConsumerRecord<String, String> record = iterator.next();
                    System.out.println(record.key());
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                    // Produce the result of this business step to the next topic (topic02)
                    ProducerRecord<String, String> nextRecord =
                            new ProducerRecord<>("topic02", record.key(), record.value() + "Processing business 1");
                    kafkaTopic02Producer.send(nextRecord);
                }
                // Commit the consumed offsets and the produced records in one transaction
                kafkaTopic02Producer.sendOffsetsToTransaction(offsets, "g2");
                kafkaTopic02Producer.commitTransaction();
            } catch (ProducerFencedException e) {
                System.out.println("error");
                // Abort: the downstream topic02 / business-processing topic will not receive the data
                kafkaTopic02Producer.abortTransaction();
            }
        }
    }
}

public static KafkaProducer<String, String> buildProducer() {
    // KafkaProducer with the usual configuration
    Properties properties = new Properties();
    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "CentOSA:9092,CentOSB:9092,CentOSC:9092");
    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    // Opening a transaction requires a transactional.id on the producer
    properties.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "tx_id" + UUID.randomUUID().toString());
    // Kafka batch size
    properties.put(ProducerConfig.BATCH_SIZE_CONFIG, 1024);
    // If BATCH_SIZE_CONFIG is not reached within this time, the batch is sent anyway
    properties.put(ProducerConfig.LINGER_MS_CONFIG, 5);
    // Idempotence, retries, acks
    properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    properties.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 10);
    properties.put(ProducerConfig.RETRIES_CONFIG, 5);
    properties.put(ProducerConfig.ACKS_CONFIG, "all");
    return new KafkaProducer<>(properties);
}

public static KafkaConsumer<String, String> buildConsumer(String groupId) {
    // KafkaConsumer with the usual configuration
    Properties properties = new Properties();
    properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "CentOSA:9092,CentOSB:9092,CentOSC:9092");
    properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    properties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    // Consumer isolation level: only read committed transactional data
    properties.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
    // Auto-commit must be disabled here: offsets are committed through the producer transaction
    properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    return new KafkaConsumer<>(properties);
}
```
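A usage note: the sendOffsetsToTransaction(offsets, groupId) overload used above is deprecated in newer Kafka clients (2.5 and later) in favor of the variant that takes the consumer's group metadata; on a recent client the call would look roughly like this, assuming the same variable names as above:

```java
kafkaTopic02Producer.sendOffsetsToTransaction(offsets, kafkaTopic01Consumer.groupMetadata());
```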