Kafka Quick Learning II (producer and consumer development)

Posted by Burns on Fri, 28 Jan 2022 12:58:58 +0100

Partitioning strategy for messages sent by the producer

1. Default strategy: DefaultPartitioner (see the sketch after this list)

A partition specified explicitly when sending has the highest priority: the message goes straight to that partition

If no partition is given but a key is, the partition is chosen by hashing the key and taking the result modulo the number of partitions

If neither a partition nor a key is specified, messages are spread across partitions (round-robin in older clients, sticky partitioning since client 2.4)

2. For a custom partitioning strategy, implement the partition() method of the Partitioner interface (a full CustomPartitioner appears later in this article)
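For reference, a minimal sketch of the three ProducerRecord constructors that trigger the three default behaviors just listed; the topic name, keys and values here are placeholders:

import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitionRouting {
    public static void main(String[] args) {
        // 1) Explicit partition: highest priority, the partitioner is bypassed
        ProducerRecord<String, String> toPartition =
                new ProducerRecord<>("demo-topic", 0, "key-a", "value-a");

        // 2) Key but no partition: partition = murmur2(key bytes) % numPartitions
        ProducerRecord<String, String> byKey =
                new ProducerRecord<>("demo-topic", "key-a", "value-a");

        // 3) Neither partition nor key: the producer spreads records across partitions
        ProducerRecord<String, String> spread =
                new ProducerRecord<>("demo-topic", "value-a");
    }
}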

producer parameter configuration

bootstrap.servers: broker address, ip:port

#Number of retries when a send fails
retries: 1
 #How long to wait for a response after a send (including retries) before giving up. Overlaps in purpose with retries, since both bound how long the producer keeps trying, so usually only one of the two is configured. The default is 2 minutes
delivery.timeout.ms: 30000
 #Memory available to one batch, 16KB by default; usually no need to change. The producer buffers messages first and ships them to the broker in batches once the size or time limit is reached; works together with linger.ms
batch.size: 16384
 #To reduce the number of requests, set linger.ms greater than 0: a message is held in the buffer and sent once its time there exceeds this value, even if the batch is not full. The default is 0
linger.ms: 5
 #Size of the producer's memory buffer, i.e. the total memory a KafkaProducer may use. The default is 32MB; usually no need to change
buffer.memory: 33554432
 #Serializer for keys
key.serializer: org.apache.kafka.common.serialization.StringSerializer
 #Serializer for values
value.serializer: org.apache.kafka.common.serialization.StringSerializer
 #0, 1 or all. 0: the producer returns without waiting for any acknowledgment. 1: the leader responds once it has written the data, without waiting for followers. all: the response comes only after all in-sync replicas have persisted the data; Kafka's ISR mechanism keeps the producer from waiting on a replica that falls too far behind
acks: all

Development of producer message sending code

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.junit.Test;

public class KafkaProducerTest {

    public static final String TOPIC_NAME = "default_topic";

    public static Properties getProperties(){
        Properties props = new Properties();

        //Fill in your own ip and port
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:port");
        // When the producer sends data to the leader, the acks parameter sets the level of data reliability: 0, 1 or all
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // On failure the producer retries automatically; 0 disables retries. With retries enabled, duplicate messages become possible
        props.put(ProducerConfig.RETRIES_CONFIG, 0);
        // The producer caches unsent messages per partition; the cache size is set through batch.size, 16KB by default
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);

        /**
         * The default is 0: the message is sent immediately even if the batch.size buffer is not full
         * To reduce the number of requests, set linger.ms greater than 0: a message is held in the buffer and sent once its time there exceeds this value
         * In plain terms, a message that could have gone out right away is forced to wait up to linger.ms, more messages pile up in that window, and fewer, larger batch requests are sent
         * Whichever limit is hit first, a full batch.size or an expired linger.ms, triggers the send
         */
        props.put("linger.ms", 5);

        /**
         * buffer.memory limits the total memory the KafkaProducer may use. The default is 32MB.
         * If buffer.memory is set too small, messages may be written into the buffer faster than the Sender thread can ship them to the Kafka server,
         * so the buffer fills up; once full, writes from user threads block
         * buffer.memory must be larger than batch.size, otherwise a "not enough memory requested" error is raised. Do not exceed physical memory; tune it in combination with the actual business load
         */
        props.put("buffer.memory", 33554432);

        /**
         * key Serializer that serializes the key and value objects ProducerRecord provided by the user, key Serializer must be set,
         * Even if no key is specified in the message, the serializer must be a real one
         org.apache.kafka.common.serialization.Serializer Interface class,
         * Sequence the key into a byte array.
         */
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer","org.apache.kafka.common.serialization.StringSerializer");

        return props;
    }


    /**
     * send() is asynchronous: it adds the message to a buffer, where it waits to be sent, and returns immediately
     * The producer groups individual messages into batches for efficiency, governed by batch.size and linger.ms
     *
     * Synchronous sending: the current thread blocks after a message is sent until the ack comes back
     * send() returns a Future; just call get() on it
     *
     * Two threads take part in sending: the main user thread and the Sender thread
     *  1) the main thread writes the message into the RecordAccumulator and returns
     *  2) the Sender thread pulls batches from the RecordAccumulator and sends them to the broker
     *  3) batch.size and linger.ms both influence how often the Sender thread sends
     */
    @Test
    public void testSend(){
        Properties properties = getProperties();
        Producer<String,String> producer = new KafkaProducer<>(properties);

        for (int i = 0; i < 23; i++) {
            Future<RecordMetadata> future = producer.send(new ProducerRecord<>(TOPIC_NAME, "test-value"));
            try {
                //If you do not care about the result, the get() call below can be omitted
                RecordMetadata recordMetadata = future.get();
                // Printed as topic-partitionNumber@offset
                System.out.println("Send result: " + recordMetadata);
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
        producer.close();
    }


    /**
     * Send messages with a callback
     */
    @Test
    public void testSendWithCallback(){
        Properties properties = getProperties();
        Producer<String,String> producer = new KafkaProducer<>(properties);

        for (int i = 0; i < 3; i++) {
            producer.send(new ProducerRecord<>(TOPIC_NAME, "test-key" + i, "test-value" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if(exception == null){
                        System.err.println("Send results:"+metadata.toString());
                    } else {
                        //fail in send
                        exception.printStackTrace();
                    }
                }
            });
        }
        producer.close();
    }



    /**
     * Send messages with a callback to a specified partition
     * Sending everything to one partition preserves message ordering
     */
    @Test
    public void testSendWithCallbackAndPartition(){
        Properties properties = getProperties();
        Producer<String,String> producer = new KafkaProducer<>(properties);

        for (int i = 0; i < 10; i++) {
            //Send every message to the partition with index 4 (the topic must have at least 5 partitions)
            producer.send(new ProducerRecord<>(TOPIC_NAME, 4, "test-key" + i, "test-value" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if(exception == null){
                        //Execute business logic
                        System.err.println("Send status:"+metadata.toString());
                    } else {
                        exception.printStackTrace();
                    }
                }
            });
        }
        producer.close();
    }

    /**
     * Custom partitioning strategy
     */
    @Test
    public void testSendWithPartitionStrategy(){
        Properties properties = getProperties();
        properties.put("partitioner.class", "net.xdclass.xdclasskafka.config.CustomPartitioner");
        Producer<String,String> producer = new KafkaProducer<>(properties);

        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<>(TOPIC_NAME, "custom-test-key", "test-value" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if(exception == null){
                        System.err.println("Send status:"+metadata.toString());
                    } else {
                        exception.printStackTrace();
                    }
                }
            });
        }
        producer.close();
    }
}

Custom partitioning strategy: if the key is null an error is thrown; if the key equals a designated value the message is sent to a designated partition; otherwise the partition is chosen by hashing the key modulo the number of partitions

import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

public class CustomPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        if (keyBytes == null) {
            throw new IllegalArgumentException("key parameter cannot be null");
        }
        if ("custom-test-key".equals(key)) {
            //Route this key to the partition with index 0
            return 0;
        }
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        // hash the keyBytes to choose a partition
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}
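The partitioner is activated through the partitioner.class producer property, as testSendWithPartitionStrategy does above.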

consumer parameter configuration

bootstrap.servers: broker address, ip:port
  #Consumer group id
  group.id: test-1
  #Where to start when there is no committed offset: earliest consumes from the beginning of the partition, latest from the most recently saved position
  auto.offset.reset: latest
  #Session timeout: if no heartbeat arrives within this window, the consumer is kicked out of the group. The default is 10 seconds
  session.timeout.ms: 10000
  #Whether to periodically auto-commit the offsets of fetched messages. The default is true; set it to false to commit manually
  enable.auto.commit: false
  #Spring Kafka ack mode: with MANUAL_IMMEDIATE the offset is committed as soon as the listener calls acknowledge(); use this value when committing manually
  ack.mode: MANUAL_IMMEDIATE
  #Number of consuming threads (Spring Kafka). With 1, this instance consumes all partitions of the topic; with more, the partitions are divided evenly among the threads; when deployed distributed, total consumers = number of machines * concurrency
  concurrency: 1
  #Batch consumption (Spring Kafka listener)
  batch.listener: true
  #Maximum number of records returned by a single poll
  max.poll.records: 10
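The last four settings above (ack.mode, concurrency, batch.listener, max.poll.records) are Spring for Apache Kafka listener-container options rather than raw client properties. The sketch below shows one way they might be wired up with a recent spring-kafka version; the class name BatchConsumerConfig and bean name batchFactory are assumptions, while the topic, group id and limits are carried over from this article:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.support.Acknowledgment;

@EnableKafka
@Configuration
public class BatchConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> batchFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:port"); //fill in your own ip and port
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-1");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);    //manual ack requires auto commit off
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 10);         //at most 10 records per poll
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(props));
        factory.setConcurrency(1);       //one consuming thread: this instance takes all partitions
        factory.setBatchListener(true);  //deliver records to the listener as a List
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        return factory;
    }

    @KafkaListener(topics = "default_topic", containerFactory = "batchFactory")
    public void onBatch(List<ConsumerRecord<String, String>> records, Acknowledgment ack) {
        records.forEach(r -> System.out.printf("offset=%d, value=%s%n", r.offset(), r.value()));
        ack.acknowledge(); //MANUAL_IMMEDIATE: the offset is committed right away
    }
}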

Development of consumer message consumption code

import java.time.Duration;
import java.util.Arrays;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;
import org.junit.Test;

public class KafkaConsumerTest {

    public static Properties getProperties() {
        Properties props = new Properties();

        //Broker address: fill in your own ip and port
        props.put("bootstrap.servers", "ip:port");
        //Consumer group id: within one group each message is consumed only once; consumers in different groups each consume the message independently
        props.put("group.id", "test-g1");
        //The default is latest. To consume a partition from the beginning, change this to earliest AND switch to a new consumer group name, otherwise it does not take effect
        props.put("auto.offset.reset", "latest");
        //Disable auto commit: offsets are committed manually in the test below
        props.put("enable.auto.commit", "false");

        //Key and value deserializers
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }


    @Test
    public void simpleConsumerTest(){
        Properties properties = getProperties();
        KafkaConsumer<String,String> kafkaConsumer = new KafkaConsumer<>(properties);
        //Subscribe to topics
        kafkaConsumer.subscribe(Arrays.asList(KafkaProducerTest.TOPIC_NAME));

        while (true) {
            //Poll with a blocking timeout of 100 ms
            ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.err.printf("topic=%s, offset=%d, key=%s, value=%s%n", record.topic(), record.offset(), record.key(), record.value());
            }
            //Synchronous (blocking) commit, as an alternative to the async commit below
            //kafkaConsumer.commitSync();

            if(!records.isEmpty()){
                //Asynchronous commit offset
                kafkaConsumer.commitAsync(new OffsetCommitCallback() {
                    @Override
                    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                        if(exception == null){
                            System.err.println("Manual submission offset success:"+offsets.toString());
                        }else {
                            System.err.println("Manual submission offset fail:"+offsets.toString());
                        }
                    }
                });
            }
        }
    }

}
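One caveat with the loop above: if the process stops while the last commitAsync is still in flight, those offsets may be lost and the messages re-read on restart. A common pattern, sketched here under the assumption that shutdown() is called from another thread, combines commitAsync in the loop with a final blocking commitSync before closing:

import java.time.Duration;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class GracefulConsumer {

    private final KafkaConsumer<String, String> consumer =
            new KafkaConsumer<>(KafkaConsumerTest.getProperties());
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public void run() {
        try {
            consumer.subscribe(Arrays.asList(KafkaProducerTest.TOPIC_NAME));
            while (!closed.get()) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d, value=%s%n", record.offset(), record.value());
                }
                consumer.commitAsync(); //non-blocking; an occasional lost commit only causes re-reads
            }
        } catch (WakeupException e) {
            if (!closed.get()) throw e; //ignore the exception only if we are shutting down
        } finally {
            try {
                consumer.commitSync();  //blocking: make sure the final offsets are committed
            } finally {
                consumer.close();
            }
        }
    }

    //Called from another thread to stop the loop
    public void shutdown() {
        closed.set(true);
        consumer.wakeup(); //interrupts a blocked poll() with a WakeupException
    }
}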

Topics: Java kafka Distribution