Integrating Kafka with the Java API

Posted by chrispols on Thu, 30 Jan 2020 07:46:14 +0100

1, Development preparation

First, set up a Kafka (version 1.0.0) environment. The development language used here is Java, and the build tool is Maven.  

The Maven dependencies are as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.lxk.kafka</groupId>
  <artifactId>kafka-consumerExample</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>kafka-consumerExample</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <java.version>1.7</java.version>
    <slf4j.version>1.7.25</slf4j.version>
    <logback.version>1.2.3</logback.version>
    <kafka.version>1.0.0</kafka.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${slf4j.version}</version>
    </dependency>

    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>${logback.version}</version>
    </dependency>

    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-core</artifactId>
      <version>${logback.version}</version>
    </dependency>

    <!-- kafka -->
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.12</artifactId>
      <version>${kafka.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.zookeeper</groupId>
          <artifactId>zookeeper</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </exclusion>
      </exclusions>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>${kafka.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams</artifactId>
      <version>${kafka.version}</version>
    </dependency>
    
  </dependencies>
</project>

2, Kafka Producer

package com.lxk.kafka.producer;

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaProducerTest implements Runnable {

	private final KafkaProducer<String, String> producer;
	private final String topic;

	public KafkaProducerTest(String topicName) {
		Properties props = new Properties();
		props.put("bootstrap.servers", "192.168.18.103:9092,192.168.18.104:9092,192.168.18.105:9092");
		// acks=0: the producer does not wait for any acknowledgement from Kafka.
		// acks=1: the leader writes the message to its local log but does not wait for acknowledgement from the other replicas.
		// acks=all: the leader waits for all followers to finish synchronizing. Messages are not lost unless every machine in the Kafka cluster goes down; this is the strongest durability guarantee.
		props.put("acks", "all");
		// If it is configured to a value greater than 0, the client will resend the message if it fails to send.
		props.put("retries", 0);
		// When multiple messages need to be sent to the same partition, the producer tries to batch them into fewer network requests. This improves efficiency for both the client and the server.
		props.put("batch.size", 16384);
		props.put("key.serializer", StringSerializer.class.getName());
		props.put("value.serializer", StringSerializer.class.getName());
		this.producer = new KafkaProducer<String, String>(props);
		this.topic = topicName;
	}

	public void run() {
		int messageNo = 1;
		try {
			for (;;) {
				String messageStr = "Hello, this is message number " + messageNo;
				producer.send(new ProducerRecord<String, String>(topic, "Message", messageStr));
				// Print every 10 messages produced
				if (messageNo % 10 == 0) {
					System.out.println("Message sent: " + messageStr);
				}
				// Exit after producing 100 messages
				if (messageNo % 100 == 0) {
					System.out.println("Successfully sent " + messageNo + " messages");
					break;
				}
				messageNo++;
				// Utils.sleep(1);
			}
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			producer.close();
		}
	}

	public static void main(String args[]) {
		KafkaProducerTest test = new KafkaProducerTest("GMALL_STARTUP");
		Thread thread = new Thread(test);
		thread.start();
	}
}

output:  

14:10:43.436 [main] INFO org.apache.kafka.clients.producer.ProducerConfig - ProducerConfig values: 
    acks = all
    batch.size = 16384
    bootstrap.servers = [192.168.18.103:9092, 192.168.18.104:9092, 192.168.18.105:9092]
    buffer.memory = 33554432
    client.id = 
    compression.type = none
    connections.max.idle.ms = 540000
    enable.idempotence = false
    interceptor.classes = null
    key.serializer = class org.apache.kafka.common.serialization.StringSerializer
    linger.ms = 0
    max.block.ms = 60000
    max.in.flight.requests.per.connection = 5
    max.request.size = 1048576
    metadata.max.age.ms = 300000
    metric.reporters = []
    metrics.num.samples = 2
    metrics.recording.level = INFO
    metrics.sample.window.ms = 30000
    partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
    receive.buffer.bytes = 32768
    reconnect.backoff.max.ms = 1000
    reconnect.backoff.ms = 50
    request.timeout.ms = 30000
    retries = 0
    retry.backoff.ms = 100
    sasl.jaas.config = null
    sasl.kerberos.kinit.cmd = /usr/bin/kinit
    sasl.kerberos.min.time.before.relogin = 60000
    sasl.kerberos.service.name = null
    sasl.kerberos.ticket.renew.jitter = 0.05
    sasl.kerberos.ticket.renew.window.factor = 0.8
    sasl.mechanism = GSSAPI
    security.protocol = PLAINTEXT
    send.buffer.bytes = 131072
    ssl.cipher.suites = null
    ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
    ssl.endpoint.identification.algorithm = null
    ssl.key.password = null
    ssl.keymanager.algorithm = SunX509
    ssl.keystore.location = null
    ssl.keystore.password = null
    ssl.keystore.type = JKS
    ssl.protocol = TLS
    ssl.provider = null
    ssl.secure.random.implementation = null
    ssl.trustmanager.algorithm = PKIX
    ssl.truststore.location = null
    ssl.truststore.password = null
    ssl.truststore.type = JKS
    transaction.timeout.ms = 60000
    transactional.id = null
    value.serializer = class org.apache.kafka.common.serialization.StringSerializer

Message sent: Hello, this is message number 10
14:10:44.113 [kafka-producer-network-thread | producer-1] DEBUG org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Initiating connection to node node05:9092 (id: 2 rack: null)
Message sent: Hello, this is message number 20
Message sent: Hello, this is message number 30
Message sent: Hello, this is message number 40
Message sent: Hello, this is message number 50
Message sent: Hello, this is message number 60
Message sent: Hello, this is message number 70
Message sent: Hello, this is message number 80
Message sent: Hello, this is message number 90
Message sent: Hello, this is message number 100
Successfully sent 100 messages
14:10:44.116 [Thread-1] INFO org.apache.kafka.clients.producer.KafkaProducer - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.

Configuration description

bootstrap.servers: the Kafka broker addresses.
acks: the message acknowledgement setting. The default value is 1.  

  • acks=0: the producer does not wait for any acknowledgement from Kafka.  
  • acks=1: the leader writes the message to its local log but does not wait for acknowledgement from the other replicas.  
  • acks=all: the leader waits for all followers to finish synchronizing. Messages are not lost unless every machine in the Kafka cluster goes down; this is the strongest durability guarantee.

retries: if set to a value greater than 0, the client will resend a message whose send failed.
batch.size: when multiple messages are destined for the same partition, the producer tries to batch them into fewer network requests, which improves efficiency for both the client and the server.
key.serializer: serializer class for the message key, e.g. org.apache.kafka.common.serialization.StringSerializer.
value.serializer: serializer class for the message value, e.g. org.apache.kafka.common.serialization.StringSerializer.  
... 
There are many more configuration options; see the official documentation, they are not covered here.  
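
As a small optional sketch, the same settings can also be written with the constants defined in ProducerConfig instead of raw strings, which catches typos at compile time. The class name below is only illustrative; the broker addresses are the same example addresses used above.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConfigExample {
	public static KafkaProducer<String, String> buildProducer() {
		Properties props = new Properties();
		// Same values as above, expressed with the ProducerConfig constants
		props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
				"192.168.18.103:9092,192.168.18.104:9092,192.168.18.105:9092");
		props.put(ProducerConfig.ACKS_CONFIG, "all");
		props.put(ProducerConfig.RETRIES_CONFIG, 0);
		props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
		props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
		props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
		return new KafkaProducer<String, String>(props);
	}
}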

Once the Kafka configuration is in place, we can start producing data. Sending a record only takes one line:

 producer.send(new ProducerRecord<String, String>(topic, key, value));

topic: the name of the message queue (topic). It can be created on the Kafka server in advance; if it does not exist, Kafka creates it automatically (provided automatic topic creation is enabled on the broker).
key: the message key associated with the value, much like an entry in a Map.
value: the data to be sent, here a String.
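
Note that send() is asynchronous and returns before the broker acknowledges the record. As a hedged sketch (the helper method name is only illustrative), a Callback can be passed to send() to check the result of each record; it assumes the producer and topic fields from the class above:

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// A hypothetical helper inside KafkaProducerTest that reports the fate of each record
private void sendWithCallback(String key, String value) {
	producer.send(new ProducerRecord<String, String>(topic, key, value), new Callback() {
		public void onCompletion(RecordMetadata metadata, Exception exception) {
			if (exception != null) {
				// The record was not acknowledged by the broker
				exception.printStackTrace();
			} else {
				System.out.println("Acknowledged: partition=" + metadata.partition() + ", offset=" + metadata.offset());
			}
		}
	});
}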

With the producer written, we can start producing. The message sent here is:

 String messageStr = "Hello, this is message number " + messageNo;

The program exits after sending 100 messages.

3, Kafka Consumer

package com.lxk.kafka.consumer;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaConsumerTest implements Runnable {
	private KafkaConsumer<String, String> consumer;
	private ConsumerRecords<String, String> msgList;
	private String topic;
	private static final String GROUPID = "groupE4";

	public KafkaConsumerTest(String topicName) {
		this.topic = topicName;
		init();
	}

	public void run() {
		System.out.println("---------Start consumption---------");
		int messageNo = 1;
		List<String> list = new ArrayList<String>();
		List<Long> list2 = new ArrayList<Long>();
		try {
			for (;;) {
				msgList = consumer.poll(100);
				if (null != msgList && msgList.count() > 0) {
					for (ConsumerRecord<String, String> record : msgList) {
						if (messageNo % 10 == 0) {
							System.out.println(messageNo + "=======receive: key = " + record.key() + ", value = "
									+ record.value() + " offset===" + record.offset());
						}
						list.add(record.value());
						list2.add(record.offset());
						messageNo++;
					}
					if (list.size() == 50) {
						// Manual commit
						consumer.commitSync();
						System.out.println("Successfully committed " + list.size() + " messages, offset is now " + list2.get(49));
					} else if (list.size() > 50) {
						consumer.close();
						init();
						list.clear();
						list2.clear();
					}
				} else {
					Thread.sleep(1000);
				}
			}
		} catch (InterruptedException e) {
			e.printStackTrace();
		} finally {
			consumer.close();
		}
	}

	private void init() {
		Properties props = new Properties();
		// Kafka broker addresses
		props.put("bootstrap.servers", "node03:9092,node04:9092,node05:9092");
		// Consumer group name; a different group name can consume the same data again
		props.put("group.id", GROUPID);
		// Whether to commit offsets automatically
		props.put("enable.auto.commit", "false");
		// Session timeout
		props.put("session.timeout.ms", "30000");
		// Maximum number of records returned by a single poll
		props.put("max.poll.records", 10);
		// auto.offset.reset:
		// earliest - if a committed offset exists for the partition, resume from it; otherwise start from the beginning of the partition
		// latest   - if a committed offset exists for the partition, resume from it; otherwise consume only newly produced records
		// none     - if a committed offset exists for the partition, resume from it; otherwise throw an exception
		props.put("auto.offset.reset", "earliest");
		// Key and value deserializers
		props.put("key.deserializer", StringDeserializer.class.getName());
		props.put("value.deserializer", StringDeserializer.class.getName());
		this.consumer = new KafkaConsumer<String, String>(props);
		// Subscribe to topic list
		this.consumer.subscribe(Arrays.asList(topic));

		System.out.println("Initialization!");
	}

	public static void main(String args[]) {
		KafkaConsumerTest test1 = new KafkaConsumerTest("GMALL_STARTUP");
		Thread thread1 = new Thread(test1);
		thread1.start();
	}
}

output:  

14:13:41.119 [Thread-1] INFO org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values: 
    auto.commit.interval.ms = 5000
    auto.offset.reset = earliest
    bootstrap.servers = [node03:9092, node04:9092, node05:9092]
    check.crcs = true
    client.id = 
    connections.max.idle.ms = 540000
    enable.auto.commit = false
    exclude.internal.topics = true
    fetch.max.bytes = 52428800
    fetch.max.wait.ms = 500
    fetch.min.bytes = 1
    group.id = groupE4
    heartbeat.interval.ms = 3000
    interceptor.classes = null
    internal.leave.group.on.close = true
    isolation.level = read_uncommitted
    key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
    max.partition.fetch.bytes = 1048576
    max.poll.interval.ms = 300000
    max.poll.records = 10
    metadata.max.age.ms = 300000
    metric.reporters = []
    metrics.num.samples = 2
    metrics.recording.level = INFO
    metrics.sample.window.ms = 30000
    partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
    receive.buffer.bytes = 65536
    reconnect.backoff.max.ms = 1000
    reconnect.backoff.ms = 50
    request.timeout.ms = 305000
    retry.backoff.ms = 100
    sasl.jaas.config = null
    sasl.kerberos.kinit.cmd = /usr/bin/kinit
    sasl.kerberos.min.time.before.relogin = 60000
    sasl.kerberos.service.name = null
    sasl.kerberos.ticket.renew.jitter = 0.05
    sasl.kerberos.ticket.renew.window.factor = 0.8
    sasl.mechanism = GSSAPI
    security.protocol = PLAINTEXT
    send.buffer.bytes = 131072
    session.timeout.ms = 30000
    ssl.cipher.suites = null
    ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
    ssl.endpoint.identification.algorithm = null
    ssl.key.password = null
    ssl.keymanager.algorithm = SunX509
    ssl.keystore.location = null
    ssl.keystore.password = null
    ssl.keystore.type = JKS
    ssl.protocol = TLS
    ssl.provider = null
    ssl.secure.random.implementation = null
    ssl.trustmanager.algorithm = PKIX
    ssl.truststore.location = null
    ssl.truststore.password = null
    ssl.truststore.type = JKS
    value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer

14:13:41.123 [Thread-1] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 1.0.0
14:13:41.123 [Thread-1] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : aaa7af6d4a11b29d
14:13:41.123 [Thread-1] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-5, groupId=groupE4] Kafka consumer initialized
14:13:41.123 [Thread-1] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-5, groupId=groupE4] Subscribed to topic(s): GMALL_STARTUP
Initialization!

14:13:41.157 [Thread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-5, groupId=groupE4] Fetch READ_UNCOMMITTED at offset 800 for partition GMALL_STARTUP-2 returned fetch data (error=NONE, highWaterMark=900, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=4389)
14:13:41.158 [Thread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name GMALL_STARTUP-2.records-lag
250=======receive: key = Message, value = Hello, this is message number 10 offset===809
260=======receive: key = Message, value = Hello, this is message number 20 offset===819
270=======receive: key = Message, value = Hello, this is message number 30 offset===829
280=======receive: key = Message, value = Hello, this is message number 40 offset===839
290=======receive: key = Message, value = Hello, this is message number 50 offset===849
14:13:41.162 [Thread-1] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-5, groupId=groupE4] Committed offset 0 for partition GMALL_STARTUP-0
14:13:41.162 [Thread-1] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-5, groupId=groupE4] Committed offset 0 for partition GMALL_STARTUP-1
14:13:41.162 [Thread-1] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-5, groupId=groupE4] Committed offset 850 for partition GMALL_STARTUP-2
Successfully committed 50 messages, offset is now 849
300=======receive: key = Message, value = Hello, this is message number 60 offset===859

14:13:41.209 [Thread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-6, groupId=groupE4] Fetch READ_UNCOMMITTED at offset 850 for partition GMALL_STARTUP-2 returned fetch data (error=NONE, highWaterMark=900, lastStableOffset = -1, logStartOffset = 0, abortedTransactions = null, recordsSizeInBytes=4389)
14:13:41.209 [Thread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name GMALL_STARTUP-2.records-lag
310=======receive: key = Message, value = Hello, this is message number 60 offset===859
320=======receive: key = Message, value = Hello, this is message number 70 offset===869
330=======receive: key = Message, value = Hello, this is message number 80 offset===879
340=======receive: key = Message, value = Hello, this is message number 90 offset===889
350=======receive: key = Message, value = Hello, this is message number 100 offset===899
14:13:41.216 [Thread-1] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-6, groupId=groupE4] Committed offset 0 for partition GMALL_STARTUP-0
14:13:41.217 [Thread-1] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-6, groupId=groupE4] Committed offset 0 for partition GMALL_STARTUP-1
14:13:41.217 [Thread-1] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-6, groupId=groupE4] Committed offset 900 for partition GMALL_STARTUP-2
Successfully committed 50 messages, offset is now 899

Configuration description

Kafka consumption deserves the most attention; after all, most of the time we are mainly consuming and processing the data.

The configuration of kafka consumption is as follows:

bootstrap.servers: the Kafka broker addresses.
group.id: a different group name can consume the same data again. For example, if group A has already consumed 1000 records from Kafka and you want to consume those records again without regenerating them, you only need to change the group name.
enable.auto.commit: whether to commit offsets automatically. The default value is true.
auto.commit.interval.ms: the interval at which offsets are committed automatically when enable.auto.commit is true.
session.timeout.ms: the session timeout used to detect consumer failures.
max.poll.records: the maximum number of records returned by a single poll.
auto.offset.reset: where to start consuming when there is no committed offset. The default is latest.  

  • earliest: if a committed offset exists for the partition, consumption resumes from it; otherwise consumption starts from the beginning of the partition.  
  • latest: if a committed offset exists for the partition, consumption resumes from it; otherwise only newly produced records in the partition are consumed.  
  • none: if committed offsets exist for every partition of the topic, consumption resumes after them; if any partition has no committed offset, an exception is thrown.

key.deserializer: deserializer class for the message key, e.g. org.apache.kafka.common.serialization.StringDeserializer.
value.deserializer: deserializer class for the message value, e.g. org.apache.kafka.common.serialization.StringDeserializer.
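
As with the producer, these settings can also be written with the ConsumerConfig constants. A minimal sketch, assuming the same brokers and group name used above (the class name is only illustrative):

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerConfigExample {
	public static KafkaConsumer<String, String> buildConsumer() {
		Properties props = new Properties();
		// Same values as in init(), expressed with the ConsumerConfig constants
		props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "node03:9092,node04:9092,node05:9092");
		props.put(ConsumerConfig.GROUP_ID_CONFIG, "groupE4");
		props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
		props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
		props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 10);
		props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
		props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
		props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
		return new KafkaConsumer<String, String>(props);
	}
}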

Since auto-commit is disabled here and offsets are committed manually, the consumption code works as follows.
First we need to subscribe to a topic, that is, specify which topic to consume:

consumer.subscribe(Arrays.asList(topic));

After subscription, we pull data from kafka:

ConsumerRecords<String, String> msgList = consumer.poll(1000);

Generally speaking, consumption runs as a long-lived listener; here a for(;;) loop keeps polling for new records, committing manually every 50 records.  
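
In a real service the for(;;) loop also needs a way to stop. Below is a hedged sketch using KafkaConsumer.wakeup(), which makes a blocked poll() throw WakeupException so the consumer can be closed cleanly; it assumes the consumer field from the class above, and the method names are only illustrative.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.errors.WakeupException;

// Call from another thread (e.g. a JVM shutdown hook) to stop the polling loop
public void shutdown() {
	consumer.wakeup();
}

// Polling loop that exits when wakeup() is called
public void pollUntilShutdown() {
	try {
		for (;;) {
			ConsumerRecords<String, String> records = consumer.poll(100);
			for (ConsumerRecord<String, String> record : records) {
				System.out.println("offset=" + record.offset() + ", value=" + record.value());
			}
		}
	} catch (WakeupException e) {
		// Expected on shutdown; fall through to close()
	} finally {
		consumer.close();
	}
}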

A note on manual offset commits:

offset: the position of a consumer group within a Kafka topic partition.  
Simply put, each message corresponds to an offset. When offsets are committed after consuming, the next consumption resumes from the committed position.  
For example, if a topic partition holds 100 messages and I consume the first 50 and commit, the offset recorded by the Kafka server is 50 (offsets start from 0, and the committed value is the offset of the next message to read), so the next consumption starts from offset 50.
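
To make that rule explicit in code: commitSync() with no arguments, as used above, commits the positions returned by the last poll, but a specific position can also be committed per partition. A hedged sketch (the method name is only illustrative):

import java.util.Collections;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Commit everything up to and including lastProcessedOffset for one partition.
// The committed value is lastProcessedOffset + 1: the offset of the next record to read.
static void commitUpTo(KafkaConsumer<String, String> consumer, String topic, int partition, long lastProcessedOffset) {
	consumer.commitSync(Collections.singletonMap(
			new TopicPartition(topic, partition),
			new OffsetAndMetadata(lastProcessedOffset + 1)));
}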
 

Summary

Developing a simple Kafka program requires the following steps:

  1. Set up a Kafka server and start it successfully.
  2. Obtain the Kafka service information and configure it in the code.
  3. Once configured, listen for the messages produced to the Kafka message queue.
  4. Process the received data in your business logic.

Refer to official documents for kafka introduction: http://kafka.apache.org/intro


Topics: kafka Apache SSL Java