Kafka Producer Source Code Analysis

Posted by packland on Sun, 15 Sep 2019 15:52:30 +0200

Common terms of Kafka

Broker: Kafka's server is an instance of Kafka. The Kafka cluster consists of one or more Brokers, which are responsible for receiving and processing client requests.

Topic: Topic, a logical container for messages in Kafka. Each message published to Kafka has a corresponding logical container, which is often used to differentiate business.

Partition: Partition is a physical concept that represents an orderly and invariant sequence of messages, each Topic consisting of one or more Parties

Replica: Replica. The same message in Kafka is copied to multiple places for data redundancy. These places are replicas. The replicas are divided into Leader and Follower. Different roles play different roles. Replicas are for Partition. Each partition can configure multiple replicas to achieve high availability.

Record: Messages, objects processed by Kafka

Offset: Message displacement, the location information of each message in a partition, is a monotonically increasing and unchanged value

Producer: Producer, an application that sends new messages to a topic

Consumer: Consumer, an application that subscribes to new messages from a topic

Consumer Offset: Consumer displacement, recording consumer progress, each consumer has its own consumer displacement

Consumer Group: Consumer group, where multiple consumers form a consumer group and consume multiple partitions at the same time to achieve high availability (the number of consumers in the group should not be more than the number of partitions to avoid wasting resources)

Reblance: Rebalancing, the process of automatically redistributing subscription subject partitions for other consumer instances after the number of consumer instances in the consumer group changes

Here's a picture to show some of the concepts mentioned above. (Painting with PPT is too arduous. It's been a long time. Useful drawing tools are welcome to recommend.)

Message Production Process

Let's start with a small demo of Kafka Producer

public static void main(String[] args) throws ExecutionException, InterruptedException {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: com.ding.KafkaProducerDemo bootstrap-servers topic-name");
        }

        Properties props = new Properties();
        // kafka server ip and port, multiple split by commas
        props.put("bootstrap.servers", args[0]);
        // Confirmation signal configuration
        // ack=0 means that the producer side does not need to wait for the confirmation signal and has the lowest availability.
        // ack=1 waits for at least one leader to write the message to the log successfully, which does not guarantee that the follower will write successfully if the leader goes down and the follower does not write the data successfully.
        // Message loss
        // ack=all leader needs to wait for all follower s to backup successfully, with the highest availability
        props.put("ack", "all");
        // retry count
        props.put("retries", 0);
        // Batch message size, batch processing can increase throughput
        props.put("batch.size", 16384);
        // Delayed time to send messages
        props.put("linger.ms", 1);
        // Memory size used to swap out data
        props.put("buffer.memory", 33554432);
        // key serialization
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // value serialization
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Create the KafkaProducer object and start the Sender thread when created
        Producer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 100; i++) {
            // Write a message to RecordAccumulator
            Future<RecordMetadata> result = producer.send(new ProducerRecord<>(args[1], Integer.toString(i), Integer.toString(i)));
            RecordMetadata rm = result.get();
            System.out.println("topic: " + rm.topic() + ", partition: " +  rm.partition() + ", offset: " + rm.offset());
        }
        producer.close();
    }

instantiation

The construction method of Kafka Producer is mainly based on the configuration file for some instantiation operations.

1. Resolve clientId, if not configured, by producer-incremental numbers

2. Parsing and instantiating partitioner can realize its own partitioner. For example, according to key partitioning, it can ensure that the same key is divided into the same partition, which is very useful for ensuring the order. If no partition rule is specified, the default rule (message has key, hash the key, and then model the available partition; if there is no key, use random number to model the available partition. [When there is no key, random number is inaccurate to model the available partition, and the initial counter value is random, but it is incremental later, so it can be used. To roundrobin)

3. Analyzing and instantiating the serialization of key and value

4. Analyzing and instantiating interceptors

5. Parse and instantiate Record Accumulator, which is mainly used to store messages (Kafka Producer main thread writes messages to Record Accumulator, Sender thread reads messages from Record Accumulator and sends them to Kafka)

6. Resolving Broker Address

7. Create a Sender thread and start it

...
this.sender = newSender(logContext, kafkaClient, this.metadata);
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
...

Message sending process

The sending entry of the message is the KafkaProducer.send method. The main process is as follows.

KafkaProducer.send
KafkaProducer.doSend
// Getting Cluster Information
KafkaProducer.waitOnMetadata 
// key/value serialization
key\value serialize
// partition
KafkaProducer.partion
// Create TopciPartion objects, record the topic and partion information of messages
TopicPartition
// Write message
RecordAccumulator.applend
// Wake up Sender thread
Sender.wakeup

RecordAccumulator

RecordAccumulator is a message queue for caching messages, grouping messages according to TopicPartition

Focus on the flow of additional messages for RecordAccumulator.applend

// Record the number of threads for applend
appendsInProgress.incrementAndGet();
// Get or create a new Deque double-ended queue based on TopicPartition
Deque<ProducerBatch> dq = getOrCreateDeque(tp);
...
private Deque<ProducerBatch> getOrCreateDeque(TopicPartition tp) {
    Deque<ProducerBatch> d = this.batches.get(tp);
    if (d != null)
        return d;
    d = new ArrayDeque<>();
    Deque<ProducerBatch> previous = this.batches.putIfAbsent(tp, d);
    if (previous == null)
        return d;
    else
        return previous;
}
// Attempt to add messages to the buffer
// Locking ensures that the same TopicPartition is written in an orderly manner
synchronized (dq) {
    if (closed)
    	throw new KafkaException("Producer closed while send in progress");
    // Attempt to write
    RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
    if (appendResult != null)
    	return appendResult;
}
private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, Deque<ProducerBatch> deque) {
    // Remove Producer Batch from the tail of the double-ended queue
    ProducerBatch last = deque.peekLast();
    if (last != null) {
        // Get it, and try to add a message
        FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, time.milliseconds());
        // Not enough space, return null
        if (future == null)
            last.closeForRecordAppends();
        else
            return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false);
    }
    // Unfetched return null
    return null;
}
public FutureRecordMetadata tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, long now) {
    // Not enough space, return null
    if (!recordsBuilder.hasRoomFor(timestamp, key, value, headers)) {
        return null;
    } else {
        // Really add messages
        Long checksum = this.recordsBuilder.append(timestamp, key, value, headers);
        ...
        FutureRecordMetadata future = ...
        // future is associated with callback    
        thunks.add(new Thunk(callback, future));
        ...
        return future;
    }
}
// If you fail to try applend (return null), you will come here. If tryApplend returns directly
// Request memory space from BufferPool to create a new ProducerBatch
buffer = free.allocate(size, maxTimeToBlock);
synchronized (dq) {
    // Notice here that the previous attempt to add failed and memory has been allocated. Why try to add?
    // Because other threads may have created ProducerBatch or the previous ProducerBatch has been freed up by the Sender thread, try adding it once. If successful, the application space will be released later in final.
    RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
    if (appendResult != null) {
        return appendResult;
    }

    // Attempt to add failed, new Producer Batch
    MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
    ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
    FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));

    dq.addLast(batch);
    incomplete.add(batch);
    // Set buffer to null to avoid releasing space in final summary
    buffer = null;
    return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
}
finally {
    // Finally, if the addition is successful again, the memory previously requested will be released (for the purpose of creating a new Producer Batch)
    if (buffer != null)
        free.deallocate(buffer);
    appendsInProgress.decrementAndGet();
}
// Write messages to buffers
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,serializedValue, headers, interceptCallback, remainingWaitMs);
if (result.batchIsFull || result.newBatchCreated) {
    // Buffer full or newly created Proucer Batch, evokes Sender threads
    this.sender.wakeup();
}
return result.future;

Sender Sender Sender Message Thread

The main processes are as follows.

Sender.run
Sender.runOnce
Sender.sendProducerData
// Getting Cluster Information
Metadata.fetch
// Get the partition where messages can be sent and the node of the leader partition has been obtained
RecordAccumulator.ready
// Get the Producer Batch information from the Deque queue corresponding to topicPartion in the buffer according to the prepared node information
RecordAccumulator.drain
// Transfer messages to the production request queue for each node
Sender.sendProduceRequests
// Create a production request queue for messages
Sender.sendProducerRequest
KafkaClient.newClientRequest
// Here's the message
KafkaClient.sent
NetWorkClient.doSent
Selector.send
// Actually, it's not really I/O, it's just written into Kafka Channel.
// poll really performs I/O
KafkaClient.poll

The main flow of Sender thread through source code analysis

The construction method of KafkaProducer starts a KafkaThread thread to execute Sender at instantiation time

// Kafka Producer construction method starts Sender
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
// Sender->run()->runOnce()
long currentTimeMs = time.milliseconds();
// Send production messages
long pollTimeout = sendProducerData(currentTimeMs);
// Really perform I/O operations
client.poll(pollTimeout, currentTimeMs);
// Getting Cluster Information
Cluster cluster = metadata.fetch();
// Get the nodes that are ready to send messages and have been fetched to the leader partition
RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
// ReadyCheckResult contains the set of nodes that can send messages and get to the leader partition, and the top set that does not get to the leader partition node.
public final Set<Node> Node;
public final long nextReadyCheckDelayMs;
public final Set<String> unknownLeaderTopics;

The read method mainly traverses the container of adding messages to Record Accumulator, Map < Topic Partition, Deque < Producer Batch> from the cluster information to get the node where the leader partition is located according to Topic Partition, and can not find the corresponding leader node, but the topic of the message to be sent is added to unknownLeader Topics. At the same time, the nodes that can get the leader partition according to TopicPartition and satisfy the condition of sending messages are added to the nodes.

// Traversing batches
for (Map.Entry<TopicPartition, Deque<ProducerBatch>> entry : this.batches.entrySet()) {
    TopicPartition part = entry.getKey();
    Deque<ProducerBatch> deque = entry.getValue();
    // Get the node of leader partition from cluster information according to TopicPartition
    Node leader = cluster.leaderFor(part);
    synchronized (deque) {
        if (leader == null && !deque.isEmpty()) {
            // Add a topic that does not find the node where the corresponding leader partition is located but has messages to send
            unknownLeaderTopics.add(part.topic());
        } else if (!readyNodes.contains(leader) && !isMuted(part, nowMs)) {
				....
                if (sendable && !backingOff) {
                    // Add ready nodes
                    readyNodes.add(leader);
                } else {
                   ...
}

Then traverse the returned unknownLeaderTopics, add topic to the metadata information, and call the metadata.requestUpdate method to request the update of metadata information.

for (String topic : result.unknownLeaderTopics)
    this.metadata.add(topic);
    result.unknownLeaderTopics);
	this.metadata.requestUpdate();

Final checking of ready nodes, removing those nodes that are not ready to connect, is mainly based on KafkaClient. read method.

Iterator<Node> iter = result.readyNodes.iterator();
long notReadyTimeout = Long.MAX_VALUE;
while (iter.hasNext()) {
    Node node = iter.next();
    // Call the KafkaClient. read method to verify that the node connection is ready
    if (!this.client.ready(node, now)) {
        // Remove Unready Nodes
        iter.remove();
        notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
    }
}

Let's start creating a request for a production message

// Remove the Deque double-ended queue corresponding to TopicPartition from RecordAccumulator, and then remove the Producer Batch from the head of the double-ended queue as the message to be sent.
Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);

Encapsulate messages as ClientRequest

ClientRequest clientRequest = client.newClientRequest(nodeId, requestBuilder, now, acks != 0,requestTimeoutMs, callback);

Calling KafkaClient to send a message (not really performing I/O) involves Kafka Channel. Kafka's communication adopts NIO mode

// NetworkClient.doSent Method
String destination = clientRequest.destination();
RequestHeader header = clientRequest.makeHeader(request.version());
...
Send send = request.toSend(destination, header);
InFlightRequest inFlightRequest = new InFlightRequest(clientRequest,header,isInternalRequest,request,send,now);
this.inFlightRequests.add(inFlightRequest);
selector.send(send);

...

// Selector.send method    
String connectionId = send.destination();
KafkaChannel channel = openOrClosingChannelOrFail(connectionId);
if (closingChannels.containsKey(connectionId)) {
    this.failedSends.add(connectionId);
} else {
    try {
        channel.setSend(send);
    ...

At this point, the job of sending messages is almost ready. Call the KafkaClient.poll method to really perform I/O operations.

client.poll(pollTimeout, currentTimeMs);

Summarize the Sender thread process with a diagram

Through the above introduction, we have sorted out the main process of Kafka production message, which involves the main thread writing messages to RecordAccumulator, while the background Sender thread gets messages from RecordAccumulator, sends messages to Kafka by NIO, and summarizes them with a graph.

Epilogue

This is the first time that this public number has tried to write a source-related article. To tell the truth, I really don't know how to write it. The feeling of code screenshot and pasting the whole code has been denied by me. Finally, this way is adopted to introduce the main process, omit the irrelevant code and cooperate with the flow chart.

Last week, I attended the Huawei Cloud Kafka Practical Course. I simply looked at the production and consumption codes of kafka. I wanted to sort them out. Then I started reading the source code and sorting out the process at 8.17 on Sunday noon. I wrote it till 12:00 p.m. There was still a little unfinished. I finished this article early Monday morning. Of course, this article ignores a lot of more details, and will continue to go deeper, dare to try, keep improving, refueling!

Reference material

Huawei Cloud Actual Warfare

Geek Time kafka Column

Topics: Big Data kafka Apache