RocketMQ learning 12 - Message disk flushing

Posted by cocell on Mon, 07 Feb 2022 10:06:26 +0100

For data storage, RocketMQ uses a file-based programming model. To improve file write performance, memory mapping is usually introduced: data is written to the page cache first, and the page cache is then flushed to disk. Writing therefore involves a trade-off between performance and data reliability. There are generally two flush strategies, synchronous and asynchronous, and RocketMQ offers both. Asynchronous flushing is the default.
Let's take a brief look at the code that performs the flush in RocketMQ:

    try {
        // We only append data to fileChannel or mappedByteBuffer, never both.
        if (writeBuffer != null || this.fileChannel.position() != 0) {
            this.fileChannel.force(false);
        } else {
            // Note 4.8.1: synchronous flush of the mapped buffer to disk
            this.mappedByteBuffer.force();
        }
    } catch (Throwable e) {
        log.error("Error occurred when force data to disk.", e);
    }

As you can see, flushing ultimately calls the force method of either FileChannel or MappedByteBuffer, depending on which of the two the data was appended to.
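To make the two force calls concrete, here is a minimal standalone sketch using only the JDK. It is not RocketMQ code: the class name ForceDemo and both method names are made up for illustration, and the temp files merely stand in for a commit log file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of RocketMQ.
public class ForceDemo {

    /** Writes text through a memory mapping, forces it to disk, then reads the file back. */
    public static String roundTripViaMapping(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("mapped", ".log");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
                mapped.put(data); // the bytes are in the page cache only
                mapped.force();   // ask the OS to write the mapped region to the storage device
            }
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes text through the channel, forces it to disk, then reads the file back. */
    public static String roundTripViaChannel(String text) {
        try {
            Path file = Files.createTempFile("channel", ".log");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(false); // false: file content must be flushed, metadata need not be
            }
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripViaMapping("written via MappedByteBuffer"));
        System.out.println(roundTripViaChannel("written via FileChannel"));
    }
}
```

Reading the file back only shows that the write is visible. The durability guarantee of force matters after a crash or power loss, which a small demo like this cannot reproduce.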

Synchronous disk flushing

Synchronous flushing means that after the Broker receives a message from the producer, it writes the message to memory first and then persists it to disk before returning the response to the client.
Synchronous flushing is best analyzed along two lines:

  1. The first line is that when the broker starts, it starts a flush thread. The call path is: BrokerController#start() -> DefaultMessageStore#start() -> CommitLog#start() -> GroupCommitService#start() -> MappedFileQueue#flush();
  2. The second line is that after receiving a message, the broker loads or updates the MappedFile and stores it into the MappedFileQueue. The call path is: SendMessageProcessor#processRequest() -> DefaultMessageStore#putMessage() -> CommitLog#putMessage() -> CommitLog#handleDiskFlush() -> GroupCommitRequest#waitForFlush()

The flush thread on the first line runs a while loop that performs a flush every 10ms. After a successful flush, it wakes up the thread on the second line that is waiting to send its response. On the second line, once the message has been appended to a MappedFile in the MappedFileQueue (which keeps its files in a CopyOnWriteArrayList), the thread calls the await method of a CountDownLatch to wait for the flush thread to finish.

    // After appending the message, the broker submits a flush request and waits for the flush thread to execute
    public void handleDiskFlush(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
        // Synchronization flush
        if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
            final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
            if (messageExt.isWaitStoreMsgOK()) {
                GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
                service.putRequest(request);
                // Wait for the flush thread to execute
                boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
                if (!flushOK) {
                    log.error("do groupcommit, wait for flush failed, topic: " + messageExt.getTopic() + " tags: " + messageExt.getTags()
                        + " client address: " + messageExt.getBornHostString());
                    putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
                }
            } else {
                service.wakeup();
            }
        }
        // Asynchronous flush
        // Note 4.8.2: asynchronous disk brushing
        else {
            if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
                flushCommitLogService.wakeup();
            } else {
                commitLogService.wakeup();
            }
        }
    }

The flush thread, for its part, walks through the pending requests:

    for (GroupCommitRequest req : this.requestsRead) {
        // There may be a message in the next file, so a maximum of
        // two times the flush
        boolean flushOK = false;
        for (int i = 0; i < 2 && !flushOK; i++) {
            // If the flushed pointer is already at or beyond the physical offset this request needs, the message is on disk
            flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();

            if (!flushOK) {
                // Perform the flush
                CommitLog.this.mappedFileQueue.flush(0);
            }
        }
        // Wake up the thread waiting for the flush
        req.wakeupCustomer(flushOK);
    }

One thing to note about synchronous flushing: each flush persists a group of messages rather than a single message.
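The group behaviour can be illustrated with a simplified sketch. It is not the RocketMQ implementation: GroupCommitSketch, FlushRequest and flushCalls are hypothetical names, and the flush itself is simulated by advancing a counter. The idea it shows is that requests accumulate in one list, the flush thread swaps lists, flushes once for the whole batch, and then releases every waiter through its CountDownLatch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of group commit; the disk flush is simulated.
public class GroupCommitSketch {

    /** One pending request: "everything up to nextOffset must be on disk". */
    static final class FlushRequest {
        final long nextOffset;
        private final CountDownLatch done = new CountDownLatch(1);
        private volatile boolean flushOK;

        FlushRequest(long nextOffset) { this.nextOffset = nextOffset; }

        void wakeup(boolean ok) { flushOK = ok; done.countDown(); }

        /** Blocks until the flush thread reports back or the timeout elapses. */
        boolean waitForFlush(long timeoutMillis) {
            try {
                return done.await(timeoutMillis, TimeUnit.MILLISECONDS) && flushOK;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    private List<FlushRequest> requestsWrite = new ArrayList<>();
    private List<FlushRequest> requestsRead = new ArrayList<>();
    private long flushedWhere; // only touched by the thread that calls doCommit
    int flushCalls;            // how many simulated flushes have happened

    synchronized void putRequest(FlushRequest req) { requestsWrite.add(req); }

    private synchronized void swapRequests() {
        List<FlushRequest> tmp = requestsWrite;
        requestsWrite = requestsRead;
        requestsRead = tmp;
    }

    /** Stand-in for the real flush: pretend everything up to 'upTo' is durable. */
    private void flush(long upTo) {
        flushCalls++;
        flushedWhere = Math.max(flushedWhere, upTo);
    }

    /** One pass of the flush thread: flush once for the batch, then wake every waiter. */
    void doCommit() {
        swapRequests();
        if (requestsRead.isEmpty()) return;
        long maxOffset = 0;
        for (FlushRequest r : requestsRead) maxOffset = Math.max(maxOffset, r.nextOffset);
        if (flushedWhere < maxOffset) flush(maxOffset);
        for (FlushRequest r : requestsRead) r.wakeup(flushedWhere >= r.nextOffset);
        requestsRead.clear();
    }

    public static void main(String[] args) {
        GroupCommitSketch service = new GroupCommitSketch();
        Thread flusher = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                service.doCommit();
                try { Thread.sleep(10); } catch (InterruptedException e) { return; }
            }
        });
        flusher.setDaemon(true);
        flusher.start();
        FlushRequest req = new FlushRequest(128);
        service.putRequest(req);
        System.out.println("flushed: " + req.waitForFlush(5000));
        flusher.interrupt();
    }
}
```

Swapping two lists keeps the lock held only for the swap itself, so producers can keep queuing requests while the previous batch is being flushed.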

Asynchronous disk flushing

The advantage of synchronous flushing is that messages are not lost: a successful response to the client means the message has already been persisted to disk, so reliability is very high, but at the cost of write performance. However, because RocketMQ writes messages to the PageCache first, the probability of message loss is small even without a synchronous flush. If a small probability of message loss can be tolerated in exchange for better performance, asynchronous flushing is worth considering.

Asynchronous flushing means that the Broker returns success immediately after storing the message in the PageCache, while an asynchronous thread periodically calls the force method of FileChannel to flush the in-memory data to disk. The default interval is 500ms. The asynchronous flush implementation class in RocketMQ is FlushRealTimeService. Given the 500ms default interval, you might guess that FlushRealTimeService uses a scheduled task.
It does not. Instead it uses the CountDownLatch await method with a timeout. The advantage is that if no new message is written, the thread sleeps for 500ms, but it can be woken up as soon as a new message arrives, so the message is flushed promptly instead of waiting out the full 500ms.
The flush thread waits in FlushRealTimeService#run (when the transient store pool is enabled, the commit thread waits in CommitRealTimeService#run instead). The wake-up happens in the asynchronous branch of CommitLog#handleDiskFlush.
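The "sleep up to 500ms, but wake early when a message arrives" behaviour can be sketched with a plain java.util.concurrent.CountDownLatch. This is an illustration only: the class WakeableWait is hypothetical, and because a standard CountDownLatch cannot be reset, the sketch creates a fresh latch for every wait, which is an assumption of this example rather than how RocketMQ does it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a timed wait that a producer can cut short.
public class WakeableWait {
    private volatile CountDownLatch waitPoint = new CountDownLatch(1);
    private final AtomicBoolean hasNotified = new AtomicBoolean(false);

    /** Called by the thread that has just appended a message. */
    public void wakeup() {
        hasNotified.set(true);
        waitPoint.countDown();
    }

    /**
     * Called by the flush thread between two flushes.
     * Returns true if it was woken early, false if the interval elapsed.
     */
    public boolean waitForRunning(long intervalMillis) {
        CountDownLatch latch = new CountDownLatch(1);
        waitPoint = latch; // publish the fresh latch before checking the flag
        if (hasNotified.compareAndSet(true, false)) {
            return true;   // a wake-up arrived while the thread was busy flushing
        }
        try {
            return latch.await(intervalMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            hasNotified.set(false);
        }
    }

    public static void main(String[] args) {
        WakeableWait waiter = new WakeableWait();
        new Thread(waiter::wakeup).start(); // simulates a producer appending a message
        long start = System.nanoTime();
        boolean wokenEarly = waiter.waitForRunning(500);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("woken early: " + wokenEarly + ", waited ms: " + elapsedMs);
    }
}
```

Compared with a fixed-rate scheduled task, this keeps the idle cost low while letting a burst of writes trigger a flush almost immediately.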

Recovery of files

Here is just a brief mention.
File recovery is divided into file recovery after normal exit and file recovery after abnormal exit.

  • Recovery after a normal exit: find the physical offset of the last message recorded in the ConsumeQueue. If that offset is greater than the offset in the CommitLog file, the redundant data in the ConsumeQueue is deleted; if it is less, the messages in the extra CommitLog range are dispatched again so that the two files stay consistent.
  • Recovery after an abnormal exit: the broker records the last flush timestamp of the commitlog, index, consumequeue and other files, and then records a checkpoint timestamp. Recovery takes the timestamp in the checkpoint as the baseline and compares it with the flush timestamps in the commitlog to decide which operations to perform.
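The offset comparison used after a normal exit can be written down as a tiny decision function. This is only a sketch of the rule described above: RecoverySketch, Action and reconcile are hypothetical names, and the real logic behind DefaultMessageStore#recover does considerably more work.

```java
// Hypothetical sketch of the offset comparison; not RocketMQ code.
public class RecoverySketch {

    enum Action { TRUNCATE_CONSUME_QUEUE, REDISPATCH_FROM_COMMIT_LOG, CONSISTENT }

    /**
     * @param consumeQueueMaxPhyOffset physical offset of the last message recorded in the ConsumeQueue
     * @param commitLogMaxOffset       highest valid offset found in the CommitLog
     */
    static Action reconcile(long consumeQueueMaxPhyOffset, long commitLogMaxOffset) {
        if (consumeQueueMaxPhyOffset > commitLogMaxOffset) {
            // The ConsumeQueue points past the end of the CommitLog: drop the extra entries.
            return Action.TRUNCATE_CONSUME_QUEUE;
        }
        if (consumeQueueMaxPhyOffset < commitLogMaxOffset) {
            // The CommitLog holds messages the ConsumeQueue has not seen: dispatch them again.
            return Action.REDISPATCH_FROM_COMMIT_LOG;
        }
        return Action.CONSISTENT;
    }

    public static void main(String[] args) {
        System.out.println(reconcile(2048, 1024));
        System.out.println(reconcile(1024, 2048));
        System.out.println(reconcile(1024, 1024));
    }
}
```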

File recovery entry: DefaultMessageStore#recover. For details, please refer to: Learning file based programming mode from RocketMQ (2)

In addition, the page cache involved here is related to MappedByteBuffer and zero copy. Please refer to a previous article: Zero copy in Java

Related articles: RocketMQ source code MappedFile introduction, which covers the TransientStorePool temporary pool, MappedFile pre-allocation, writing and disk flushing
Reference articles: Learning file based programming mode from RocketMQ (2)

Topics: Java message queue RocketMQ