This article records the design and implementation of a persistent message queue that I built for the Aliyun middleware competition about a month ago. It should be noted that LocalMQ borrows the core Broker design ideas of RocketMQ, and the earliest source code was modified from the RocketMQ source. References and other material on message queues are collected here, and the source code lives in the LocalMQ repository. The author's knowledge is limited, and the code has not been polished since the graduation trip, so this article may well contain errors and omissions; criticism and corrections are welcome.
LocalMQ: Building RocketMQ-like high-performance message queues from scratch
Intuitively, a message queue works like a reservoir: it decouples producers from consumers and smooths out the differences in throughput and processing time between them. The mainstream message queues today include the well-known Kafka, RabbitMQ, RocketMQ, and so on. In the author's implementation, LocalMQ, three versions are built in order of increasing complexity: MemoryMessageQueue, EmbeddedMessageQueue, and LocalMessageQueue. Note that all three versions use a pull model, in which consumers actively request messages from the queue and consume them. Many internal functional and performance test cases are provided under the wx.demo.* package.
```
// First download the code from: https://parg.co/beX
// Then modify the class that DefaultProducer extends:
//   to test MemoryMessageQueue, extend MemoryProducer
//   to test EmbeddedMessageQueue, extend EmbeddedProducer
//   by default, LocalMessageQueue is tested
// Note that the same change needs to be made to DefaultPullConsumer
public class DefaultProducer extends LocalProducer

// Run the test cases with mvn; the project can also be opened in Eclipse or IntelliJ
mvn clean package -U assembly:assembly -Dmaven.test.skip=true
java -Xmx2048m -Xms2048m -cp open-messaging-wx.demo-1.0.jar wx.demo.benchmark.ProducerBenchmark
```
The simplest version, MemoryMessageQueue, stores message data in memory, keyed by topic. Its main structure is as follows:
MemoryMessageQueue provides synchronous message put and pull operations. It caches all messages in an in-heap HashMap and maintains another in-memory structure, the so-called queueOffsets, to record the consumption offset of each queue for every topic. EmbeddedMessageQueue is a slightly more complex message queue that adds the disk persistence the simple MemoryMessageQueue lacks. EmbeddedMessageQueue builds a MappedPartitionQueue on top of the MappedByteBuffer provided by Java NIO; each MappedPartitionQueue corresponds to multiple physical files on disk and presents them to the upper layer as a single logical file. The structure of EmbeddedMessageQueue is shown in the following figure:
The main flow of EmbeddedMessageQueue is that producers synchronously submit messages to a BucketQueue, where each Bucket can be regarded as a Topic or a Queue. EmbeddedMessageQueue also contains an asynchronous thread that periodically flushes data from the MappedPartitionQueue to disk. EmbeddedMessageQueue assumes that once a BucketQueue is assigned to a Consumer it is occupied by that Consumer, and that the Consumer consumes all of the messages cached in it; each Consumer holds an independent consumer offset table to record its current consumption progress. The drawbacks of EmbeddedMessageQueue are:
Processing logic and flag bits are mixed together: EmbeddedMessageQueue provides only the simplest message serialization model and cannot record additional message attributes.
Timing of persistence to disk: EmbeddedMessageQueue uses only one level of cache and persists a file only when its Partition is full.
Post-processing of added messages: EmbeddedMessageQueue writes messages directly into the MappedPartitionQueue contained in a BucketQueue, so it cannot dynamically index or filter messages after they are written, and its extensibility is poor.
Intermittent pulling is not handled: EmbeddedMessageQueue assumes that a Consumer can process all messages of a single Partition in a BucketQueue at one time, so it records consumption progress only at the file level. If only part of a Partition's content is pulled at a time, the next pull still starts from the head of the following file.
In EmbeddedMessageQueue we persist messages to files separately within each Producer thread, whereas in LocalMessageQueue we write messages into a central MessageStore and then let the PostPutMessageService perform secondary processing. The structure of LocalMessageQueue is as follows:
The biggest change in LocalMessageQueue is that messages are stored uniformly in an independent MessageStore (similar to RocketMQ's CommitLog) and then dispatched into different ConsumeQueues keyed by Topic-queueId, where queueId is determined by the exclusive number of the corresponding Producer; each Consumer is assigned to occupy one ConsumeQueue (similar to the ConsumeQueue in RocketMQ), which guarantees that the messages produced by a given Producer are consumed by a dedicated Consumer. LocalMessageQueue also uses MappedPartitionQueue as the underlying file abstraction, and builds an independent ConsumerOffsetManager to manage consumers' consumption progress, which makes crash recovery easier.
Design outline
Sequential consumption
This section draws from Principle and Practice of Distributed Open Message System (RocketMQ)
An important feature of messaging products is ordering guarantees, i.e. the order in which messages are consumed should be consistent with the order in which they were sent. With multiple senders, guaranteeing a global order is expensive, so it is enough to guarantee the order of each individual sender. For example, if P1 sends M11, M12, M13 and P2 sends M21, M22, M23, it is sufficient that M11, M12, M13 (and M21, M22, M23) are consumed in their own order. That is, an actual consumption order of M11, M21, M12, M13, M22, M23 is correct, and M11, M21, M22, M12, M13, M23 is also correct, but M11, M13, M21, M22, M23, M12 is wrong because the order of M12 and M13 is reversed. If a producer produces two messages M1 and M2, the most intuitive way to guarantee their order is an acknowledgement scheme similar to that in TCP:
In this model, however, if M1 and M2 are sent to two different message servers, we cannot control the order in which the servers deliver them to the consumer; it is possible that M2 reaches the consumer before M1 does. To solve this, the improved approach is to send both M1 and M2 to a single message server, which then delivers them to the consumer on a first-arrived, first-consumed basis:
In practice, however, if M1's transmission takes longer than M2's because of network delay or other problems, M2 will still be consumed before M1. Therefore, to guarantee strictly ordered messages, we must ensure a one-to-one correspondence between producer, message server, and consumer. In LocalMQ's implementation, we first partition messages into a queue with a unique Topic-queueId according to their producer, and guarantee that a consumption queue is occupied by only one consumer at a time. If that consumer is unexpectedly interrupted before finishing the queue, the queue will not be reassigned within a retention window; outside the window the queue is assigned to a new consumer, and even if the original consumer recovers it can no longer pull messages from that queue.
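The occupation rule can be summarised with a minimal sketch. The class, field names, and the 30-second window below are illustrative assumptions, not the actual LocalMQ code (the real implementation tracks occupation in the ConsumerOffsetManager shown later, and permanently locks out the previous owner once a queue has been handed over):

```java
// Illustrative sketch of "one queue, one consumer" occupation with a retention window
import java.util.concurrent.ConcurrentHashMap;

class QueueOccupation {
    private static final long RETENTION_WINDOW_MS = 30_000;   // assumed window length
    // topic-queueId -> consumer refId that currently owns the queue
    private final ConcurrentHashMap<String, String> owner = new ConcurrentHashMap<>();
    // topic-queueId -> last time the owner was seen pulling
    private final ConcurrentHashMap<String, Long> lastSeen = new ConcurrentHashMap<>();

    /** Returns true if the consumer may pull from this queue. */
    synchronized boolean tryOccupy(String topicQueueId, String consumerRefId) {
        String current = owner.get(topicQueueId);
        long now = System.currentTimeMillis();
        if (current == null || current.equals(consumerRefId)) {
            owner.put(topicQueueId, consumerRefId);
            lastSeen.put(topicQueueId, now);
            return true;
        }
        // Inside the retention window the previous owner keeps the queue;
        // outside the window the queue is handed over to the new consumer.
        if (now - lastSeen.getOrDefault(topicQueueId, 0L) > RETENTION_WINDOW_MS) {
            owner.put(topicQueueId, consumerRefId);
            lastSeen.put(topicQueueId, now);
            return true;
        }
        return false;
    }
}
```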
Data storage
LocalMQ currently uses file-system-based persistent storage, whose main functionality is implemented in MappedPartition and MappedPartitionQueue; the implementation of these two classes is described in detail below. In this section we discuss the on-disk file layout. For LocalMessageQueue, the layout is as follows:
```
* messageStore
* -- MapFile1
* -- MapFile2
* consumeQueue
* -- Topic1
* ---- queueId1
* ------ MapFile1
* ------ MapFile2
* ---- queueId2
* ------ MapFile1
* ------ MapFile2
* -- Queue1
* ---- queueId1
* ------ MapFile1
* ------ MapFile2
* ---- queueId2
* ------ MapFile1
* ------ MapFile2
```
LocalMessageQueue uses a unified message storage scheme, so the actual content of every message is stored under the messageStore directory. consumeQueue stores message indexes, i.e. the offset addresses of messages in the message store. LocalMQ uses MappedPartitionQueue to manage a logically single file, automatically splitting it into multiple physically independent MappedFiles according to the configured size of a single file. Each MappedPartition is named after its offset, i.e. the global offset of the first byte of the file; pos / position denotes the local offset inside a single file, and index denotes the subscript of a file within its directory.
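A small sketch of this offset / position / index naming scheme. The 20-digit zero padding follows the RocketMQ convention; LocalMQ's FSExtra.offset2FileName is assumed to behave similarly, and the 1 MB file size in main is only for the example:

```java
// Illustrative helpers for the offset / pos / index conversions described above
class OffsetNaming {
    /** Global offset of the first byte of a partition file -> file name. */
    static String offset2FileName(long offset) {
        return String.format("%020d", offset);                // e.g. 0 -> "00000000000000000000"
    }

    /** Index of the file that contains a global offset, given a fixed partition size. */
    static int fileIndex(long globalOffset, int mappedFileSize) {
        return (int) (globalOffset / mappedFileSize);
    }

    /** Local position (pos) of a global offset inside its partition file. */
    static int localPosition(long globalOffset, int mappedFileSize) {
        return (int) (globalOffset % mappedFileSize);
    }

    public static void main(String[] args) {
        int fileSize = 1024 * 1024;                            // assume 1 MB partitions for the example
        long offset = 3_145_728L + 42;                         // 3 MB + 42 bytes
        System.out.println(offset2FileName(fileIndex(offset, fileSize) * (long) fileSize)); // "...03145728"
        System.out.println(localPosition(offset, fileSize));   // 42
    }
}
```

Naming each file after the global offset of its first byte means that a message's global offset alone is enough to locate both the file and the position inside it.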
Performance optimization
While writing the code, the author found that streamlining the execution flow, avoiding repeated computation and extra variables, and choosing an appropriate concurrency strategy all have a large impact on the results. For example, after switching from a spin lock to a reentrant lock, the local test TPS increased by about 5%. The author also measured the proportion of time spent in the different stages of the producer's work, in which message construction (including serialization of message attributes) and the send operation (writing to the MappedFileQueue, without a second-level cache) are synchronous and account for the largest share of time:
```
[2017-06-01 12:13:21,802] INFO: construction time ratio: 0.471270, send time ratio: 0.428567, persistence time ratio: 0.100163
[2017-06-01 12:25:31,275] INFO: construction time ratio: 0.275170, send time ratio: 0.573520, persistence time ratio: 0.151309
```
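A minimal sketch of how such stage ratios can be accumulated. The counters constructTime / sendTime / flushTime and the report format below are illustrative; the real code uses similar AtomicLong fields (for example readSpendTime and putSpendTime, which appear in the PostPutMessageService code later):

```java
import java.util.concurrent.atomic.AtomicLong;

class StageTimer {
    static final AtomicLong constructTime = new AtomicLong();
    static final AtomicLong sendTime = new AtomicLong();
    static final AtomicLong flushTime = new AtomicLong();

    // Print the share of total time spent in each stage
    static void report() {
        long total = constructTime.get() + sendTime.get() + flushTime.get();
        if (total == 0) return;
        System.out.printf("construct: %.6f, send: %.6f, flush: %.6f%n",
                constructTime.get() / (double) total,
                sendTime.get() / (double) total,
                flushTime.get() / (double) total);
    }
}
```

Each producer thread would add the elapsed time of a stage (for example, System.nanoTime() deltas) to the matching counter, and report() produces ratios like the ones in the log above.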
Code level optimization
While implementing LocalMQ, what struck the author most was that different code implementing the same function can vary greatly in performance. During implementation, avoid redundant variable declarations and object creation, avoid unnecessary memory allocation and garbage collection, and avoid redundant execution paths; also choose appropriate data structures, for example the author migrated some code from ArrayList to LinkedList and from ConcurrentHashMap to HashMap.
Asynchronous IO
Use asynchronous IO with sequential flushing; the author found that having multiple threads flush concurrently actually performs worse than a single thread flushing sequentially.
Concurrency control
Minimize the scope of lock control.
Optimize concurrent computation: put all time-consuming computation into concurrent Producers.
Choose locks appropriately: as measured above, replacing the spin lock with a reentrant lock noticeably improved TPS (a minimal comparison sketch follows this list).
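A minimal comparison of the two locking strategies mentioned above. The class and method names are illustrative only; LocalMQ's actual critical sections live in BucketQueue / MessageStore:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class PutMessageLock {
    // Spin lock: busy-waits, which wastes CPU when the critical section is long
    private final AtomicBoolean spin = new AtomicBoolean(false);

    void spinLockedPut(Runnable criticalSection) {
        while (!spin.compareAndSet(false, true)) { /* busy-wait */ }
        try {
            criticalSection.run();
        } finally {
            spin.set(false);
        }
    }

    // Reentrant lock: parks waiting threads instead of spinning
    private final ReentrantLock lock = new ReentrantLock();

    void reentrantLockedPut(Runnable criticalSection) {
        lock.lock();
        try {
            criticalSection.run();
        } finally {
            lock.unlock();
        }
    }
}
```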
MemoryMessageQueue
Source code reference Here
MemoryMessageQueue is the simplest implementation, but its code can reflect the basic flow of a message queue. First, in the producer, we need to create a message and send it to the message queue:
```java
// Create a message
BytesMessage message = messageFactory.createBytesMessageToTopic(topic, body);
// Send the message
messageQueue.putMessage(topic, message);
```
In the putMessage function, messages are stored in memory:
```java
// Stores all messages
private Map<String, ArrayList<Message>> messageBuckets = new HashMap<>();

// Add a message
public synchronized PutMessageResult putMessage(String bucket, Message message) {
    if (!messageBuckets.containsKey(bucket)) {
        messageBuckets.put(bucket, new ArrayList<>(1024));
    }
    ArrayList<Message> bucketList = messageBuckets.get(bucket);
    bucketList.add(message);
    return new PutMessageResult(PutMessageStatus.PUT_OK, null);
}
```
The consumer pulls and consumes messages according to the specified Bucket and queueId, and polls in round-robin fashion if there are multiple Buckets to pull from:
```java
// Round-robin over the buckets
int checkNum = 0;

while (++checkNum <= bucketList.size()) {
    String bucket = bucketList.get((++lastIndex) % (bucketList.size()));
    Message message = messageQueue.pullMessage(queue, bucket);
    if (message != null) {
        return message;
    }
}
```
The pullMessage function of MemoryMessageQueue first determines whether the target Bucket exists, then checks the pull offset recorded in the built-in queueOffsets to decide whether the bucket has been fully consumed; if not, it returns the next message and updates the local offset.
```java
private Map<String, HashMap<String, Integer>> queueOffsets = new HashMap<>();
...
public synchronized Message pullMessage(String queue, String bucket) {
    ...
    ArrayList<Message> bucketList = messageBuckets.get(bucket);
    if (bucketList == null) {
        return null;
    }

    HashMap<String, Integer> offsetMap = queueOffsets.get(queue);
    if (offsetMap == null) {
        offsetMap = new HashMap<>();
        queueOffsets.put(queue, offsetMap);
    }

    int offset = offsetMap.getOrDefault(bucket, 0);
    if (offset >= bucketList.size()) {
        return null;
    }

    Message message = bucketList.get(offset);
    offsetMap.put(bucket, ++offset);
    ...
}
```
EmbeddedMessageQueue
Source code reference Here
Message persistence support is introduced in EmbeddedMessageQueue. In this section we also discuss message serialization and the underlying MappedPartitionQueue implementation.
Message serialization
The message format defined in EmbeddedMessageQueue is as follows:
Serial number | Message Storage Structure | Remarks | Length (bytes) |
---|---|---|---|
1 | TOTALSIZE | Message size | 4 |
2 | MAGICCODE | Magic code of the message | 4 |
3 | BODY | The first 4 bytes store the message body length, followed by bodyLength bytes of message body content. | 4 + bodyLength |
4 | headers* | The first 2 bytes (short) store the headers length, followed by headersLength bytes of header data. | 2 + headersLength |
5 | properties* | The first 2 bytes (short) store the properties length, followed by propertiesLength bytes of property data. | 2 + propertiesLength |
EmbeddedMessageSerializer, which extends MessageSerializer, is the main class responsible for message persistence; it provides a function for calculating the message length:
```java
/**
 * Description: calculates the length of a message. Note that headersByteArray and
 * propertiesByteArray have already been converted when the message was sent.
 *
 * @param message
 * @param headersByteArray
 * @param propertiesByteArray
 * @return
 */
public static int calMsgLength(DefaultBytesMessage message, byte[] headersByteArray, byte[] propertiesByteArray) {

    // Message body
    byte[] body = message.getBody();
    int bodyLength = body == null ? 0 : body.length;

    // Compute the headers length
    short headersLength = (short) headersByteArray.length;

    // Compute the properties length
    short propertiesLength = (short) propertiesByteArray.length;

    // Compute the total message length
    return calMsgLength(bodyLength, headersLength, propertiesLength);
}
```
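The overload called in the last line is not shown; based on the storage format table above it would look roughly like the following sketch (placed in EmbeddedMessageSerializer; not the verbatim LocalMQ source):

```java
// Sum of the fixed-size fields plus the three variable-length sections from the format table
private static int calMsgLength(int bodyLength, short headersLength, short propertiesLength) {
    return 4                        // 1 TOTALSIZE
         + 4                        // 2 MAGICCODE
         + 4 + bodyLength           // 3 BODY (length prefix + body)
         + 2 + headersLength        // 4 headers (short prefix + data)
         + 2 + propertiesLength;    // 5 properties (short prefix + data)
}
```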
The encode function of EmbeddedMessageEncoder is responsible for the actual message serialization:
```java
/**
 * Description: encodes a message
 *
 * @param message             message object
 * @param msgStoreItemMemory  internal cache handle
 * @param msgLen              precomputed message length
 * @param headersByteArray    message headers byte sequence
 * @param propertiesByteArray message properties byte sequence
 */
public static final void encode(
        DefaultBytesMessage message,
        final ByteBuffer msgStoreItemMemory,
        int msgLen,
        byte[] headersByteArray,
        byte[] propertiesByteArray
) {

    // Message body
    byte[] body = message.getBody();
    int bodyLength = body == null ? 0 : body.length;

    // Compute the headers length
    short headersLength = (short) headersByteArray.length;

    // Compute the properties length
    short propertiesLength = (short) propertiesByteArray.length;

    // Initialize the storage space
    resetByteBuffer(msgStoreItemMemory, msgLen);

    // 1 TOTALSIZE
    msgStoreItemMemory.putInt(msgLen);

    // 2 MAGICCODE
    msgStoreItemMemory.putInt(MESSAGE_MAGIC_CODE);

    // 3 BODY
    msgStoreItemMemory.putInt(bodyLength);
    if (bodyLength > 0)
        msgStoreItemMemory.put(message.getBody());

    // 4 HEADERS
    msgStoreItemMemory.putShort((short) headersLength);
    if (headersLength > 0)
        msgStoreItemMemory.put(headersByteArray);

    // 5 PROPERTIES
    msgStoreItemMemory.putShort((short) propertiesLength);
    if (propertiesLength > 0)
        msgStoreItemMemory.put(propertiesByteArray);
}
```
The corresponding deserialization is performed by EmbeddedMessageDecoder, which reads messages back from a ByteBuffer:
```java
/**
 * Description: deserializes a message object from the input ByteBuffer
 *
 * @return 0  end of file
 *         >0 normal message
 *         -1 message checksum failure
 */
public static DefaultBytesMessage readMessageFromByteBuffer(ByteBuffer byteBuffer) {

    // 1 TOTAL SIZE
    int totalSize = byteBuffer.getInt();

    // 2 MAGIC CODE
    int magicCode = byteBuffer.getInt();
    switch (magicCode) {
        case MESSAGE_MAGIC_CODE:
            break;
        case BLANK_MAGIC_CODE:
            return null;
        default:
            // log.warning("found a illegal magic code 0x" + Integer.toHexString(magicCode));
            return null;
    }

    byte[] bytesContent = new byte[totalSize];

    // 3 BODY
    int bodyLen = byteBuffer.getInt();
    byte[] body = new byte[bodyLen];

    if (bodyLen > 0) {
        // Read the message body content
        byteBuffer.get(body, 0, bodyLen);
    }

    // 4 HEADERS
    short headersLength = byteBuffer.getShort();
    KeyValue headers = null;
    if (headersLength > 0) {
        byteBuffer.get(bytesContent, 0, headersLength);
        String headersStr = new String(bytesContent, 0, headersLength, EmbeddedMessageDecoder.CHARSET_UTF8);
        headers = string2KeyValue(headersStr);
    }

    // 5 PROPERTIES
    short propertiesLength = byteBuffer.getShort();
    KeyValue properties = null;
    if (propertiesLength > 0) {
        byteBuffer.get(bytesContent, 0, propertiesLength);
        String propertiesStr = new String(bytesContent, 0, propertiesLength, EmbeddedMessageDecoder.CHARSET_UTF8);
        properties = string2KeyValue(propertiesStr);
    }

    // Return the deserialized message
    return new DefaultBytesMessage(
            totalSize,
            headers,
            properties,
            body
    );
}
```
Message Writing
In EmbeddedMessageQueue, message writing is actually performed by BucketQueue's putMessage/putMessages functions, where a BucketQueue corresponds to the unique Topic-queueId identifier. Taking batch writing as an example, we first obtain the latest available MappedPartition from the MappedPartitionQueue contained in the BucketQueue:
```java
mappedPartition = this.mappedPartitionQueue.getLastMappedFileOrCreate(0);
```
Then the MappedPartition's appendMessages method is called; it is described below. Here we discuss how the various results of appending messages are handled. If the append succeeds, success is returned directly; if the remaining space in the MappedPartition is insufficient to hold the next message, the MappedPartitionQueue is asked to create a new MappedPartition and the remaining messages to be written are recalculated:
```java
...
// Call the corresponding MappedPartition to append the messages
// Note that after this append, the message offsets in the MessageStore and the QueueOffset are updated elsewhere
result = mappedPartition.appendMessages(messages, this.appendMessageCallback);

// Perform different operations depending on the append result
switch (result.getStatus()) {
    case PUT_OK:
        break;
    case END_OF_FILE:

        this.messageQueue.getFlushAndUnmapPartitionService().putPartition(mappedPartition);

        // The end of the file was reached, so create a new file
        mappedPartition = this.mappedPartitionQueue.getLastMappedFileOrCreate(0);

        if (null == mappedPartition) {
            // XXX: warn and notify me
            log.warning("Establish MappedPartition error, topic: " + messages.get(0).getTopicOrQueueName());
            beginTimeInLock = 0;
            return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, result);
        }

        // Otherwise, append again
        // Get the number of already appended messages from the result
        int appendedMessageNum = result.getAppendedMessageNum();

        // Create a temporary leftMessages list
        ArrayList<DefaultBytesMessage> leftMessages = new ArrayList<>();

        // Add all messages that have not been written yet
        for (int i = appendedMessageNum; i < messages.size(); i++) {
            leftMessages.add(messages.get(i));
        }

        result = mappedPartition.appendMessages(leftMessages, this.appendMessageCallback);

        break;
    case MESSAGE_SIZE_EXCEEDED:
    case PROPERTIES_SIZE_EXCEEDED:
        beginTimeInLock = 0;
        return new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, result);
    case UNKNOWN_ERROR:
        beginTimeInLock = 0;
        return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result);
    default:
        beginTimeInLock = 0;
        return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result);
}
...
```
Logical File Storage
MappedPartition
A MappedPartition maps a single physical file; it is initialized with a file name and file size:
```java
/**
 * Description: initializes a memory-mapped file
 *
 * @param fileName file name
 * @param fileSize file size
 * @throws IOException if the file cannot be opened
 */
private void init(final String fileName, final int fileSize) throws IOException {
    ...
    // Derive the global offset of this file from its file name
    this.fileFromOffset = Long.parseLong(this.file.getName());
    ...

    // Try to open the file
    this.fileChannel = new RandomAccessFile(this.file, "rw").getChannel();

    // Map the file into memory
    this.mappedByteBuffer = this.fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
}
```
The initialization phase opens the file mapping; afterwards, when writing messages or other content, the message-encoding callback passed in (i.e. the serialization wrapper described above) is called to encode the object into a byte stream and write it:
```java
public AppendMessageResult appendMessage(final DefaultBytesMessage message, final AppendMessageCallback cb) {
    ...
    // Get the current write position
    int currentPos = this.wrotePosition.get();

    // If the file is still writable
    if (currentPos < this.fileSize) {

        // Get the actual write handle
        ByteBuffer byteBuffer = this.mappedByteBuffer.slice();

        // Move to the current write position
        byteBuffer.position(currentPos);

        // Append result
        AppendMessageResult result = null;

        // The callback performs the actual write
        result = cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, message);

        this.wrotePosition.addAndGet(result.getWroteBytes());
        this.storeTimestamp = result.getStoreTimestamp();
        return result;
    }
    ...
}
```
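A self-contained sketch of the pattern the append callback (cb.doAppend above) implements: if the encoded message does not fit into the remaining space of the partition, write the remaining length plus BLANK_MAGIC_CODE (which the decoder shown earlier treats as "end of this file") and report END_OF_FILE so that BucketQueue rolls to a new MappedPartition; otherwise write the encoded message. All names and constant values here are illustrative, not the exact LocalMQ source:

```java
import java.nio.ByteBuffer;

class AppendPatternSketch {
    static final int BLANK_MAGIC_CODE = -875286124;        // placeholder; the real constant is defined in the decoder class
    static final int END_FILE_MIN_BLANK_LENGTH = 4 + 4;    // TOTALSIZE + MAGICCODE

    enum Status { PUT_OK, END_OF_FILE }

    static Status append(ByteBuffer fileBuffer, byte[] encodedMessage) {
        int remaining = fileBuffer.remaining();
        // Not enough room for the message plus a minimal blank record: close out this file
        if (encodedMessage.length + END_FILE_MIN_BLANK_LENGTH > remaining) {
            fileBuffer.putInt(remaining);            // TOTALSIZE of the blank tail
            fileBuffer.putInt(BLANK_MAGIC_CODE);     // decoder skips to the next file on this code
            return Status.END_OF_FILE;
        }
        fileBuffer.put(encodedMessage);              // normal write path
        return Status.PUT_OK;
    }
}
```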
MappedPartitionQueue
MappedPartitionQueue is used to manage multiple physical mapped files. Its constructor is as follows:
```java
// Stores all mapped files
private final CopyOnWriteArrayList<MappedPartition> mappedPartitions = new CopyOnWriteArrayList<MappedPartition>();
...
/**
 * Description: default constructor
 *
 * @param storePath storage directory, e.g. the MessageStore directory or a ConsumeQueue directory
 * @param mappedFileSize
 * @param allocateMappedPartitionService
 */
public MappedPartitionQueue(final String storePath, int mappedFileSize, AllocateMappedPartitionService allocateMappedPartitionService) {
    this.storePath = storePath;
    this.mappedFileSize = mappedFileSize;
    this.allocateMappedPartitionService = allocateMappedPartitionService;
}
```
The load function is taken as an example to illustrate its loading process.
```java
/**
 * Description: loads the sequence of memory-mapped files
 *
 * @return
 */
public boolean load() {

    // Read the storage path
    File dir = new File(this.storePath);

    // List all files in the directory
    File[] files = dir.listFiles();

    // If there are files, they need to be loaded
    if (files != null) {

        // Sort them
        Arrays.sort(files);

        // Traverse all files
        for (File file : files) {

            // If a file that has not been fully filled is encountered, stop loading
            if (file.length() != this.mappedFileSize) {
                log.warning(file + "\t" + file.length()
                        + " length not matched message store config value, ignore it");
                return true;
            }

            // Otherwise, load the file
            try {
                // Actually map the file
                MappedPartition mappedPartition = new MappedPartition(file.getPath(), mappedFileSize);

                // Set the file pointers to the end of the file
                mappedPartition.setWrotePosition(this.mappedFileSize);
                mappedPartition.setFlushedPosition(this.mappedFileSize);

                // Add the file to the mappedPartitions array
                this.mappedPartitions.add(mappedPartition);
                // log.info("load " + file.getPath() + " OK");
            } catch (IOException e) {
                log.warning("load file " + file + " error");
                return false;
            }
        }
    }

    return true;
}
```
Asynchronous pre-creation of files
For performance reasons, MappedPartitionQueue also creates files in advance: in the getLastMappedFileOrCreate function, when allocateMappedPartitionService exists, the asynchronous service is invoked to pre-create the file:
```java
/**
 * Description: finds the last file, or creates one, based on the start offset
 *
 * @param startOffset
 * @return
 */
public MappedPartition getLastMappedFileOrCreate(final long startOffset) {

    ...

    // Create a file if necessary
    if (createOffset != -1) {

        // Build the path and file name of the next file
        String nextFilePath = this.storePath + File.separator + FSExtra.offset2FileName(createOffset);

        // And the path and file name of the file after that
        String nextNextFilePath = this.storePath + File.separator
                + FSExtra.offset2FileName(createOffset + this.mappedFileSize);

        // Handle of the mapped file to be created
        MappedPartition mappedPartition = null;

        // Check whether a file-allocation service is available
        if (this.allocateMappedPartitionService != null) {
            // Use the service to create the file
            mappedPartition = this.allocateMappedPartitionService.putRequestAndReturnMappedFile(
                    nextFilePath, nextNextFilePath, this.mappedFileSize);
            // Warm-up handling
        } else {
            // Otherwise, create it directly
            try {
                mappedPartition = new MappedPartition(nextFilePath, this.mappedFileSize);
            } catch (IOException e) {
                log.warning("create mappedPartition exception");
            }
        }

        ...

        return mappedPartition;
    }

    return mappedPartitionLast;
}
```
Here the AllocateMappedPartitionService continuously executes file-creation requests:
```java
@Override
public void run() {
    ...
    // Execute file allocation requests in a loop
    while (!this.isStopped() && this.mmapOperation()) {}
    ...
}

/**
 * Description: executes mapped-file pre-allocation in a loop
 *
 * @Exception Only returns false when interrupted by an external thread
 */
private boolean mmapOperation() {
    ...
    // Perform the operation
    try {
        // Take the next request to execute
        req = this.requestQueue.take();

        // Get the corresponding request instance from the request table
        AllocateRequest expectedRequest = this.requestTable.get(req.getFilePath());
        ...

        // Check whether the requested object has already been created
        if (req.getMappedPartition() == null) {

            // Record the creation start time
            long beginTime = System.currentTimeMillis();

            // Build the memory-mapped file object
            MappedPartition mappedPartition = new MappedPartition(req.getFilePath(), req.getFileSize());

            ...

            // Warm up the file; only for the MessageStore
            if (mappedPartition.getFileSize() >= mapedFileSizeCommitLog && isWarmMappedFileEnable) {
                mappedPartition.warmMappedFile();
            }

            // Write the created object back into the request
            req.setMappedPartition(mappedPartition);

            // Exception flag set to false
            this.hasException = false;

            // Success flag set to true
            isSuccess = true;
        }
        ...
}
```
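The request side of this service, putRequestAndReturnMappedFile (used in getLastMappedFileOrCreate above), essentially enqueues an AllocateRequest for the next file and the one after it, then waits for mmapOperation to fill in the MappedPartition. A rough sketch of that handshake, with illustrative names, an Object standing in for MappedPartition, and a CountDownLatch standing in for the real synchronization:

```java
import java.util.concurrent.*;

class AllocateRequestSketch {
    final String filePath;
    final int fileSize;
    final CountDownLatch done = new CountDownLatch(1);
    volatile Object mappedPartition;                  // filled in by the allocation thread (mmapOperation)

    AllocateRequestSketch(String filePath, int fileSize) {
        this.filePath = filePath;
        this.fileSize = fileSize;
    }
}

class AllocateServiceSketch {
    private final BlockingQueue<AllocateRequestSketch> requestQueue = new LinkedBlockingQueue<>();
    private final ConcurrentHashMap<String, AllocateRequestSketch> requestTable = new ConcurrentHashMap<>();

    Object putRequestAndReturnMappedFile(String nextPath, String nextNextPath, int fileSize)
            throws InterruptedException {
        // Enqueue the file that is needed right now and pre-request the one after it
        AllocateRequestSketch next = requestTable.computeIfAbsent(nextPath,
                p -> enqueue(new AllocateRequestSketch(p, fileSize)));
        requestTable.computeIfAbsent(nextNextPath,
                p -> enqueue(new AllocateRequestSketch(p, fileSize)));

        // Block until the background thread has mapped the first file (bounded wait)
        next.done.await(5, TimeUnit.SECONDS);
        return next.mappedPartition;
    }

    private AllocateRequestSketch enqueue(AllocateRequestSketch req) {
        requestQueue.offer(req);
        return req;
    }
}
```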
Asynchronous Flush
EmbeddedMessageQueue also contains a flushAndUnmapPartitionService, which asynchronously flushes files and unmaps mapped files that are no longer needed. The core code of the service is as follows:
```java
private final ConcurrentLinkedQueue<MappedPartition> mappedPartitions = new ConcurrentLinkedQueue<>();
...
@Override
public void run() {
    while (!this.isStopped()) {
        int interval = 100;
        try {
            if (this.mappedPartitions.size() > 0) {

                long startTime = now();

                // Take the next MappedPartition to process
                MappedPartition mappedPartition = this.mappedPartitions.poll();

                // Flush its current content to disk
                mappedPartition.flush(0);

                // Release the space that is no longer needed
                mappedPartition.cleanup();

                long past = now() - startTime;
                // EmbeddedProducer.flushEclipseTime.addAndGet(past);
                if (past > 500) {
                    log.info("Flush data to disk and unmap MappedPartition costs " + past + " ms:"
                            + mappedPartition.getFileName());
                }
            } else {
                // Otherwise wait and flush periodically
                this.waitForRunning(interval);
            }
        } catch (Throwable e) {
            log.warning(this.getServiceName() + " service has exception. ");
        }
    }
}
```
Here, MappedPartitions are added to the queue when appending a message returns END_OF_FILE, as described above.
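The service's putPartition method (called from BucketQueue on END_OF_FILE) presumably just enqueues the full partition for the run loop above to pick up; an illustrative sketch:

```java
public void putPartition(MappedPartition mappedPartition) {
    // Enqueue the full partition; run() will flush and unmap it on a later iteration
    this.mappedPartitions.offer(mappedPartition);
}
```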
LocalMessageQueue
Source code reference Here
Message Storage
LocalMessageQueue adopts a centralized message storage scheme; its putMessage / putMessages functions actually call the message-writing function of the built-in MessageStore object:
```java
// Submit the message using the MessageStore
PutMessageResult result = this.messageStore.putMessage(message);
```
The MessageStore is the central storage for all real messages; LocalMessageQueue supports more complex message attributes:
Serial number | Message Storage Structure | Remarks | Length (bytes) |
---|---|---|---|
1 | TOTALSIZE | Message size | 4 |
2 | MAGICCODE | Magic code of the message | 4 |
3 | BODYCRC | CRC of the message body, verified on restart | 4 |
4 | QUEUEID | Queue ID | 4 |
5 | QUEUEOFFSET | Auto-incrementing; not the real offset in the ConsumeQueue, but the ordinal number of the message in this queue. To locate the entry in the ConsumeQueue, QUEUEOFFSET * 12 is the offset address. | 8 |
6 | PHYSICALOFFSET | Physical start offset of the message in the commitLog | 8 |
7 | STORETIMESTAMP | Storage timestamp | 8 |
8 | BODY | The first 4 bytes store the message body length, followed by bodyLength bytes of message body content. | 4 + bodyLength |
9 | TOPICORQUEUENAME | The first byte stores the topic name length, followed by topicOrQueueNameLength bytes of topic name. | 1 + topicOrQueueNameLength |
10 | headers* | The first 2 bytes (short) store the headers length, followed by headersLength bytes of header data. | 2 + headersLength |
11 | properties* | The first 2 bytes (short) store the properties length, followed by propertiesLength bytes of property data. | 2 + propertiesLength |
The MappedPartitionQueue initialized in its constructor is a group of mapped files of fixed size (by default 1 GB per file):
```java
// Construct the mapped-file queue
this.mappedPartitionQueue = new MappedPartitionQueue(
        ((LocalMessageQueueConfig) this.messageStore.getMessageQueueConfig()).getStorePathCommitLog(),
        mapedFileSizeCommitLog,
        messageStore.getAllocateMappedPartitionService(),
        this.flushMessageStoreService
);
```
Building ConsumeQueue
Unlike EmbeddedMessageQueue, LocalMessageQueue does not write directly into per-Topic-queueId storage when a message is first submitted; instead it relies on the built-in PostPutMessageService:
```java
/**
 * Description: post-put message processing
 */
private void doReput() {
    for (boolean doNext = true; this.isCommitLogAvailable() && doNext; ) {

        ...

        // Read the current message
        SelectMappedBufferResult result = this.messageStore.getMessageStore().getData(reputFromOffset);

        // Stop the current operation if the message does not exist
        if (result == null) {
            doNext = false;
            continue;
        }

        try {
            // Get the start position of the current message
            this.reputFromOffset = result.getStartOffset();

            // Read all messages sequentially
            for (int readSize = 0; readSize < result.getSize() && doNext; ) {

                // Read the message at the current position
                PostPutMessageRequest postPutMessageRequest = checkMessageAndReturnSize(result.getByteBuffer());

                int size = postPutMessageRequest.getMsgSize();

                readSpendTime.addAndGet(now() - startTime);

                startTime = now();
                // If the read succeeded
                if (postPutMessageRequest.isSuccess()) {
                    if (size > 0) {

                        // Write the message position into the ConsumeQueue
                        this.messageStore.putMessagePositionInfo(postPutMessageRequest);

                        // Advance the current read position
                        this.reputFromOffset += size;
                        readSize += size;

                    } else if (size == 0) {
                        this.reputFromOffset = this.messageStore.getMessageStore().rollNextFile(this.reputFromOffset);
                        readSize = result.getSize();
                    }

                    putSpendTime.addAndGet(now() - startTime);
                } else if (!postPutMessageRequest.isSuccess()) {
                    ...
                }
            }
        } finally {
            result.release();
        }
    }
}
```
In the putMessagePositionInfo function, the actual ConsumeQueue is created:
```java
/**
 * Description: records the position of a message in the ConsumeQueue
 *
 * @param postPutMessageRequest
 */
public void putMessagePositionInfo(PostPutMessageRequest postPutMessageRequest) {

    // Find or create the ConsumeQueue
    ConsumeQueue cq = this.findConsumeQueue(postPutMessageRequest.getTopic(), postPutMessageRequest.getQueueId());

    // Put the message position into the right place in the ConsumeQueue
    cq.putMessagePositionInfoWrapper(postPutMessageRequest.getCommitLogOffset(),
            postPutMessageRequest.getMsgSize(), postPutMessageRequest.getConsumeQueueOffset());
}

/**
 * Description: finds a ConsumeQueue by topic and queueId, creating it if it does not exist
 *
 * @param topic
 * @param queueId
 * @return
 */
public ConsumeQueue findConsumeQueue(String topic, int queueId) {
    ConcurrentHashMap<Integer, ConsumeQueue> map = consumeQueueTable.get(topic);

    ...

    // Check whether this queueId exists under the topic; if not, create the ConsumeQueue
    ConsumeQueue logic = map.get(queueId);

    // If nothing was found, create a new ConsumeQueue
    if (null == logic) {
        ConsumeQueue newLogic = new ConsumeQueue(
                topic,                                              // topic
                queueId,                                            // queueId
                LocalMessageQueueConfig.mapedFileSizeConsumeQueue,  // mapped file size
                this);

        ConsumeQueue oldLogic = map.putIfAbsent(queueId, newLogic);
        ...
    }

    return logic;
}
```
In the constructor of ConsumeQueue, the actual file mapping and reading are completed:
```java
/**
 * Description: main constructor
 *
 * @param topic
 * @param queueId
 * @param mappedFileSize
 * @param localMessageStore
 */
public ConsumeQueue(
        final String topic,
        final int queueId,
        final int mappedFileSize,
        final LocalMessageQueue localMessageStore) {
    ...

    // Path of the current queue
    String queueDir = this.storePath
            + File.separator + topic
            + File.separator + queueId;

    // Initialize the memory-mapped queue
    this.mappedPartitionQueue = new MappedPartitionQueue(queueDir, mappedFileSize, null);

    this.byteBufferIndex = ByteBuffer.allocate(CQ_STORE_UNIT_SIZE);
}
```
The file format of ConsumeQueue is relatively simple:
```java
// Format of a single entry in a ConsumeQueue file (12 bytes in total)
// 1 | MessageStore offset | long | 8 bytes
// 2 | Size                | int  | 4 bytes
```
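A minimal sketch of what putMessagePositionInfoWrapper (called above) has to write for each entry, based on this 12-byte format: the message's physical offset in the MessageStore followed by its size. The buffer handling, the appendMessage(byte[]) overload, and the omission of the consumeQueueOffset parameter are illustrative simplifications, not the exact source:

```java
private void putMessagePositionInfoSketch(long commitLogOffset, int msgSize) {
    // Reuse the pre-allocated 12-byte buffer for this entry
    this.byteBufferIndex.flip();
    this.byteBufferIndex.limit(CQ_STORE_UNIT_SIZE);      // 12 bytes per entry
    this.byteBufferIndex.putLong(commitLogOffset);       // 1 | MessageStore offset | 8 bytes
    this.byteBufferIndex.putInt(msgSize);                // 2 | Size                | 4 bytes

    // Append the 12-byte entry to the ConsumeQueue's mapped partition queue
    MappedPartition lastPartition = this.mappedPartitionQueue.getLastMappedFileOrCreate(0);
    lastPartition.appendMessage(this.byteBufferIndex.array());   // assumed byte[] append overload
}
```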
Message pulling
When LocalPullConsumer pulls and consumes messages, a batch pulling mechanism is used: multiple messages are pulled from LocalMessageQueue at once and then handed over one by one for (presumably time-consuming) processing. In the batch-pull function, we first check whether the ConsumeQueue corresponding to the consumer's topic and queue number contains data, then obtain a read handle and occupy the queue:
```java
/**
 * Description: batch pull of messages. Note that this is only a prefetch; the read offset is
 * committed only when the consumer actually takes the messages.
 */
private void batchPoll() {

    // If it is a LocalMessageQueue, perform the prefetch
    LocalMessageQueue localMessageStore = (LocalMessageQueue) this.messageQueue;

    // Get the name of the bucket to pull from next
    String bucket = bucketList.get((lastIndex) % (bucketList.size()));

    // First, get the offset of the queue to pull from, and lock the queue
    long offsetInQueue = localMessageStore.getConsumerScheduler().queryOffsetAndLock(
            "127.0.0.1:" + this.refId, bucket, this.getQueueId());

    // If the queueId to be pulled is already occupied, switch directly to the next topic
    if (offsetInQueue == -2) {

        // Mark the current topic as finished
        this.isFinishedTable.put(bucket, true);

        // Reset the current lastIndex / refOffset / queueId
        this.resetLastIndexOrRefOffsetWhenNotFound();

    } else {

        // After obtaining a valid queue offset, start trying to fetch messages
        consumerOffsetTable.put(bucket, new AtomicLong(offsetInQueue));

        // Request at most the number of index entries a single file can hold, which is effectively
        // a one-shot read; note that the count is also limited by the size of a single file
        GetMessageResult getMessageResult = localMessageStore.getMessage(bucket, this.getQueueId(),
                this.consumerOffsetTable.get(bucket).get() + 1,
                mapedFileSizeConsumeQueue / ConsumeQueue.CQ_STORE_UNIT_SIZE);

        // If no data was found, switch to the next bucket
        if (getMessageResult.getStatus() != GetMessageStatus.FOUND) {

            // Mark the current topic as finished
            this.isFinishedTable.put(bucket, true);

            this.resetLastIndexOrRefOffsetWhenNotFound();
        } else {

            // Malicious consumer crashes are not considered here, so the remote offset is updated directly
            localMessageStore.getConsumerScheduler().updateOffset("127.0.0.1:" + this.refId, bucket,
                    this.getQueueId(),
                    consumerOffsetTable.get(bucket).addAndGet(getMessageResult.getMessageCount()));

            // Read all the messages from the file system at once
            ArrayList<DefaultBytesMessage> messages = readMessagesFromGetMessageResult(getMessageResult);

            // Add the messages to the local queue
            this.messages.addAll(messages);

            // Only after this pull succeeds do we move on to the next bucket
            lastIndex++;
        }
    }
}
```
Consumer scheduling
ConsumerScheduler provides the core consumer-scheduling functionality; its built-in ConsumerOffsetManager contains two core stores:
```java
// Maps consumption progress in memory
private ConcurrentHashMap<String/* topic */, ConcurrentHashMap<Integer/* queueId */, Long>> offsetTable =
        new ConcurrentHashMap<String, ConcurrentHashMap<Integer, Long>>(512);

// Stores which Consumer occupies which Queue under a Topic
private ConcurrentHashMap<String/* topic */, ConcurrentHashMap<Integer/* queueId */, String/* refId */>> queueIdOccupiedByConsumerTable =
        new ConcurrentHashMap<String, ConcurrentHashMap<Integer, String>>(512);
```
They record, respectively, the consumption progress of each ConsumeQueue and the information about which consumer occupies it. ConsumerOffsetManager also provides JSON-based persistence, which ConsumerScheduler triggers periodically through a scheduledExecutorService. In the message-submission phase, LocalMessageQueue automatically calls the updateOffset function to initialize the offset of a ConsumeQueue (this is also used during recovery):
```java
public void updateOffset(final String topic, final int queueId, final long offset) {
    this.consumerOffsetManager.commitOffset("Broker Inner", topic, queueId, offset);
}
```
When a Consumer pulls for the first time, it calls the queryOffsetAndLock function to check whether a ConsumeQueue can be pulled from:
```java
/**
 * Description: queries the offset of a consumer queue and locks it
 *
 * @param topic
 * @param queueId
 * @return
 */
public long queryOffsetAndLock(final String clientHostAndPort, final String topic, final int queueId) {

    String key = topic;

    // First, check whether this Topic-queueId is already occupied
    if (this.queueIdOccupiedByConsumerTable.containsKey(topic)) {
        ...
    }

    // If it is not occupied, declare occupation now
    ConcurrentHashMap<Integer, String> consumerQueueIdMap = this.queueIdOccupiedByConsumerTable.get(key);

    ...

    // The actual lookup
    ConcurrentHashMap<Integer, Long> map = this.offsetTable.get(key);
    if (null != map) {
        Long offset = map.get(queueId);
        if (offset != null)
            return offset;
    }

    // The default return value is -1
    return -1;
}
```
After a pull finishes, the updateOffset function is called to update the pull progress.
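A minimal sketch of the periodic JSON persistence described above: ConsumerScheduler flushes the ConsumerOffsetManager's tables on a schedule so that consumption progress survives a restart. The class name, the persistTask parameter, and the 5-second period below are illustrative assumptions, not the exact LocalMQ code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class OffsetPersistenceSketch {
    private final ScheduledExecutorService scheduledExecutorService =
            Executors.newSingleThreadScheduledExecutor();

    void start(Runnable persistTask) {
        // Periodically serialize offsetTable / queueIdOccupiedByConsumerTable to JSON on disk
        scheduledExecutorService.scheduleAtFixedRate(persistTask, 5, 5, TimeUnit.SECONDS);
    }
}
```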
Message Reading
After a Consumer obtains an available pull offset through the ConsumerScheduler, it reads the real messages from LocalMessageQueue:
```java
/**
 * Description: interface used by consumers to read data from storage
 *
 * @param topic
 * @param queueId
 * @param offset     start index of the next pull
 * @param maxMsgNums
 * @return
 */
public GetMessageResult getMessage(final String topic, final int queueId, final long offset, final int maxMsgNums) {
    ...
    // Locate the consume queue by topic and queueId
    ConsumeQueue consumeQueue = findConsumeQueue(topic, queueId);

    // Make sure the ConsumeQueue exists
    if (consumeQueue != null) {

        // Get the offset in the MessageStore of the smallest message this ConsumeQueue holds
        minOffset = consumeQueue.getMinOffsetInQueue();

        // Note that the maximum offset is an unreachable address, i.e. the index of the message after the last one
        maxOffset = consumeQueue.getMaxOffsetInQueue();

        // If maxOffset is zero, there are no messages
        if (maxOffset == 0) {
            status = GetMessageStatus.NO_MESSAGE_IN_QUEUE;
            nextBeginOffset = 0;
        } else if (offset < minOffset) {
            status = GetMessageStatus.OFFSET_TOO_SMALL;
            nextBeginOffset = minOffset;
        } else if (offset == maxOffset) {
            status = GetMessageStatus.OFFSET_OVERFLOW_ONE;
            nextBeginOffset = offset;
        } else if (offset > maxOffset) {
            status = GetMessageStatus.OFFSET_OVERFLOW_BADLY;
            if (0 == minOffset) {
                nextBeginOffset = minOffset;
            } else {
                nextBeginOffset = maxOffset;
            }
        } else {

            // Get the ConsumeQueue buffer starting at the given offset
            SelectMappedBufferResult bufferConsumeQueue = consumeQueue.getIndexBuffer(offset);

            if (bufferConsumeQueue != null) {
                try {
                    status = GetMessageStatus.NO_MATCHED_MESSAGE;

                    long nextPhyFileStartOffset = Long.MIN_VALUE;
                    long maxPhyOffsetPulling = 0;

                    int i = 0;

                    // Maximum number of index bytes scanned in one call
                    final int maxFilterMessageCount = Math.max(16000, maxMsgNums * ConsumeQueue.CQ_STORE_UNIT_SIZE);

                    // Traverse the message pointers in the ConsumeQueue
                    for (; i < bufferConsumeQueue.getSize() && i < maxFilterMessageCount; i += ConsumeQueue.CQ_STORE_UNIT_SIZE) {
                        long offsetPy = bufferConsumeQueue.getByteBuffer().getLong();
                        int sizePy = bufferConsumeQueue.getByteBuffer().getInt();

                        maxPhyOffsetPulling = offsetPy;

                        if (nextPhyFileStartOffset != Long.MIN_VALUE) {
                            if (offsetPy < nextPhyFileStartOffset)
                                continue;
                        }

                        boolean isInDisk = checkInDiskByCommitOffset(offsetPy, maxOffsetPy);

                        if (isTheBatchFull(sizePy, maxMsgNums, getResult.getBufferTotalSize(), getResult.getMessageCount(),
                                isInDisk)) {
                            break;
                        }

                        // Fetch the message from the MessageStore
                        SelectMappedBufferResult selectResult = this.messageStore.getMessage(offsetPy, sizePy);

                        // If no data was obtained, switch to the next file and continue
                        if (null == selectResult) {
                            if (getResult.getBufferTotalSize() == 0) {
                                status = GetMessageStatus.MESSAGE_WAS_REMOVING;
                            }

                            nextPhyFileStartOffset = this.messageStore.rollNextFile(offsetPy);
                            continue;
                        }

                        // Otherwise add the message to the result
                        getResult.addMessage(selectResult);
                        status = GetMessageStatus.FOUND;
                        nextPhyFileStartOffset = Long.MIN_VALUE;
                    }

                    nextBeginOffset = offset + (i / ConsumeQueue.CQ_STORE_UNIT_SIZE);

                    long diff = maxOffsetPy - maxPhyOffsetPulling;

                    // Check the current memory situation
                    long memory = (long) (getTotalPhysicalMemorySize()
                            * (LocalMessageQueueConfig.accessMessageInMemoryMaxRatio / 100.0));

                    getResult.setSuggestPullingFromSlave(diff > memory);
                } finally {
                    bufferConsumeQueue.release();
                }
            } else {
                status = GetMessageStatus.OFFSET_FOUND_NULL;
                nextBeginOffset = consumeQueue.rollNextFile(offset);
                log.warning("consumer request topic: " + topic + "offset: " + offset + " minOffset: " + minOffset
                        + " maxOffset: " + maxOffset + ", but access logic queue failed.");
            }
        }
    } else {
        ...
    }
    ...
}
```
Note that the result returned here contains only the storage addresses of the messages in the MessageStore; to read the real messages you need the readMessagesFromGetMessageResult function:
```java
/**
 * Description: extracts all messages from a GetMessageResult
 *
 * @param getMessageResult
 * @return
 */
public static ArrayList<DefaultBytesMessage> readMessagesFromGetMessageResult(final GetMessageResult getMessageResult) {

    ArrayList<DefaultBytesMessage> messages = new ArrayList<>();

    try {
        List<ByteBuffer> messageBufferList = getMessageResult.getMessageBufferList();

        for (ByteBuffer bb : messageBufferList) {
            messages.add(readMessageFromByteBuffer(bb));
        }
    } finally {
        getMessageResult.release();
    }

    // Return the deserialized messages
    return messages;
}

/**
 * Description: deserializes a message object from the input ByteBuffer
 *
 * @return 0  end of file
 *         >0 normal message
 *         -1 message checksum failure
 */
public static DefaultBytesMessage readMessageFromByteBuffer(java.nio.ByteBuffer byteBuffer) {

    // 1 TOTAL SIZE
    int totalSize = byteBuffer.getInt();

    // 2 MAGIC CODE
    int magicCode = byteBuffer.getInt();
    switch (magicCode) {
        case MESSAGE_MAGIC_CODE:
            break;
        case BLANK_MAGIC_CODE:
            return null;
        default:
            log.warning("found a illegal magic code 0x" + Integer.toHexString(magicCode));
            return null;
    }

    byte[] bytesContent = new byte[totalSize];
    ...
}
```
Epilogue
Around the Dragon Boat Festival I stopped coding and thought I could finish this write-up within the week; unfortunately, the graduation trip and graduation party pushed it back to July, and it was finally finished in a hurry, which also goes to show how far gone my procrastination is.