A Closer Look at Dubbo's Codec

Posted by Madzz on Tue, 08 Mar 2022 04:28:54 +0100

1, Background

While maintaining a shared base component, the author accidentally changed the package path of a class. Unfortunately, this class is referenced in the facade and passed around by many business services. Fortunately, even though the provider and the consumer now had different package paths for the same class, none of those services reported errors.

Out of curiosity, I debugged Dubbo's codec several times, and I share what I learned here.

1.1 The love-hate relationship with RPC

As an RPC framework for Java, one of Dubbo's strengths is that it hides the details of a call: a remote service can be called like a local method, without caring about the data format. This same feature, however, also introduces some problems.

For example, introducing a facade package can cause jar conflicts that prevent the service from starting, and after the facade package is updated a class may no longer be found. Depending on a shared jar couples the consumer and the provider to some degree.

Because of this coupling, when the provider changed the package path of a class in the facade, I instinctively assumed it would cause errors, but it did not. At first this seemed strange. After thinking it through, it makes sense: the caller only needs to communicate with the provider according to the agreed data format and protocol, and should not depend on the provider's own context information (the package path of a class counts as such context). Next, let's uncover Dubbo's encoding and decoding process.

2, Dubbo codec

Dubbo uses Netty as its communication framework by default, and all the analysis below is based on Netty. The source code discussed is from Dubbo 2.7.x. In practice a service is often both a consumer and a provider; to keep the walkthrough simple, we assume a pure consumer and a pure provider.

2.1 Codec in Dubbo

Dubbo's official documentation, in its framework design figure, defines a communication layer and a serialization layer, but it does not define what "encoding and decoding" means. Here is a brief explanation.

Codec = Dubbo's internal codec chain + serialization layer

This article traces the conversion between two data formats: from Java objects to a binary stream, and from a binary stream back to Java objects. To make this easier to follow, it also covers parts of the communication layer and walks through Dubbo's processing chain starting from encode and decode. Since Dubbo's own interfaces are named Encoder and Decoder, this article calls the whole thing the "codec".

Both the serialization layer and the communication layer are cornerstones of Dubbo's efficient and stable operation. Understanding how they are implemented helps us learn and use the Dubbo framework better.

2.2 Entry points

The consumer opens a connection in NettyClient#doOpen. When it initializes the Bootstrap, it adds several types of ChannelHandler to the Netty pipeline, including the codec.

Similarly, the provider exposes its service in NettyServer#doOpen and adds the codec when it initializes the ServerBootstrap (adapter.getDecoder() returns the decoder, and adapter.getEncoder() returns the encoder).

NettyClient

/**
 * Init bootstrap
 *
 * @throws Throwable
 */
@Override
protected void doOpen() throws Throwable {
    bootstrap = new Bootstrap();
    // ...
    bootstrap.handler(new ChannelInitializer<SocketChannel>() {
 
        @Override
        protected void initChannel(SocketChannel ch) throws Exception {
            // ...
            ch.pipeline()
                    .addLast("decoder", adapter.getDecoder())
                    .addLast("encoder", adapter.getEncoder())
                    .addLast("client-idle-handler", new IdleStateHandler(heartbeatInterval, 0, 0, MILLISECONDS))
                    .addLast("handler", nettyClientHandler);
            // ...
        }
    });
}

NettyServer

/**
 * Init and start netty server
 *
 * @throws Throwable
 */
@Override
protected void doOpen() throws Throwable {
    bootstrap = new ServerBootstrap();
    // ...
 
    bootstrap.group(bossGroup, workerGroup)
            .channel(NettyEventLoopFactory.serverSocketChannelClass())
            .option(ChannelOption.SO_REUSEADDR, Boolean.TRUE)
            .childOption(ChannelOption.TCP_NODELAY, Boolean.TRUE)
            .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel ch) throws Exception {
                    // ...
                    ch.pipeline()
                            .addLast("decoder", adapter.getDecoder())
                            .addLast("encoder", adapter.getEncoder())
                            .addLast("server-idle-handler", new IdleStateHandler(0, 0, idleTimeout, MILLISECONDS))
                            .addLast("handler", nettyServerHandler);
                }
            });
    // ...
}

2.3 Consumer call chain

Consumers encode when sending messages and decode when receiving responses.

Sending a message

ChannelOutboundHandler
...
NettyCodecAdapter#getEncoder()
    ->NettyCodecAdapter$InternalEncoder#encode
         ->DubboCountCodec#encode
             ->DubboCodec#encode
                ->ExchangeCodec#encode
                ->ExchangeCodec#encodeRequest
 
DubboCountCodec actually delegates to a DubboCodec. Because DubboCodec extends ExchangeCodec and does not override the encode method, the call goes straight into ExchangeCodec#encode.
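To see why the call lands in ExchangeCodec#encode, here is a minimal sketch using hypothetical, heavily simplified classes that only borrow the names from the chain above (the real classes take a Channel, a ChannelBuffer and a Request or Response). A wrapper delegates to a subclass that does not override encode, so Java's virtual dispatch runs the parent's implementation.

```java
// Hypothetical, simplified stand-ins that only mirror the names in the call chain.
class ExchangeCodecSketch {
    // Shared entry point: dispatches on the message type
    String encode(Object msg) {
        return (msg instanceof String) ? encodeRequest((String) msg) : "unknown";
    }

    String encodeRequest(String req) {
        return "ExchangeCodec#encodeRequest(" + req + ")";
    }
}

// Extends the exchange codec but does NOT override encode(...)
class DubboCodecSketch extends ExchangeCodecSketch {
}

// Wraps a DubboCodecSketch and simply delegates to it
class DubboCountCodecSketch {
    private final DubboCodecSketch codec = new DubboCodecSketch();

    String encode(Object msg) {
        // Resolves to ExchangeCodecSketch#encode, since the subclass inherits it unchanged
        return codec.encode(msg);
    }
}

class CodecChainDemo {
    public static void main(String[] args) {
        System.out.println(new DubboCountCodecSketch().encode("hello"));
    }
}
```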

Receiving a response

NettyCodecAdapter#getDecoder()
    ->NettyCodecAdapter$InternalDecoder#decode
         ->DubboCountCodec#decode
             ->DubboCodec#decode
                 ->ExchangeCodec#decode
             ->DubboCodec#decodeBody
...
MultiMessageHandler#received
    ->HeartbeatHandler#received
        ->AllChannelHandler#received
...
ChannelEventRunnable#run
    ->DecodeHandler#received
    ->DecodeHandler#decode
        ->DecodeableRpcResult#decode
 
The decoding chain is more complex, because decoding happens twice. The first time, in DubboCodec#decodeBody, the channel data is not actually decoded; it is only wrapped in a DecodeableRpcResult object. That object is decoded later by an asynchronous thread, in the handler that does the business processing (DecodeHandler).

2.4 Provider call chain

The provider decodes when receiving the message and encodes when replying to the response.

Receiving a request

NettyCodecAdapter#getDecoder()
    ->NettyCodecAdapter$InternalDecoder#decode
         ->DubboCountCodec#decode
             ->DubboCodec#decode
                 ->ExchangeCodec#decode
             ->DubboCodec#decodeBody
...
MultiMessageHandler#received
    ->HeartbeatHandler#received
        ->AllChannelHandler#received
...
ChannelEventRunnable#run
    ->DecodeHandler#received
    ->DecodeHandler#decode
        ->DecodeableRpcInvocation#decode
 
The provider's decoding chain is similar to the consumer's. The only difference is the object that does the actual decoding: DecodeableRpcResult is replaced by DecodeableRpcInvocation.

This reflects good design in Dubbo's code: the processing chain is abstracted, the details are hidden, and the flow is clear and reusable.

Replying with a response

NettyCodecAdapter#getEncoder()
    ->NettyCodecAdapter$InternalEncoder#encode
         ->DubboCountCodec#encode
             ->DubboCodec#encode
                ->ExchangeCodec#encode
                ->ExchangeCodec#encodeResponse
 
This is the same chain the consumer uses to send a message. The only difference is the last step, where a Response rather than a Request is encoded.

2.5 Dubbo protocol header

Dubbo supports several communication protocols, such as the dubbo protocol, http, rmi and webservice; the default is the dubbo protocol. Every communication protocol has its own format and conventions. The business code does not care about this information: the framework adds it during encoding and parses it during decoding.

The dubbo protocol transmits data as a fixed-length header followed by a variable-length body. The header is 16 bytes long and is laid out as follows:

2 bytes: magic number. Like the magic number in a Java class file, it identifies the packet as a dubbo protocol packet.

1 byte: message flags: 5 bits for the serialization ID, 1 bit for an event (such as a heartbeat) versus a normal request, 1 bit for two-way versus one-way, and 1 bit for request versus response.

1 byte: response status; see com.alibaba.dubbo.remoting.exchange.Response for the possible values.

8 bytes: message ID, which uniquely identifies each request.

4 bytes: body length.
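The layout above can be checked with a small parser. This is a sketch, not Dubbo's code: it uses java.nio.ByteBuffer (big-endian by default, matching the byte order of Dubbo's Bytes helpers) instead of Dubbo's own classes, and the flag constants mirror the values defined in ExchangeCodec. The class name DubboHeader is hypothetical.

```java
import java.nio.ByteBuffer;

// Sketch of reading the 16-byte dubbo protocol header described above.
final class DubboHeader {
    static final int HEADER_LENGTH = 16;          // 2 + 1 + 1 + 8 + 4
    static final short MAGIC = (short) 0xdabb;
    static final byte FLAG_REQUEST = (byte) 0x80; // bit 7: request (1) or response (0)
    static final byte FLAG_TWOWAY = 0x40;         // bit 6: two-way call
    static final byte FLAG_EVENT = 0x20;          // bit 5: event such as a heartbeat
    static final int SERIALIZATION_MASK = 0x1f;   // bits 0-4: serialization id

    final boolean request, twoWay, event;
    final int serializationId, status, bodyLength;
    final long id;

    private DubboHeader(byte[] h) {
        ByteBuffer buf = ByteBuffer.wrap(h);
        if (buf.getShort(0) != MAGIC) {
            throw new IllegalArgumentException("not a dubbo protocol packet");
        }
        byte flag = h[2];
        request = (flag & FLAG_REQUEST) != 0;
        twoWay = (flag & FLAG_TWOWAY) != 0;
        event = (flag & FLAG_EVENT) != 0;
        serializationId = flag & SERIALIZATION_MASK;
        status = h[3] & 0xff;        // only meaningful for responses
        id = buf.getLong(4);         // bytes 4..11
        bodyLength = buf.getInt(12); // bytes 12..15
    }

    static DubboHeader parse(byte[] header) {
        if (header.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("header too short");
        }
        return new DubboHeader(header);
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(HEADER_LENGTH);
        b.putShort(MAGIC);
        b.put((byte) (FLAG_REQUEST | FLAG_TWOWAY | 2)); // 2 is the hessian2 serialization id
        b.put((byte) 0);
        b.putLong(42L);
        b.putInt(128);
        DubboHeader h = parse(b.array());
        System.out.println(h.request + " " + h.twoWay + " " + h.serializationId + " " + h.id + " " + h.bodyLength);
    }
}
```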

Taking a consumer request as an example, the code that fills in the message header is in ExchangeCodec#encodeRequest:

Message encoding

protected void encodeRequest(Channel channel, ChannelBuffer buffer, Request req) throws IOException {
    Serialization serialization = getSerialization(channel);
    // header.
    byte[] header = new byte[HEADER_LENGTH];
    // set magic number.
    Bytes.short2bytes(MAGIC, header);

    // set request and serialization flag (the serialization id goes into the low 5 bits).
    header[2] = (byte) (FLAG_REQUEST | serialization.getContentTypeId());

    if (req.isTwoWay()) {
        header[2] |= FLAG_TWOWAY;
    }
    if (req.isEvent()) {
        header[2] |= FLAG_EVENT;
    }

    // set request id.
    Bytes.long2bytes(req.getId(), header, 4);

    // encode request data: skip the header for now, its body length is not known yet.
    int savedWriteIndex = buffer.writerIndex();
    buffer.writerIndex(savedWriteIndex + HEADER_LENGTH);
    ChannelBufferOutputStream bos = new ChannelBufferOutputStream(buffer);
    ObjectOutput out = serialization.serialize(channel.getUrl(), bos);
    if (req.isEvent()) {
        encodeEventData(channel, out, req.getData());
    } else {
        encodeRequestData(channel, out, req.getData(), req.getVersion());
    }
    out.flushBuffer();
    if (out instanceof Cleanable) {
        ((Cleanable) out).cleanup();
    }
    bos.flush();
    bos.close();
    int len = bos.writtenBytes();
    checkPayload(channel, len);
    // body length
    Bytes.int2bytes(len, header, 12);

    // write: go back, fill in the header, then move past the body.
    buffer.writerIndex(savedWriteIndex);
    buffer.writeBytes(header); // write header.
    buffer.writerIndex(savedWriteIndex + HEADER_LENGTH + len);
}
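The key trick in encodeRequest is that the body length is unknown until the body has been serialized, so the encoder skips over the header, writes the body, and then goes back to fill the header in. A minimal sketch of that pattern follows, assuming java.nio.ByteBuffer in place of ChannelBuffer and a UTF-8 string in place of a serialized Invocation; FrameEncoderSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of "reserve header, write body, back-fill header" (not Dubbo's code).
final class FrameEncoderSketch {
    static final int HEADER_LENGTH = 16;
    static final short MAGIC = (short) 0xdabb;
    static final byte FLAG_REQUEST = (byte) 0x80;
    static final byte FLAG_TWOWAY = 0x40;

    static ByteBuffer encodeRequest(long id, byte serializationId, String body, ByteBuffer buffer) {
        int savedWriteIndex = buffer.position();
        // 1. Skip the header: its length field is unknown until the body is written
        buffer.position(savedWriteIndex + HEADER_LENGTH);
        // 2. Write the body (a real encoder would run Hessian 2 or another serializer here)
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        buffer.put(payload);
        int len = payload.length;
        // 3. Fill in the header with absolute puts, which do not move the position
        buffer.putShort(savedWriteIndex, MAGIC);
        buffer.put(savedWriteIndex + 2, (byte) (FLAG_REQUEST | FLAG_TWOWAY | serializationId));
        buffer.put(savedWriteIndex + 3, (byte) 0);
        buffer.putLong(savedWriteIndex + 4, id);
        buffer.putInt(savedWriteIndex + 12, len);
        // 4. The position is already right after the body, like the final writerIndex call
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buf = encodeRequest(1L, (byte) 2, "hello", ByteBuffer.allocate(64));
        System.out.println("frame size = " + buf.position());
        System.out.println("body length field = " + buf.getInt(12));
    }
}
```

With a 5-byte body the frame is 21 bytes: the 16-byte header plus the body.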

3, Hessian 2

The previous section walked through the encoding and decoding process. This section looks more closely at the details of object serialization.

Dubbo supports several serialization formats, such as hessian2, JSON and JDK serialization. Hessian 2, in a version modified by Alibaba, is Dubbo's default serialization framework. Here we take the consumer serializing a request and deserializing the response as an example to look at how Hessian 2 handles the details, and answer the question from the introduction along the way.

3.1 Serialization

As mentioned earlier, requests are encoded in ExchangeCodec#encodeRequest, and the object data is serialized in DubboCodec#encodeRequestData:

DubboCodec

@Override
protected void encodeRequestData(Channel channel, ObjectOutput out, Object data, String version) throws IOException {
    RpcInvocation inv = (RpcInvocation) data;
 
    out.writeUTF(version);
    // https://github.com/apache/dubbo/issues/6138
    String serviceName = inv.getAttachment(INTERFACE_KEY);
    if (serviceName == null) {
        serviceName = inv.getAttachment(PATH_KEY);
    }
    out.writeUTF(serviceName);
    out.writeUTF(inv.getAttachment(VERSION_KEY));
 
    out.writeUTF(inv.getMethodName());
    out.writeUTF(inv.getParameterTypesDesc());
    Object[] args = inv.getArguments();
    if (args != null) {
        for (int i = 0; i < args.length; i++) {
            out.writeObject(encodeInvocationArgument(channel, inv, i));
        }
    }
    out.writeAttachments(inv.getObjectAttachments());
}

During a Dubbo call, the Invocation serves as the context store. Here the Dubbo version, service name, service version, method name and parameter type descriptor are written first. Then the argument list is iterated and each argument is serialized, and finally the attachments are written. Note that the return type is not written. The out object is the concrete serialization framework's output object, Hessian2ObjectOutput by default, and it is passed in as a parameter.
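The parameter type descriptor from getParameterTypesDesc() uses JVM-style type descriptors, for example "Ljava/lang/String;I" for (String, int). The sketch below is a hypothetical re-implementation for illustration, not Dubbo's own utility. It also makes visible that the descriptor covers parameter types only, with no return type.

```java
// Hypothetical JVM-style descriptor builder (not Dubbo's ReflectUtils).
final class TypeDescSketch {
    static String desc(Class<?> c) {
        if (c.isArray()) return "[" + desc(c.getComponentType());
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c == void.class) return "V";
        // Reference types: L<binary name with slashes>;
        return "L" + c.getName().replace('.', '/') + ";";
    }

    // Concatenates the descriptors of all parameter types, in order
    static String paramDesc(Class<?>... types) {
        StringBuilder sb = new StringBuilder();
        for (Class<?> t : types) {
            sb.append(desc(t));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(paramDesc(String.class, int.class, long[].class));
    }
}
```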

So where is the actual serialization object decided?

Let's trace the encoding call chain from the start. ExchangeCodec#encodeRequest contains the following code:

ExchangeCodec

protected void encodeRequest(Channel channel, ChannelBuffer buffer, Request req) throws IOException {
    Serialization serialization = getSerialization(channel);
    // ...
    ObjectOutput out = serialization.serialize(channel.getUrl(), bos);
    if (req.isEvent()) {
        encodeEventData(channel, out, req.getData());
    } else {
        encodeRequestData(channel, out, req.getData(), req.getVersion());
    }
    // ...
}

The out object comes from the Serialization object. Digging further, the CodecSupport class contains the following code:

CodecSupport

public static Serialization getSerialization(URL url) {
    return ExtensionLoader.getExtensionLoader(Serialization.class).getExtension(
            url.getParameter(Constants.SERIALIZATION_KEY, Constants.DEFAULT_REMOTING_SERIALIZATION));
}

The Serialization implementation is chosen through Dubbo's SPI based on the URL, and defaults to hessian2. Now look at the serialization.serialize(channel.getUrl(), bos) method:
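The selection logic boils down to "look up an extension by the name given in the URL, falling back to hessian2". The sketch below is hypothetical and far simpler than Dubbo's ExtensionLoader (no SPI configuration files, caching or wrappers); URL parameters are modeled as a plain Map, and the key "serialization" mirrors Constants.SERIALIZATION_KEY.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical lookup-with-default registry; not Dubbo's ExtensionLoader.
final class SerializationRegistrySketch {
    interface Serialization {
        String name();
    }

    private static final Map<String, Supplier<Serialization>> EXTENSIONS = new HashMap<>();

    static {
        // Each supplier creates a trivial Serialization that just reports its name
        EXTENSIONS.put("hessian2", () -> () -> "hessian2");
        EXTENSIONS.put("java", () -> () -> "java");
    }

    static Serialization getSerialization(Map<String, String> urlParameters) {
        String name = urlParameters.getOrDefault("serialization", "hessian2");
        Supplier<Serialization> factory = EXTENSIONS.get(name);
        if (factory == null) {
            throw new IllegalStateException("No such extension: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(getSerialization(params).name());
        params.put("serialization", "java");
        System.out.println(getSerialization(params).name());
    }
}
```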

Hessian2Serialization

@Override
public ObjectOutput serialize(URL url, OutputStream out) throws IOException {
    return new Hessian2ObjectOutput(out);
}

At this point we have found the actual serialization object. The argument serialization logic itself is fairly simple and is not covered in detail; in short, it writes the argument's type, then its field names, then iterates over the fields and serializes each one.
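That order (type, field names, then field values) can be illustrated with a toy reflective serializer. This is not Hessian 2's wire format: it emits readable tokens instead of bytes, and the class names are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the serialization order; NOT Hessian 2's actual format.
final class ToySerializer {
    static List<String> serialize(Object obj) {
        // Collect instance fields from the class up through its superclasses
        List<Field> fields = new ArrayList<>();
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int m = f.getModifiers();
                if (Modifier.isStatic(m) || Modifier.isTransient(m)) {
                    continue; // skipped in this sketch
                }
                f.setAccessible(true);
                fields.add(f);
            }
        }
        List<String> out = new ArrayList<>();
        out.add("type=" + obj.getClass().getName());   // 1. type name
        for (Field f : fields) {
            out.add("field=" + f.getName());           // 2. field names
        }
        for (Field f : fields) {
            try {
                out.add("value=" + f.get(obj));        // 3. field values
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    static class User {
        String name = "alice";
        int age = 30;
    }

    public static void main(String[] args) {
        System.out.println(serialize(new User()));
    }
}
```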

3.2 Deserialization

Deserialization has more constraints than serialization. When serializing an object, the sender does not need to care about the receiver's actual data format. Deserialization is different: the raw data must match the target object (the raw data here may be a binary stream or JSON).

As mentioned in the consumer's decoding chain, decoding happens twice. The first pass does not actually decode the business data; it only wraps it in a DecodeableRpcResult. The code is as follows:

DubboCodec

@Override
protected Object decodeBody(Channel channel, InputStream is, byte[] header) throws IOException {
    byte flag = header[2], proto = (byte) (flag & SERIALIZATION_MASK);
    // get request id.
    long id = Bytes.bytes2long(header, 4);

    if ((flag & FLAG_REQUEST) == 0) {
        // decode response...
        Response res = new Response(id);
        // ...
        try {
            Object data;
            DecodeableRpcResult result;
            if (channel.getUrl().getParameter(DECODE_IN_IO_THREAD_KEY, DEFAULT_DECODE_IN_IO_THREAD)) {
                // decode immediately, in the IO thread
                result = new DecodeableRpcResult(channel, res, is,
                        (Invocation) getRequestData(id), proto);
                result.decode();
            } else {
                // only copy the message bytes; decoding happens later in DecodeHandler
                result = new DecodeableRpcResult(channel, res,
                        new UnsafeByteArrayInputStream(readMessageData(is)),
                        (Invocation) getRequestData(id), proto);
            }
            data = result;
            res.setResult(data);
        } catch (Throwable t) {
            // ...
        }
        return res;
    } else {
        // decode request...
        Request req = new Request(id);
        // ...
        return req;
    }
}

Key points

1) It distinguishes between decoding a request and decoding a response: the consumer decodes responses, and the provider decodes requests.

2) Why does decoding happen twice? Look at the corresponding code in the request branch:

if (channel.getUrl().getParameter(DECODE_IN_IO_THREAD_KEY, DEFAULT_DECODE_IN_IO_THREAD)) {
    inv = new DecodeableRpcInvocation(channel, req, is, proto);
    inv.decode();
} else {
    inv = new DecodeableRpcInvocation(channel, req,
    new UnsafeByteArrayInputStream(readMessageData(is)), proto);
}

DECODE_IN_IO_THREAD_KEY controls whether decoding happens in the IO thread, and it defaults to false. This keeps business logic out of the IO thread, which is in line with Netty's recommended practice, and it is the reason decoding is done asynchronously.
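The two-phase idea can be sketched without Dubbo: the IO thread only copies the raw bytes, and the expensive decoding runs later on a business thread unless the flag asks for immediate decoding. LazyDecodeSketch, LazyResult and decodeInIoThread are hypothetical names, and UTF-8 decoding stands in for Hessian 2 deserialization.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of deferred decoding, loosely modeled on DecodeableRpcResult.
final class LazyDecodeSketch {
    static final class LazyResult {
        private final byte[] raw;
        private volatile String value; // null until decode() has run

        LazyResult(byte[] raw) {
            this.raw = raw;
        }

        LazyResult decode() {
            if (value == null) {
                value = new String(raw, StandardCharsets.UTF_8); // stand-in for deserialization
            }
            return this;
        }

        String value() {
            return value;
        }
    }

    // Runs on the IO thread: copy the bytes, decode only if asked to
    static LazyResult decodeBody(byte[] bytes, boolean decodeInIoThread) {
        LazyResult result = new LazyResult(bytes.clone());
        return decodeInIoThread ? result.decode() : result;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService business = Executors.newSingleThreadExecutor();
        try {
            LazyResult r = decodeBody("hello".getBytes(StandardCharsets.UTF_8), false);
            System.out.println("after IO thread: " + r.value());
            Future<String> f = business.submit(() -> r.decode().value());
            System.out.println("after business thread: " + f.get());
        } finally {
            business.shutdown();
        }
    }
}
```

The IO thread returns quickly with an undecoded result; the business thread pays the decoding cost.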

Now look at the code that decodes the business object. Remember where it is? DecodeableRpcResult#decode:

DecodeableRpcResult

@Override
public Object decode(Channel channel, InputStream input) throws IOException {
 
    ObjectInput in = CodecSupport.getSerialization(channel.getUrl(), serializationType)
            .deserialize(channel.getUrl(), input);
 
    byte flag = in.readByte();
    switch (flag) {
        case DubboCodec.RESPONSE_NULL_VALUE:
            // ...
        case DubboCodec.RESPONSE_VALUE_WITH_ATTACHMENTS:
            handleValue(in);
            handleAttachment(in);
            break;
        case DubboCodec.RESPONSE_WITH_EXCEPTION_WITH_ATTACHMENTS:
            // ...
        default:
            throw new IOException("Unknown result flag, expect '0' '1' '2' '3' '4' '5', but received: " + flag);
    }
    // ...
    return this;
}
 
private void handleValue(ObjectInput in) throws IOException {
    try {
        Type[] returnTypes;
        if (invocation instanceof RpcInvocation) {
            returnTypes = ((RpcInvocation) invocation).getReturnTypes();
        } else {
            returnTypes = RpcUtils.getReturnTypes(invocation);
        }
        Object value = null;
        if (ArrayUtils.isEmpty(returnTypes)) {
            // This almost never happens?
            value = in.readObject();
        } else if (returnTypes.length == 1) {
            value = in.readObject((Class<?>) returnTypes[0]);
        } else {
            value = in.readObject((Class<?>) returnTypes[0], returnTypes[1]);
        }
        setValue(value);
    } catch (ClassNotFoundException e) {
        rethrow(e);
    }
}

ObjectInput appears here. How is the underlying serialization framework selected, and how does it stay consistent with the framework the other side used to serialize the data?

Each serialization framework has an ID; see org.apache.dubbo.common.serialize.Constants.

1. When a request is sent, the serialization framework is chosen from the URL; the default is hessian2.

2. During transmission, the serialization framework's ID is written into the protocol header; see the line in ExchangeCodec#encodeRequest that sets header[2].

3. When the provider receives the consumer's request, it uses the serialization framework that matches this ID.
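The three steps can be sketched as a pair of functions: the sender packs the ID into the low 5 bits of the flag byte, and the receiver masks it out and looks up the matching framework. The registry is hypothetical; the IDs used (2 for hessian2, 3 for java) should be checked against org.apache.dubbo.common.serialize.Constants.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of serialization-id negotiation through the header flag byte.
final class SerializationIdSketch {
    static final byte FLAG_REQUEST = (byte) 0x80;
    static final int SERIALIZATION_MASK = 0x1f;
    static final Map<Integer, String> BY_ID = new HashMap<>();

    static {
        BY_ID.put(2, "hessian2");
        BY_ID.put(3, "java");
    }

    // Sender: put the chosen serialization id into the low 5 bits of header[2]
    static byte flagFor(int serializationId) {
        return (byte) (FLAG_REQUEST | serializationId);
    }

    // Receiver: recover the id from the flag byte and pick the matching framework
    static String serializationFor(byte flag) {
        int id = flag & SERIALIZATION_MASK;
        String name = BY_ID.get(id);
        if (name == null) {
            throw new IllegalStateException("Unknown serialization id: " + id);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(serializationFor(flagFor(2)));
    }
}
```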

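In this case the object actually held is Hessian2ObjectInput. The deserialization logic in readObject is complex. The part that matters here is how two types are reconciled: A, the expected return type recorded in the local Invocation, and B, the type whose class name is carried in the stream.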

4, Frequently asked questions

Question 1: the provider changed the class path in the facade. Why doesn't the consumer fail during deserialization?

A: During deserialization, if the consumer cannot find the class path sent by the provider, it catches the exception and falls back to the local return type.
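The fallback in the answer above can be sketched in a few lines. resolveType is a hypothetical helper, not a Hessian 2 API: it tries to load the class named in the stream and, if that class does not exist locally, uses the locally expected type instead.

```java
// Hypothetical sketch of "unknown class in the stream: fall back to the local type".
final class TypeFallbackSketch {
    static Class<?> resolveType(String typeNameFromStream, Class<?> expectedType) {
        try {
            return Class.forName(typeNameFromStream, false, TypeFallbackSketch.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            // The provider moved the class to a package the consumer's jar does not have
            return expectedType;
        }
    }

    static class OrderDTO {
    }

    public static void main(String[] args) {
        // "com.example.newpkg.OrderDTO" is a made-up name for the provider's new class path
        System.out.println(resolveType("com.example.newpkg.OrderDTO", OrderDTO.class).getSimpleName());
    }
}
```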

Question 2: why is the return type not written during encoding?

A: Because in Java the return type is not part of what identifies a method; a method is identified by its name and parameter types.

Question 3: in the deserialization flow, when do A and B differ? Where does A's information come from?

A: A and B differ when the provider changes the class path. A comes from the Invocation context stored in the Request object when the request is sent, that is, the return type in the local jar.

Question 4: does the consumer fail when the provider adds or removes fields of the returned object?

A: No. During deserialization, only the fields present on both sides are used.
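A minimal sketch of that behavior, assuming the incoming fields are modeled as a name-to-value map (the real Hessian 2 decoder reads field names and values from the stream). FieldIntersectionSketch is hypothetical, and for brevity it only looks at fields declared directly on the target class.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: copy only the fields that exist on both sides.
final class FieldIntersectionSketch {
    static <T> T populate(Class<T> type, Map<String, Object> incoming) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> e : incoming.entrySet()) {
                Field f;
                try {
                    f = type.getDeclaredField(e.getKey());
                } catch (NoSuchFieldException unknownLocally) {
                    continue; // field added by the provider: ignored by the consumer
                }
                f.setAccessible(true);
                f.set(target, e.getValue());
            }
            return target; // local fields missing from the stream keep their defaults
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    static class LocalOrder {
        String id;
        String remark;
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = new HashMap<>();
        incoming.put("id", "A1");
        incoming.put("newField", 7);
        LocalOrder o = populate(LocalOrder.class, incoming);
        System.out.println(o.id + " " + o.remark);
    }
}
```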

Question 5: if the provider changes the parent class of the returned object, does the consumer fail?

A: No. The payload carries only the values of the parent class's fields, not information about the parent class itself. The object is instantiated from the local class, so the parent class path in the provider's code plays no part.

Question 6: during deserialization, what happens if the subclass and the parent class of the returned object have a field with the same name, and only the subclass's field has a value?

A: In Dubbo 3.0.x, the returned field will be empty. The reason is that when the encoding side iterates over the field set and transmits it (the encoder may be either the consumer or the provider), the parent class's fields come after the subclass's. When the decoding side iterates over the received fields, it looks up the deserializer by field name. Because the subclass and parent fields share a name, the first reflective write sets the subclass's value and the second sets the parent's empty value, overwriting it.

In Dubbo 2.7.x, this problem has been fixed, and the fix is simple: on the encoding side, Collections.reverse(fields) reverses the field order before transmission, so the parent's fields are written first and the subclass's value is applied last.

JavaSerializer

public JavaSerializer(Class cl, ClassLoader loader) {
    introspectWriteReplace(cl, loader);
    // ...
    List fields = new ArrayList();
    fields.addAll(primitiveFields);
    fields.addAll(compoundFields);
    Collections.reverse(fields);
    // ...
}
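The overwrite described in Question 6 can be reproduced with a small simulation. This is a simplified model, not hessian-lite's JavaSerializer or deserializer: fields are collected from the subclass up to the parent, and "decoding" applies values by field name in stream order, so the last write to a name wins (modeled with a HashMap). The class names are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of same-named fields in a subclass and its parent.
final class SameNameFieldSketch {
    static class Parent {
        String name; // the parent's field stays null
    }

    static class Child extends Parent {
        String name = "child"; // hides Parent.name
    }

    // Subclass fields first, then the parent's, as the encoder walks up the hierarchy
    static List<Field> collectFields(Class<?> cl) {
        List<Field> fields = new ArrayList<>();
        for (; cl != null && cl != Object.class; cl = cl.getSuperclass()) {
            for (Field f : cl.getDeclaredFields()) {
                f.setAccessible(true);
                fields.add(f);
            }
        }
        return fields;
    }

    static Object decodedName(Object obj, boolean reverse) {
        List<Field> fields = collectFields(obj.getClass());
        if (reverse) {
            Collections.reverse(fields); // the fix: parent fields are written first
        }
        Map<String, Object> decoded = new HashMap<>();
        for (Field f : fields) {
            try {
                decoded.put(f.getName(), f.get(obj)); // same name: the later value overwrites
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return decoded.get("name");
    }

    public static void main(String[] args) {
        Child c = new Child();
        System.out.println("without reverse: " + decodedName(c, false));
        System.out.println("with reverse:    " + decodedName(c, true));
    }
}
```

Without the reversal the parent's null is applied last and the value is lost; with it, the subclass's "child" is applied last and survives.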

5, Write at the end

The encoding and decoding process is complex and obscure, and the data types involved are diverse. The problems I have encountered and understood are limited, so I have walked through the process using the most common and simplest data types. Please forgive any mistakes or omissions.

Author: vivo Internet server team Sun wen