1. Start with JSON
When it comes to serialization, the first formats that come to mind are probably JSON and XML. Both are text-based encodings for data transmission, as is YAML.
JSON has many advantages, which make it one of the most widely used serialization formats. It is simple, human-readable, concise once serialized, and fast to parse. JSON is also natively supported by JavaScript, is widely used in web browser scenarios, and is the de facto standard format for Ajax.
JSON suits many scenarios. Typical ones include:
- Services that exchange relatively small amounts of data between companies and have relatively loose real-time requirements.
- Ajax requests made from a web browser.
- Interfaces that change often and need to be easy to debug, such as the communication between a mobile app and a server.
However, because of some characteristics of its design, JSON is not the best choice in every scenario. For example:
- When a standard IDL is needed to strengthen the contract between all parties involved. A JSON protocol can only be agreed on through documentation, which can make debugging inconvenient and leave room for ambiguity.
- When performance and compactness requirements are high. In some languages, JSON serialization and deserialization rely on reflection, which may not be the best option where performance requirements are particularly strict.
- For services with large data volumes, or for persistence. The extra space overhead of JSON serialization is relatively large, which also means more memory and disk usage.
For these scenarios, a serialization scheme based on an IDL and binary storage, such as ProtoBuf, Thrift, or Avro, is more appropriate.
IDL: all parties involved in a communication need to agree on its contents. To make that agreement independent of any language or platform, it has to be written in a language that is itself independent of the specific development language and platform. This language is called an interface description language (IDL), and an agreement written in an IDL is called an IDL file.
2. What is ProtoBuf
ProtoBuf is short for Protocol Buffers, a language-independent, platform-independent, and extensible scheme for serializing structured data, open-sourced by Google. It can be used for communication protocols, data storage, and more.
ProtoBuf is one of the serialization schemes that fits the scenarios above. It is flexible and efficient: by defining an IDL file (a .proto file in this case) and using the source code generated from it, we can easily write and read structured data to and from various data streams in various languages. You can even update the data structure without breaking deployed programs that were compiled against the old one.
As mentioned above, Thrift and Avro are serialization schemes of the same type. Thrift is not just a serialization protocol: it is embedded in the Thrift framework, which makes it hard to use with other transport-layer protocols. Avro is less suited to web environments because it lacks a mature JavaScript implementation, which also limits its use cases.
At present, ProtoBuf is the default serialization method of gRPC.
ProtoBuf comprises the definition of the serialization format, libraries for various languages, and an IDL compiler. Normally, we define a .proto file and then use the IDL compiler to compile it into code for the required language.
3. A simple .proto example
// test.proto
syntax = "proto3";                // Proto version; proto3 is recommended
option go_package = "main/proto"; // Go package of the generated code

message SearchRequestParam { // Message type
  enum Type { // Enumeration type
    PC = 0;
    Mobile = 1;
  }
  string query_text = 1; // The "1" after the field name is the field number; it must be unique within the message
  int32 limit = 3;       // Integer
  Type type = 4;         // Enumeration type
}

message SearchResultPage {
  repeated string result = 1; // "repeated" means the field can appear any number of times (including zero)
  int32 num_results = 2;
}
The code above shows only some common field definitions. For more complex ones, such as oneof, map, and reserved, refer to the official documentation.
4. Generate Go code from the .proto file
After defining the structured data in the .proto file, you can use the protoc tool to convert .proto files into code in C++, Go, Java, Python, and other languages. Here we generate Go code.
First, install the protoc tool.
# Download the release archive (Mac)
$ wget https://github.com/protocolbuffers/protobuf/releases/download/v3.15.6/protoc-3.15.6-osx-x86_64.zip
# Unzip it, then move the binary into /usr/local/bin
$ unzip protoc-3.15.6-osx-x86_64.zip -d protoc-3.15.6-osx-x86_64
$ mv protoc-3.15.6-osx-x86_64/bin/protoc /usr/local/bin/protoc
# A successful installation prints the version:
$ protoc --version
libprotoc 3.15.6
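Then install protoc-gen-go, the official plug-in that generates Go code. (On Go 1.18 and later, go get no longer installs binaries; use go install github.com/golang/protobuf/protoc-gen-go@latest there instead.)
$ go get -u github.com/golang/protobuf/protoc-gen-go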
Next, in the directory containing the .proto file, run the following command to generate the Go file:
$ protoc --go_out=. test.proto
The protoc command also accepts the -I parameter, which specifies the directories searched for imported .proto files. For other parameters, refer to the official documentation.
In the directory containing the .proto file, we can now see a test.pb.go file. Its main structures are as follows:
type SearchRequestParam struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	QueryText string                  `protobuf:"bytes,1,opt,name=query_text..."`
	Limit     int32                   `protobuf:"varint,3,opt,name=limit,proto3"...."`
	Type      SearchRequestParam_Type `protobuf:"varint,4,opt,name=type,proto3..."`
}

type SearchResultPage struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	Result     []string `protobuf:"bytes,1,rep,name=result,proto3...."`
	NumResults int32    `protobuf:"varint,2,opt,name=num_results,json=numResults,proto3..."`
}
Next, the types in the .pb.go file can be used directly in project code.
5. What is gogo/protobuf
Above, we installed protoc-gen-go, a plug-in for generating Go code. This plug-in belongs to golang/protobuf, the Protobuf API implementation officially provided for Go. Our protagonist, gogo/protobuf, is an enhanced implementation based on golang/protobuf.
The gogo library is built on the official library and adds many features, including:
- Fast serialization and deserialization.
- More canonical Go data structures.
- Compatibility with goprotobuf.
- Optional helper methods that reduce the amount of code you have to write.
- Optional generation of test and benchmark code.
- Other serialization formats.
At present, many well-known projects use the library, such as etcd, Kubernetes (k8s), TiDB, and Docker SwarmKit.
6. How to use gogo/protobuf
The root directory of https://github.com/gogo/protobuf contains many folders. Those prefixed with "protoc-gen-" are the code-generation plug-ins; others, such as "proto", "protobuf", and "gogoproto", are library files.
The gogo library currently offers three ways to generate code:
- gofast: speed first, but this method does not support the other gogoprotobuf extension options.
- gogofast, gogofaster, gogoslick: faster, but they generate more code.
$ go get github.com/gogo/protobuf/proto
$ go get github.com/gogo/protobuf/{binary}   # {binary} is protoc-gen-gogofast, protoc-gen-gogofaster, or protoc-gen-gogoslick
$ go get github.com/gogo/protobuf/gogoproto
$ protoc -I=. -I=$GOPATH/src -I=$GOPATH/src/github.com/gogo/protobuf/protobuf --{binary}_out=. myproto.proto   # here {binary} omits the "protoc-gen-" prefix
  - gogofast is similar to gofast, but it imports the gogoprotobuf library.
  - gogofaster is similar to gogofast, but it does not generate the XXX_unrecognized field and uses fewer pointer fields, which reduces garbage-collection time.
  - gogoslick is similar to gogofaster, but it also generates String, GoString, and Equal methods.
- protoc-gen-gogo: the fastest and the most customizable.
$ go get github.com/gogo/protobuf/proto
$ go get github.com/gogo/protobuf/jsonpb
$ go get github.com/gogo/protobuf/protoc-gen-gogo
$ go get github.com/gogo/protobuf/gogoproto
  - Serialization can be highly customized through extension options.
gogo/protobuf provides many extension options for finer control over the generated code. A comprehensive description of them is in the project's extensions documentation. They mainly include options for generating fast serialization and deserialization code, options for more canonical Go data structures, options for goprotobuf compatibility, options for generating helper methods, and options for generating test and benchmark code; JSON tags can also be added.
Some people have benchmarked the serialization performance of these generation methods. For ordinary requirements the performance gap is not large, and protoc-gen-gofast is enough for most scenarios.
Finally, the generated Go code is very simple to use in a project. In general, the proto.Marshal and proto.Unmarshal functions are all you need. Here is an example:
package main

import (
	"fmt"
	"log"

	zaproto "git.xxxxx.com/data/za-proto/proto"
	"github.com/gogo/protobuf/proto"
)

func main() {
	req := &zaproto.SearchRequestParam{
		QueryText: "xxxxxx",
		Limit:     10,
		Type:      zaproto.SearchRequestParam_PC,
	}
	// Serialize the request into ProtoBuf wire-format bytes.
	data, err := proto.Marshal(req)
	if err != nil {
		log.Fatalf("Marshal err: %v", err)
	}
	// send data
	fmt.Println(string(data))

	// respData stands in for the bytes received in the response.
	var respData []byte
	var result = zaproto.SearchResultPage{}
	if err = proto.Unmarshal(respData, &result); err == nil {
		// Print through a pointer so the message struct is not copied.
		fmt.Println(&result)
	} else {
		log.Fatalf("Unmarshal err: %v", err)
	}
}
References:
alecthomas/go_serialization_benchmarks: Benchmarks of Go serialization methods (github.com)
So you want to use GoGo Protobuf (jbrandhorst.com)
Schema evolution in Avro, Protocol Buffers and Thrift — Martin Kleppmann's blog
Language Guide | Protocol Buffers | Google Developers
Serialization and deserialization - meituan technical team (meituan.com)
Is Protobuf five times faster than JSON- InfoQ
Performance comparison of several Go serialization libraries | bird's nest (colobu.com)
Think about gRPC: why protobuf | hengyunduanling column (hengyunabc.github.io)
https://github.com/gogo/protobuf