Protocol Buffer Basics: C++

Posted by g00bster on Sat, 05 Mar 2022 15:55:55 +0100

Original address: https://developers.google.com/protocol-buffers/docs/cpptutorial

This article uses an example to briefly introduce how to use protocol buffers in C++. You will learn:

How to define message formats in a .proto file;
How to use the protocol buffer compiler;
How to use the C++ protocol buffer interface to read and write messages;

For more detailed usage, see the Protocol Buffer Language Guide (proto2), the Protocol Buffer Language Guide (proto3), the C++ API Reference, the C++ Generated Code Guide, and the Encoding Reference.

1. Why use Protocol Buffers?

Suppose there is an Address Book program that can read and write people's contact information, and each person is associated with the following data:

ID
full name
Email
Telephone number

How do you serialize and retrieve structured data like this? There are several options:

Send or save the raw in-memory data structures in binary form. This approach becomes fragile as the project evolves, because the code that receives the data must be compiled with exactly the same memory layout, byte order, and so on. In addition, if a field is added to the data structure, software that has already been released cannot parse the new layout, so extensibility is severely limited.
Invent an ad-hoc way to encode the data items into a single string. For example, encode the four integers 12, 3, -23 and 67 as "12:3:-23:67". This approach is simple and flexible, and it works well for very simple data. The disadvantage is that the whole string has to be encoded and decoded in one pass; individual pieces cannot be processed separately.
Serialize the data to XML. The advantages are that XML is human readable, has binding libraries for many languages, and makes it easy to share data with other programs and projects. The disadvantages are that XML takes up a lot of space and that encoding and decoding it is costly. In addition, navigating an XML DOM tree is considerably more complicated than accessing the fields of a class.
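To make the second option concrete, here is a minimal sketch of such an ad-hoc string encoding. The function names EncodeInts and DecodeInts are made up for this illustration and are not part of any library. Note that this is exactly the kind of one-off encoding and parsing code you have to write and maintain yourself with this approach.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Join integers with ':' so that {12, 3, -23, 67} becomes "12:3:-23:67".
std::string EncodeInts(const std::vector<int>& values) {
  std::ostringstream out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out << ':';
    out << values[i];
  }
  return out.str();
}

// Split on ':' and convert each piece back to an int.
// std::stoi throws std::invalid_argument if a piece is not a number.
std::vector<int> DecodeInts(const std::string& encoded) {
  std::vector<int> values;
  std::istringstream in(encoded);
  std::string piece;
  while (std::getline(in, piece, ':')) {
    values.push_back(std::stoi(piece));
  }
  return values;
}
```

Both functions walk over the entire input, and the format has no room for field names or optional values, which is why it only suits very simple data.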

Protocol Buffers are a flexible, efficient, and automated solution to exactly this problem. You first write a .proto file that describes the data structure. From it, the protocol buffer compiler generates a class that automatically encodes and parses the protocol buffer data in an efficient binary format. The generated class provides getters and setters for the fields and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the format supports extension over time: after the data structure is extended, old programs can still parse the fields they know about.

2. Sample code

The complete sample code is in the examples directory of the source package.

Download address: https://developers.google.com/protocol-buffers/docs/downloads

3. Define the protocol message format

Recall the data associated with each person:

ID
Name
Email
Telephone number

In the .proto file, add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. For example:

syntax = "proto2";

package tutorial;

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    optional string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

As you can see, the syntax is similar to C++ or Java. Let's go through each part of the file and see what it does.

3.1 package statement

package tutorial;

The .proto file starts with a package declaration, which helps prevent naming conflicts between projects. In C++, the generated classes are placed in a namespace that matches the package name.

3.2 message definition

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    optional string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

A message is a set of typed fields. Many standard simple data types can be used as field types, such as bool, int32, float, double and string. Messages can also be nested, as with PhoneNumber defined inside Person.

The "= 1" and "= 2" markers are the unique tags that identify each field in the binary encoding. Tag numbers 1 to 15 take one byte less to encode than higher numbers, so they are generally reserved for commonly used or repeated fields.

Each field must be annotated with one of the following modifiers:

optional: the field may or may not be set. If it is not set, a default value is used: 0 for numeric types, the empty string for strings, and false for booleans.
repeated: the field may be repeated any number of times (including zero). It can be thought of as a dynamically sized array.
required: a value for the field must be provided, otherwise the message is considered "uninitialized".

Note: use required with great care! If you later want to stop writing a required field and change it to optional, problems can arise: old readers consider messages without this field incomplete and may unintentionally reject or discard them. Within Google, required fields are strongly disfavored. Most messages defined in proto2 syntax use only optional and repeated, and proto3 syntax removes support for required altogether.

For a complete guide to writing .proto files, see the Protocol Buffer Language Guide. Don't go looking for facilities like class inheritance, though: protocol buffers don't do that.
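Coming back to the tag numbers discussed in this section, the sketch below shows where the "1 to 15" boundary comes from. It assumes the wire format described in the Encoding Reference, where each field is preceded by a key equal to (field_number << 3) | wire_type, stored as a base-128 varint. The function names VarintSize and TagSize are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes a base-128 varint needs: each byte carries 7 payload bits.
int VarintSize(std::uint64_t value) {
  int bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// Size in bytes of the key written in front of a field. The wire type lives
// in the low 3 bits, so only the field number influences the size.
int TagSize(std::uint32_t field_number) {
  const std::uint64_t key = static_cast<std::uint64_t>(field_number) << 3;
  return VarintSize(key);
}
```

Under these assumptions TagSize(15) is 1 and TagSize(16) is 2, because 15 << 3 still fits into 7 bits while 16 << 3 does not.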

4. Compile the .proto file

Next, use the protoc compiler to generate the classes needed to read and write the messages defined in the .proto file:

If the compiler is not installed, download the package and follow the instructions in the README;
Run protoc, specifying the source directory (the current directory is used by default), the destination directory for the generated code (often the same as $SRC_DIR), and the path to the .proto file:

protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto

After compilation, two files are generated:

addressbook.pb.h: Header file
addressbook.pb.cc: implementation file

5. Protocol Buffer interface

The generated addressbook.pb.h declares a class for each message defined in addressbook.proto. Each class provides accessors that depend on the type of the field:

  // optional string name = 1;
  inline bool has_name() const;
  inline void clear_name();
  inline const ::std::string& name() const;
  inline void set_name(const ::std::string& value);
  inline void set_name(const char* value);
  inline ::std::string* mutable_name();

  // optional int32 id = 2;
  inline bool has_id() const;
  inline void clear_id();
  inline int32_t id() const;
  inline void set_id(int32_t value);

  // optional string email = 3;
  inline bool has_email() const;
  inline void clear_email();
  inline const ::std::string& email() const;
  inline void set_email(const ::std::string& value);
  inline void set_email(const char* value);
  inline ::std::string* mutable_email();

  // repeated PhoneNumber phones = 4;
  inline int phones_size() const;
  inline void clear_phones();
  inline const ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >& phones() const;
  inline ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >* mutable_phones();
  inline const ::tutorial::Person_PhoneNumber& phones(int index) const;
  inline ::tutorial::Person_PhoneNumber* mutable_phones(int index);
  inline ::tutorial::Person_PhoneNumber* add_phones();

For required and optional fields:
- The getter has the same name as the field in lowercase. For example, the getter for the name field is name(). String fields also have a mutable_ function that returns a pointer to the string. Calling it on a field that is not set automatically initializes the field to an empty string;
- Use the set_ function to set the value;
- Use the has_ function to check whether a value has been set;
- Use the clear_ function to restore the field to its empty state;
For repeated fields:
- Use the getter with an index to read an element;
- Use the mutable_ function with an index to update an element (for example, mutable_phones(index));
- Use the _size function to get the number of elements;
- Use the clear_ function to restore the field to its empty state;
- Use the add_ function to append a new element;

For more information about the functions generated by different fields, see C++ generated code reference.
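The presence semantics of has_, set_, mutable_ and clear_ can be easier to see in a small model. The class below is a hand-written stand-in called PersonModel, an assumption for illustration only. It is not the code that protoc generates, and it covers a single string field.

```cpp
#include <cassert>
#include <string>

// Hand-written model of one optional string field. NOT generated code.
class PersonModel {
 public:
  bool has_name() const { return has_name_; }

  // Back to the empty state: value cleared and presence flag dropped.
  void clear_name() {
    name_.clear();
    has_name_ = false;
  }

  // Returns the default (empty string) while the field is not set.
  const std::string& name() const { return name_; }

  void set_name(const std::string& value) {
    name_ = value;
    has_name_ = true;
  }

  // Marks the field as set and hands out a pointer for in-place editing.
  std::string* mutable_name() {
    has_name_ = true;
    return &name_;
  }

 private:
  std::string name_;
  bool has_name_ = false;
};
```

In this model, calling mutable_name() on a fresh object makes has_name() return true even before anything is written through the pointer, which mirrors the behavior described in the list above.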

5.1 enumeration and nested classes

message Person {
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
}

The generated PhoneType enumeration is called Person::PhoneType, and its values are:

Person::MOBILE
Person::HOME
Person::WORK

message Person {
  message PhoneNumber {
    optional string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
}

The generated nested class is Person::PhoneNumber. Its real name is Person_PhoneNumber; a typedef defined inside Person lets you treat it as if it were a nested class. The only case where this makes a difference is when you want to forward-declare the class in another file: C++ does not allow forward-declaring nested types, but you can forward-declare Person_PhoneNumber.
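The forward-declaration point can be sketched without the generated header. In the example below, the two tutorial classes are simplified stand-ins written by hand for illustration, and IsEmptyNumber is a hypothetical helper; only the naming pattern (Person_PhoneNumber plus a typedef inside Person) follows the description above.

```cpp
#include <cassert>
#include <string>

// What a header in another file can do without including addressbook.pb.h:
namespace tutorial {
class Person_PhoneNumber;  // OK: forward declaration of a namespace-scope class
// class Person::PhoneNumber;  // ill-formed: a nested type cannot be
//                             // forward-declared without defining Person
}  // namespace tutorial

// Declared using only the incomplete type.
bool IsEmptyNumber(const tutorial::Person_PhoneNumber& phone);

// Simplified stand-ins for the generated classes (illustration only).
namespace tutorial {
class Person_PhoneNumber {
 public:
  std::string number;
};

class Person {
 public:
  typedef Person_PhoneNumber PhoneNumber;  // enables Person::PhoneNumber
};
}  // namespace tutorial

bool IsEmptyNumber(const tutorial::Person_PhoneNumber& phone) {
  return phone.number.empty();
}
```

Because Person::PhoneNumber is just another name for Person_PhoneNumber, a function declared against the forward-declared type accepts objects created through either name.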

5.2 standard Message methods

Each Message class also contains many other methods, including:

bool IsInitialized() const: checks whether all required fields have been set.
string DebugString() const: returns a human-readable representation of the Message, which is useful for debugging.
void CopyFrom(const Person& from): overwrites the current Message with the values of from.
void Clear(): clears all elements back to the empty state.

For more details, see complete API documentation for Message.

5.3 serialization and parsing

Finally, each class has methods to read and write messages:

bool SerializeToString(string* output) const: serializes the Message and stores the bytes in the given string. Note that the bytes are binary, not text; the string class is only used as a convenient container.
bool ParseFromString(const string& data): parses a Message from the given string.
bool SerializeToOstream(ostream* output) const: writes the Message to the given C++ ostream.
bool ParseFromIstream(istream* input): parses a Message from the given C++ istream.

For more details, see complete API documentation for Message.
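The remark that the serialized bytes are binary rather than text deserves a small illustration. The sketch below hand-encodes a single integer field following the wire format described in the Encoding Reference (a varint key followed by a varint value). AppendVarint and EncodeVarintField are names made up for this example; in a real program the library does this work for you.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Append value as a base-128 varint: 7 bits per byte, and a set high bit
// means that more bytes follow.
void AppendVarint(std::uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Encode one varint field: key = (field_number << 3) | 0, then the value.
std::string EncodeVarintField(std::uint32_t field_number, std::uint64_t value) {
  std::string bytes;
  AppendVarint(static_cast<std::uint64_t>(field_number) << 3, &bytes);
  AppendVarint(value, &bytes);
  return bytes;
}
```

Encoding field 2 with the value 0 yields the two bytes 0x10 0x00. The string's size() is 2, but strlen() on its c_str() stops at the zero byte and reports 1, so serialized data should always be handled through size() and data(), and files should be opened in binary mode.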

Protocol Buffers and object-oriented design
The classes generated by Protocol Buffers are essentially plain data holders, like structs in C. If you want to add richer behavior to a generated class, the recommended approach is to wrap it in an application-specific class. Wrapping is also a good idea if you have no control over the design of the .proto file, for example when you are reusing one from another project. In that case, the wrapper class can offer an interface better suited to your application: hiding some data and methods, and so on. Never add behavior to generated classes by inheriting from them. Doing so breaks internal mechanisms and is not good object-oriented practice anyway.
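A minimal sketch of the wrapping approach might look like the following. PersonMessage is a hypothetical stand-in for a generated class so that the example compiles on its own, and Contact with its methods is an invented application class, not something the library provides.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a protoc-generated class (illustration only).
struct PersonMessage {
  std::string name;
  std::string email;
};

// Application class that owns the message by composition, not inheritance.
class Contact {
 public:
  explicit Contact(PersonMessage message) : message_(std::move(message)) {}

  // Application-specific behavior lives in the wrapper.
  std::string DisplayName() const {
    return message_.name.empty() ? std::string("(unnamed)") : message_.name;
  }
  bool HasEmail() const { return !message_.email.empty(); }

  // Read-only access for code that needs to serialize the message.
  const PersonMessage& message() const { return message_; }

 private:
  PersonMessage message_;
};
```

The rest of the program talks to Contact only, so a change in the .proto file affects a single class instead of every call site.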

6. Write Message

To write data to a Message, you need to create and populate instances of the class and then write them to the output stream.

As a reminder, here is the data structure defined in the .proto file:

syntax = "proto2";

package tutorial;

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    optional string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

The following code writes data to a Message. First, the includes:

#include <iostream>
#include <fstream>
#include <string>
#include "addressbook.pb.h"
using namespace std;

Then, the PromptForAddress function writes the data into person according to the input:

void PromptForAddress(tutorial::Person* person)
{
  cout << "Please enter ID: ";
  int id;
  cin >> id;
  person->set_id(id);
  cin.ignore(256, '\n');

  cout << "Please enter name: ";
  getline(cin, *person->mutable_name());

  cout << "Please enter email: ";
  string email;
  getline(cin, email);
  if (!email.empty()) {
    person->set_email(email);
  }

  while (true) {
    cout << "Enter a phone number (or leave blank to finish): ";
    string number;
    getline(cin, number);
    if (number.empty()) {
      break;
    }

    tutorial::Person::PhoneNumber* phone_number = person->add_phones();
    phone_number->set_number(number);

    cout << "Is this a mobile, home, or work phone? ";
    string type;
    getline(cin, type);
    if (type == "mobile") {
      phone_number->set_type(tutorial::Person::MOBILE);
    } else if (type == "home") {
      phone_number->set_type(tutorial::Person::HOME);
    } else if (type == "work") {
      phone_number->set_type(tutorial::Person::WORK);
    } else {
      cout << "Unknown type, use default home type." << endl;
    }
  }
}

Finally, the main() function:

int main(int argc, char* argv[])
{
  // Verify that the version of the library we linked against is compatible
  // with the version of the headers we compiled against.
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  // The address book file name is required as the only argument.
  if (argc != 2) {
    cerr << "Usage:  " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
    return -1;
  }

  tutorial::AddressBook address_book;

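  {
    // Read the existing address book, if any. The inner scope closes the
    // input stream before the same file is reopened for writing.
    fstream input(argv[1], ios::in | ios::binary);
    if (!input) {
      cout << argv[1] << ": File not found. Creating a new file." << endl;
    } else if (!address_book.ParseFromIstream(&input)) {
      cerr << "Failed to parse address book." << endl;
      return -1;
    }
  }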

  // Add Person
  PromptForAddress(address_book.add_people());

  // write file
  fstream output(argv[1], ios::out | ios::trunc | ios::binary);
  if (!address_book.SerializeToOstream(&output)) {
    cerr << "Failed to write file!" << endl;
    return -1;
  }

  // Optional: delete the global object allocated by libprotobuf
  google::protobuf::ShutdownProtobufLibrary();

  return 0;
}

Some notes:

Note the GOOGLE_PROTOBUF_VERIFY_VERSION macro. Although it is not strictly required, it is good practice to execute it before using the C++ Protocol Buffer library. It verifies that you have not accidentally linked against a library version that is incompatible with the version of the headers you compiled with, and it aborts the program if a mismatch is detected. Every .pb.cc file automatically invokes this macro on startup.
Note the call to ShutdownProtobufLibrary() at the end of the program. This function deletes all global objects allocated by the Protocol Buffer library. Most programs do not need it, because the process is about to exit anyway and the operating system reclaims all of its memory. However, you should call it if you use a memory leak checker that requires every object to be freed, or if you are writing a library that may be loaded and unloaded multiple times by a single process.

7. Read Message

The following example reads a file and prints information.

First, the includes:

#include <iostream>
#include <fstream>
#include <string>
#include "addressbook.pb.h"
using namespace std;

Then, the ListPeople() function prints every Person in address_book:

void ListPeople(const tutorial::AddressBook& address_book)
{
  for (int i = 0; i < address_book.people_size(); i++) {
    const tutorial::Person& person = address_book.people(i);
    cout << "ID: " << person.id() << endl;
    cout << "Name: " << person.name() << endl;
    if (person.has_email()) {
      cout << "E-mail: " << person.email() << endl;
    }

    for (int j = 0; j < person.phones_size(); j++) {
      const tutorial::Person::PhoneNumber& phone_number = person.phones(j);
      switch (phone_number.type()) {
        case tutorial::Person::MOBILE: cout << "Mobile phone: "; break;
        case tutorial::Person::HOME:   cout << "Home phone: "; break;
        case tutorial::Person::WORK:   cout << "Work phone: "; break;
      }
      cout << phone_number.number() << endl;
    }
  }
}

Finally, the main() function:

int main(int argc, char* argv[])
{
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  if (argc != 2) {
    cerr << "Usage:  " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
    return -1;
  }

  tutorial::AddressBook address_book;
  
  fstream input(argv[1], ios::in | ios::binary);
  if (!address_book.ParseFromIstream(&input)) {
    cerr << "Failed to parse address book." << endl;
    return -1;
  }

  ListPeople(address_book);

  google::protobuf::ShutdownProtobufLibrary();

  return 0;
}

8. Extend Protocol Buffer

If you want to extend the definitions in an existing .proto file while staying backward compatible, you need to follow some rules:

Do not change the tag number of any existing field;
Do not add or delete any required fields;
You may delete optional or repeated fields;
You may add new optional or repeated fields, but you must use fresh tag numbers, that is, tag numbers that were never used in this Message, not even by deleted fields.

There are some exceptions to the above rules, but they are rarely used.

Following these rules, the old code will be able to read the new Message and ignore the new fields. For old code, the deleted optional field will only have its default value, and the deleted repeated field will be empty. The new code can also read the old Message.
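The reason old code can read new Messages is that a reader can step over tag numbers it does not recognize. The sketch below is a heavily simplified reader written for this article, not the library's parser: it assumes the wire format from the Encoding Reference, handles only varint fields (wire type 0), and knows only field 2 (id). OldPerson, ReadVarint and ParseOldPerson are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Read one varint starting at *pos. Returns false if the input is truncated.
bool ReadVarint(const std::string& data, std::size_t* pos, std::uint64_t* value) {
  *value = 0;
  for (int shift = 0; shift < 64 && *pos < data.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(data[(*pos)++]);
    *value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
  }
  return false;
}

struct OldPerson {
  std::int32_t id = 0;
  int skipped_fields = 0;  // how many unknown fields were stepped over
};

// "Old" reader that only knows field 2 (id).
bool ParseOldPerson(const std::string& data, OldPerson* person) {
  std::size_t pos = 0;
  while (pos < data.size()) {
    std::uint64_t key = 0;
    std::uint64_t value = 0;
    if (!ReadVarint(data, &pos, &key)) return false;
    if ((key & 0x7) != 0) return false;  // other wire types: out of scope here
    if (!ReadVarint(data, &pos, &value)) return false;
    if ((key >> 3) == 2) {
      person->id = static_cast<std::int32_t>(value);
    } else {
      ++person->skipped_fields;  // unknown tag number: skip it, do not fail
    }
  }
  return true;
}
```

Feeding it the bytes 0x10 0x2A 0x28 0x07 (field 2 = 42 followed by a newer field 5 = 7) yields id 42 with one skipped field. This also shows why tag numbers must never be reused: the reader identifies fields by number alone.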

Remember that new optional fields will not be present in old Messages, so the new code needs to either check explicitly with the has_ function, or provide a reasonable default value in the .proto file with [default = value] after the tag number. If no default value is specified for an optional field, a type-specific default is used instead: the empty string for strings, false for booleans, and 0 for numeric types.

Also note that if you add a new repeated field, the new code cannot tell whether it was left empty (by new code) or never set at all (by old code), because repeated fields have no has_ flag.

9. Optimization

The C++ Protocol Buffers library is highly optimized. However, there are some tips to further improve performance:

Reuse Message objects whenever possible. Clearing a Message keeps the memory it has allocated so that it can be reused. Therefore, if you process many Messages of the same type in succession, it is best to reuse the same Message object each time to take load off the memory allocator. However, objects can become bloated over time, especially if your Messages vary in "shape" or if you occasionally construct a Message that is much larger than usual. You should monitor the size of your Message objects by calling the SpaceUsed method and delete them once they become too large.
Your system's default memory allocator may not be well optimized for allocating a large number of small objects from multiple threads. Try using Google's tcmalloc instead.
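The reuse pattern from the first tip can be sketched with a standard container as an analogy. Batch below is a made-up stand-in for a Message with one repeated field, not a protobuf class, and SpaceUsedEstimate only imitates the idea behind SpaceUsed. The point is the shape of the loop: clear and refill the same object, and replace it only when it has grown too large.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a message with a repeated field (illustration only).
struct Batch {
  std::vector<int> values;

  // Like clearing a Message: contents go away, allocated memory stays.
  void Clear() { values.clear(); }

  std::size_t SpaceUsedEstimate() const {
    return values.capacity() * sizeof(int);
  }
};

// Fill batch with n values, reusing whatever memory it already holds.
void Fill(Batch* batch, int n) {
  batch->Clear();
  for (int i = 0; i < n; ++i) batch->values.push_back(i);
}

// If an unusually large message bloated the object, release its memory.
void ShrinkIfBloated(Batch* batch, std::size_t max_bytes) {
  if (batch->SpaceUsedEstimate() > max_bytes) {
    std::vector<int>().swap(batch->values);  // swap with an empty vector
  }
}
```

After one large fill, later small fills cause no further allocations, at the price of holding on to the large buffer until ShrinkIfBloated releases it.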

10. Advanced usage

Protocol Buffers have uses that go beyond simple accessors and serialization. Be sure to explore the C++ API reference to see what else you can do with them.

A key feature of the Protocol Message class is reflection. You can traverse the fields of a Message and manipulate their values without writing code for any particular Message type.

A very useful way to use reflection is to convert protocol messages to and from other encodings, such as XML or JSON.
A more advanced use of reflection may be to find the difference between two messages of the same type, or to develop a "regular expression for protocol messages", in which you can write an expression that matches the content of some messages. If you use your imagination, you can apply Protocol Buffers to a wider range of problems than you originally expected!

Reflection is provided by the Message::Reflection interface.
