Nebula Graph source code interpretation series | Communication Secrets of the client -- fbthrift

Posted by eskimo42 on Wed, 23 Feb 2022 05:41:43 +0100

Summary

Nebula Clients provides APIs in several programming languages for interacting with Nebula Graph, and repackages the data structures returned by the server so that they are easier to use.

At present, Nebula Clients supports C++, Java, Python, Golang and Rust.

Communication framework

Nebula Clients uses fbthrift (https://github.com/facebook/fbthrift) as the RPC communication framework between server and client, which makes cross-language interaction possible.

fbthrift provides three functions:

  1. Code generation: fbthrift generates the corresponding data structures for each target language from a single interface definition
  2. Serialization: serializes the generated data structures
  3. Communication and interaction: transfers messages between clients and servers, and calls the corresponding server function when a request arrives from a client written in any of these languages

Example

Here, we take the Golang client as an example to show how fbthrift is used in Nebula Graph.

  1. Definition of the Vertex structure on the server:
struct Vertex {
	Value vid;
	std::vector<Tag> tags;

	Vertex() = default;
};
  2. Some data structures are first defined in src/interface/common.thrift:
struct Tag {
		1: binary name,
		// List of <prop_name, prop_value>
		2: map<binary, Value> (cpp.template = "std::unordered_map") props,
} (cpp.type = "nebula::Tag")

struct Vertex {
		1: Value     vid,
		2: list<Tag> tags,
} (cpp.type = "nebula::Vertex")

Here we define a Vertex structure, in which (cpp.type = "nebula::Vertex") indicates that this structure corresponds to the nebula::Vertex of the server.

  3. fbthrift then automatically generates the corresponding Golang data structure:
// Attributes:
//  - Vid
//  - Tags
type Vertex struct {
	Vid *Value `thrift:"vid,1" db:"vid" json:"vid"`
	Tags []*Tag `thrift:"tags,2" db:"tags" json:"tags"`
}

func NewVertex() *Vertex {
	return &Vertex{}
}

...

func (p *Vertex) Read(iprot thrift.Protocol) error { // Deserialization
	...
}

func (p *Vertex) Write(oprot thrift.Protocol) error { // Serialization
	...
}
  4. With the statement MATCH (v:Person) WHERE id(v) == "ABC" RETURN v, the client requests a vertex from the server. After the server finds the vertex, it serializes it and sends it to the client through the transport of the RPC communication framework. When the client receives the data, it deserializes it and builds the corresponding client-side data structure (type Vertex struct).
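
The round trip described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: SimpleVertex is a hypothetical, simplified stand-in for the generated Vertex type (a string vid and tag names only), and the length-prefixed encoding is hand-rolled with encoding/binary. It is not fbthrift's generated code or wire format.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// SimpleVertex is a hypothetical, simplified stand-in for the generated Vertex.
type SimpleVertex struct {
	Vid  string
	Tags []string
}

// writeString writes a 4-byte big-endian length followed by the bytes of s.
func writeString(w io.Writer, s string) error {
	if err := binary.Write(w, binary.BigEndian, uint32(len(s))); err != nil {
		return err
	}
	_, err := io.WriteString(w, s)
	return err
}

// readString reads a string previously written by writeString.
func readString(r io.Reader) (string, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// Write serializes the vertex (the sending side in this sketch).
func (v *SimpleVertex) Write(w io.Writer) error {
	if err := writeString(w, v.Vid); err != nil {
		return err
	}
	if err := binary.Write(w, binary.BigEndian, uint32(len(v.Tags))); err != nil {
		return err
	}
	for _, t := range v.Tags {
		if err := writeString(w, t); err != nil {
			return err
		}
	}
	return nil
}

// Read deserializes a vertex (the receiving side in this sketch).
func (v *SimpleVertex) Read(r io.Reader) error {
	vid, err := readString(r)
	if err != nil {
		return err
	}
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return err
	}
	tags := make([]string, 0, n)
	for i := uint32(0); i < n; i++ {
		t, err := readString(r)
		if err != nil {
			return err
		}
		tags = append(tags, t)
	}
	v.Vid, v.Tags = vid, tags
	return nil
}

// roundTrip serializes v into a buffer that stands in for the RPC transport,
// then deserializes it into a fresh value.
func roundTrip(v *SimpleVertex) (*SimpleVertex, error) {
	var wire bytes.Buffer
	if err := v.Write(&wire); err != nil {
		return nil, err
	}
	out := &SimpleVertex{}
	if err := out.Read(&wire); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	received, err := roundTrip(&SimpleVertex{Vid: "ABC", Tags: []string{"Person"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(received.Vid, received.Tags)
}
```

In the real client the Read and Write methods are generated by fbthrift and take a thrift.Protocol instead of an io.Reader or io.Writer, so none of this has to be written by hand.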

Client module

In this chapter, we take nebula-go as an example to introduce each module of the client and its main interfaces.

  1. The configuration class Configs provides global configuration options.
type PoolConfig struct {
	// Timeout in ms. 0 means no timeout. The default is 0
	TimeOut time.Duration
	// The maximum idle time of each connection. A connection that stays unused longer than this is disconnected and removed. 0 means it may stay idle forever and is never closed. The default is 0
	IdleTime time.Duration
	// max_connection_pool_size: the maximum number of connections in the connection pool. The default is 10
	MaxConnPoolSize int
	// Minimum number of idle connections. The default is 0
	MinConnPoolSize int
}
  2. The client Session provides the interfaces called directly by the user.
// Manages Session-specific information
type Session struct {
	// Used for identity verification or message retry when executing commands
	sessionID  int64
	// The connection currently held
	connection *connection
	// The connection pool currently in use
	connPool   *ConnectionPool
	// Logging tool
	log        Logger
	// Stores the time zone used by the current Session
	timezoneInfo
}
     Its interface definitions include the following:
// Execute nGQL. The returned data type is ResultSet. This interface is not thread safe.
func (session *Session) Execute(stmt string) (*ResultSet, error) {...}
// Get a connection from the connection pool again for the current Session
func (session *Session) reConnect() error {...}
// Sign out, release the Session ID, and return the connection to the pool
func (session *Session) Release() {...}
  3. The connection pool ConnectionPool manages all connections. The main interfaces are as follows:
// Create a new connection pool and initialize it with the given service addresses
func NewConnectionPool(addresses []HostAddress, conf PoolConfig, log Logger) (*ConnectionPool, error) {...}
// Authenticate and get a Session instance
func (pool *ConnectionPool) GetSession(username, password string) (*Session, error) {...}
  4. The connection Connection encapsulates the thrift network layer and provides the following interfaces:
// Establish a connection to the specified IP and port
func (cn *connection) open(hostAddress HostAddress, timeout time.Duration) error {...}
// Verify the user name and password
func (cn *connection) authenticate(username, password string) (*graph.AuthResponse, error) {...}
// Execute a query
func (cn *connection) execute(sessionID int64, stmt string) (*graph.ExecutionResponse, error) {...}
// Determine whether the connection is available by sending "YIELD 1" with SessionId 0
func (cn *connection) ping() bool {...}
// Release the sessionId to graphd
func (cn *connection) signOut(sessionID int64) error {...}
// Disconnect
func (cn *connection) close() {...}
  5. Load balancing: the connection pool uses this module.
    • Policy: round-robin (polling) policy
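
A round-robin policy of this kind can be sketched as follows. This is a minimal illustration, not the nebula-go implementation: the roundRobin type and its methods are hypothetical names, and the HostAddress fields are assumed for the sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// HostAddress is a simplified address entry; the fields are assumed here.
type HostAddress struct {
	Host string
	Port int
}

// roundRobin is a hypothetical picker that cycles through a fixed address list.
type roundRobin struct {
	mu    sync.Mutex
	addrs []HostAddress
	next  int
}

// newRoundRobin expects at least one address.
func newRoundRobin(addrs []HostAddress) *roundRobin {
	return &roundRobin{addrs: addrs}
}

// pick returns the next address in order and wraps around at the end,
// so consecutive calls spread connections evenly over all addresses.
func (r *roundRobin) pick() HostAddress {
	r.mu.Lock()
	defer r.mu.Unlock()
	addr := r.addrs[r.next]
	r.next = (r.next + 1) % len(r.addrs)
	return addr
}

func main() {
	rr := newRoundRobin([]HostAddress{
		{Host: "127.0.0.1", Port: 9669},
		{Host: "127.0.0.1", Port: 9670},
		{Host: "127.0.0.1", Port: 9671},
	})
	// Six picks visit each of the three addresses twice.
	for i := 0; i < 6; i++ {
		fmt.Println(rr.pick())
	}
}
```

The mutex is there because a pool may ask for addresses from several goroutines at once; without it the next index could be read and advanced concurrently.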

Module interaction analysis

  1. Connection pool
  • Initialization:
    • The user first creates and initializes a connection pool. During initialization, the pool establishes connections to the addresses of the Nebula service specified by the user. If multiple Graph services are deployed in cluster mode, the pool uses the round-robin policy to balance the load and establishes roughly the same number of connections to each address.
  • Managing connections:
    • The connection pool maintains two queues: the idle connection queue idleConnectionQueue and the active connection queue activeConnectionQueue. The pool periodically detects expired idle connections and closes them. When elements are added or removed, read-write locks keep the two queues correct under multi-threaded execution.
    • When a Session requests a connection from the pool, the pool checks whether the idle connection queue has an available connection. If so, that connection is returned directly to the Session for the user. If there is no available connection and the current total number of connections does not exceed the maximum specified in the configuration, a new connection is created for the Session. If the maximum number of connections has been reached, an error is returned.
  • Generally, the connection pool needs to be closed only when the program exits. All connections in the pool are disconnected when it is closed.
  2. Client session
    • A client Session is created through the connection pool. The user provides a username and password for verification. After successful verification, the user obtains a Session instance and communicates with the server through the connection held by the Session. The most commonly used interface is Execute(). If an error occurs during execution, the client checks the type of error. If it is a network error, the client automatically reconnects and tries to execute the statement again.
    • Note that a Session cannot be used by multiple threads at the same time. The correct approach is to request one Session per thread, so that each thread uses its own Session.
    • When a Session is released, the connection it holds is put back into the idle connection queue of the connection pool so that other Sessions can reuse it later.
  3. Connection
    • All connection instances are equivalent and can be held by any Session. This design lets different Sessions reuse connections, which reduces the overhead of repeatedly opening and closing the Transport.
    • The connection sends the client's request to the server and returns the result to the Session.
  4. User usage example
// Initialize connection pool
pool, err := nebula.NewConnectionPool(hostList, testPoolConfig, log)
if err != nil {
	log.Fatal(fmt.Sprintf("Fail to initialize the connection pool, host: %s, port: %d, %s", address, port, err.Error()))
}
// Close all connections in the pool when program exits
defer pool.Close()

// Create session
session, err := pool.GetSession(username, password)
if err != nil {
	log.Fatal(fmt.Sprintf("Fail to create a new session from connection pool, username: %s, password: %s, %s",
		username, password, err.Error()))
}
// Release session and return connection back to connection pool when program exits
defer session.Release()

// Execute a query
resultSet, err := session.Execute(query)
if err != nil {
	fmt.Print(err.Error())
}
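
The way the connection pool hands out and takes back connections can be sketched as follows. This is a simplified illustration, not the nebula-go code: the pool and conn types are hypothetical, a single sync.Mutex stands in for the read-write locks mentioned in the text, and the periodic cleanup of expired idle connections is left out.

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a real network connection.
type conn struct{ id int }

// pool keeps idle and active connections in two lists guarded by one lock.
type pool struct {
	mu      sync.Mutex
	idle    list.List
	active  list.List
	maxSize int
	nextID  int
}

var errPoolExhausted = errors.New("no idle connection and pool is at its maximum size")

// get returns an idle connection if one exists, creates a new one while the
// pool is below maxSize, and otherwise reports an error.
func (p *pool) get() (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if front := p.idle.Front(); front != nil {
		c := p.idle.Remove(front).(*conn)
		p.active.PushBack(c)
		return c, nil
	}
	if p.idle.Len()+p.active.Len() >= p.maxSize {
		return nil, errPoolExhausted
	}
	p.nextID++
	c := &conn{id: p.nextID}
	p.active.PushBack(c)
	return c, nil
}

// release moves a connection from the active list back to the idle list
// so that a later caller can reuse it.
func (p *pool) release(c *conn) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for e := p.active.Front(); e != nil; e = e.Next() {
		if e.Value.(*conn) == c {
			p.active.Remove(e)
			p.idle.PushBack(c)
			return
		}
	}
}

func main() {
	p := &pool{maxSize: 2}
	a, _ := p.get()
	b, _ := p.get()
	_, err := p.get() // the pool is full, so this call fails
	fmt.Println(a.id, b.id, err)

	p.release(a)
	c, _ := p.get() // the released connection is handed out again
	fmt.Println(c == a)
}
```

Keeping released connections in the idle list is what lets a later Session reuse an already open connection instead of paying for a new one.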

Return data structure

The client wraps some of the more complex query results returned by the server and adds interfaces for users to work with them.

| Basic types of query results | Encapsulated type |
| --- | --- |
| Null | |
| Bool | |
| Int64 | |
| Double | |
| String | |
| Time | TimeWrapper |
| Date | |
| DateTime | DateTimeWrapper |
| List | |
| Set | |
| Map | |
| Vertex | Node |
| Edge | Relationship |
| Path | PathWrapper |
| DataSet | ResultSet |
| - | Record (used for row operations on a ResultSet) |

On the client side, a nebula::Value is wrapped as a ValueWrapper and converted to other structures through its interfaces (e.g. node = ValueWrapper.asNode()).
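
The idea behind such a wrapper can be sketched as a tagged union with typed accessors. The types below (value, valueWrapper, node) are hypothetical simplifications written for this illustration and do not reproduce the client's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical, simplified vertex wrapper.
type node struct{ vid string }

// value is a hypothetical tagged union: exactly one field is expected to be set.
type value struct {
	iVal *int64
	sVal *string
	nVal *node
}

// valueWrapper hides the union behind typed accessors.
type valueWrapper struct{ v value }

// IsNode reports whether the wrapped value holds a node.
func (w valueWrapper) IsNode() bool { return w.v.nVal != nil }

// AsInt returns the integer value or an error if the value has another type.
func (w valueWrapper) AsInt() (int64, error) {
	if w.v.iVal == nil {
		return 0, errors.New("value is not an int")
	}
	return *w.v.iVal, nil
}

// AsString returns the string value or an error if the value has another type.
func (w valueWrapper) AsString() (string, error) {
	if w.v.sVal == nil {
		return "", errors.New("value is not a string")
	}
	return *w.v.sVal, nil
}

// AsNode returns the node or an error if the value has another type.
func (w valueWrapper) AsNode() (*node, error) {
	if w.v.nVal == nil {
		return nil, errors.New("value is not a node")
	}
	return w.v.nVal, nil
}

func main() {
	w := valueWrapper{v: value{nVal: &node{vid: "Tim Duncan"}}}

	if n, err := w.AsNode(); err == nil {
		fmt.Println("node:", n.vid)
	}
	// Asking for the wrong type yields an error instead of a panic.
	if _, err := w.AsInt(); err != nil {
		fmt.Println("error:", err)
	}
}
```

Returning an error from each conversion lets the caller handle a column whose type differs from what was expected without inspecting the raw union directly.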

Analysis of data structure

For the statement MATCH p = (v:player{name: "Tim Duncan"})-[]->(v2) RETURN p, the returned result is:

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| p                                                                                                                                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| <("Tim Duncan" :bachelor{name: "Tim Duncan", speciality: "psychology"} :player{age: 42, name: "Tim Duncan"})<-[:teammate@0 {end_year: 2016, start_year: 2002}]-("Manu Ginobili" :player{age: 41, name: "Manu Ginobili"})> |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Got 1 rows (time spent 11550/12009 us)

We can see that the result contains one row, and that its value is a path. To obtain the properties of the path's end node (v2), you can do the following:

// Execute a query
resultSet, _ := session.Execute("MATCH p = (v:player{name:\"Tim Duncan\"})-[]->(v2) RETURN p")

// Get the first row of the result. The index of the first row is 0
record, err := resultSet.GetRowValuesByIndex(0)
if err != nil {
	t.Fatalf(err.Error())
}

// Take the value of the cell in the first column from the first row
// At this point, the type of valInCol0 is ValueWrapper
valInCol0, err := record.GetValueByIndex(0)

// Convert the ValueWrapper to a PathWrapper object
pathWrap, err := valInCol0.AsPath()

// Get the end node directly through the GetEndNode() interface of PathWrapper
node, err := pathWrap.GetEndNode()

// Get all properties through the node's Properties()
// The type of props is map[string]*ValueWrapper
props, err := node.Properties()

Client address

GitHub address of each language client: