This is how goim achieves high concurrency

Posted by phpparty on Sun, 20 Feb 2022 09:51:43 +0100

This chapter explains how goim achieves high concurrency from two angles: architecture and program design.


Architecture

First, at the architecture level, goim is split into three layers: comet, logic, and job.

Comet is the access layer and is very easy to scale out: simply start more comet nodes, and front-end traffic can be distributed across them via LVS or DNS.

Logic is the stateless logic layer, so nodes can be added at will. The HTTP interface is scaled out behind an nginx upstream, and the internal RPC traffic can be load-balanced by LVS at layer 4.

The job layer decouples comet from logic.

The system uses Kafka as the message queue; the queue can be scaled by adding brokers or partitions. Redis maintains metadata and node heartbeat information.


Program design

Second, in terms of program design, four techniques stand out:

1. Split lock granularity as finely as possible to reduce resource contention.

2. For memory management, allocate one large block of memory up front, carve it into the required data types, and manage it yourself; this avoids the performance cost of frequent allocation and deallocation.

3. Make full use of goroutines and channels to achieve high concurrency.

4. Apply buffering judiciously to improve read/write performance.

Splitting lock granularity

For example, the comet module splits TCP connections across buckets. Each TCP connection is assigned to a bucket according to a hashing rule, rather than being kept in one single large bucket. This makes the lock granularity smaller and lowers the probability of contention, so less time is spent waiting on locks and performance improves.

// NewServer initializes the Server and creates multiple buckets.
func NewServer(c *conf.Config) *Server {
	s.buckets = make([]*Bucket, c.Bucket.Size)
	s.bucketIdx = uint32(c.Bucket.Size)
	for i := 0; i < c.Bucket.Size; i++ { // create multiple buckets
		s.buckets[i] = NewBucket(c.Bucket)
	}
	// ...
}

// Bucket picks a bucket by hashing subKey, so different TCP connections
// are managed by different buckets.
func (s *Server) Bucket(subKey string) *Bucket {
	idx := cityhash.CityHash32([]byte(subKey), uint32(len(subKey))) % s.bucketIdx
	if conf.Conf.Debug {
		log.Infof("%s hit channel bucket index: %d use cityhash", subKey, idx)
	}
	return s.buckets[idx]
}
Broadcast messages iterate over the buckets, and each bucket has its own lock. Splitting the lock granularity reduces contention on each lock, which gives very good performance.
func (s *server) Broadcast(ctx context.Context, req *pb.BroadcastReq) (*pb.BroadcastReply, error) {
	go func() {
		for _, bucket := range s.srv.Buckets() {
			bucket.Broadcast(req.GetProto(), req.ProtoOp)
			if req.Speed > 0 {
				t := bucket.ChannelCount() / int(req.Speed)
				time.Sleep(time.Duration(t) * time.Second)
			}
		}
	}()
	// ...
}


Memory management

In the comet module, Round (internal/comet/round.go) allocates enough read/write buffers and timers up front and maintains them in free lists. Each TCP connection takes what it needs from these free lists and puts it back when done. For the TCP read goroutine, each connection has a proto buffer implemented as a ring array.

// NewRound pre-allocates the various resource pools according to the configuration.
func NewServer(c *conf.Config) *Server {
	s := &Server{
		c:         c,
		round:     NewRound(c),
		rpcClient: newLogicClient(c.RPCClient),
	}
	// ...
}

// Each TCP connection obtains a Timer, Reader and Writer from round.
func serveTCP(s *Server, conn *net.TCPConn, r int) {
	var (
		tr = s.round.Timer(r)
		rp = s.round.Reader(r)
		wp = s.round.Writer(r)
	)
	s.ServeTCP(conn, rp, wp, tr)
}

// Each TCP connection creates a proto ring array via ring (internal/comet/ring.go) for reading data.
func (s *Server) ServeTCP(conn *net.TCPConn, rp, wp *bytes.Pool, tr *xtime.Timer) {
	var (
		ch = NewChannel(s.c.Protocol.CliProto, s.c.Protocol.SvrProto) // ring array
	)
	// While reading, first obtain a proto slot from the ring array, then read the data into it.
	if p, err = ch.CliProto.Set(); err != nil {
		// ...
	}
	if err = p.ReadTCP(rr); err != nil {
		// ...
	}
}
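The ring array can be sketched as a minimal single-reader, single-writer ring of reusable proto slots. This is a simplified illustration, not goim's exact ring.go; the `Proto` fields and method names are stand-ins:

```go
package main

import (
	"errors"
	"fmt"
)

// Proto stands in for goim's protocol frame; the fields are illustrative.
type Proto struct {
	Op   int32
	Body []byte
}

// Ring is a single-producer single-consumer ring of reusable Proto slots:
// Set reserves the next free slot for writing, Get returns the oldest
// written slot, and SetAdv/GetAdv commit the operation.
type Ring struct {
	rp   uint64 // read position
	wp   uint64 // write position
	num  uint64 // capacity, rounded up to a power of two
	mask uint64
	data []Proto
}

func NewRing(num int) *Ring {
	// round up to a power of two so index = pos & mask works
	n := uint64(1)
	for n < uint64(num) {
		n <<= 1
	}
	return &Ring{num: n, mask: n - 1, data: make([]Proto, n)}
}

// Set returns the next writable slot, or an error if the ring is full.
func (r *Ring) Set() (*Proto, error) {
	if r.wp-r.rp >= r.num {
		return nil, errors.New("ring full")
	}
	return &r.data[r.wp&r.mask], nil
}

func (r *Ring) SetAdv() { r.wp++ }

// Get returns the oldest readable slot, or an error if the ring is empty.
func (r *Ring) Get() (*Proto, error) {
	if r.rp == r.wp {
		return nil, errors.New("ring empty")
	}
	return &r.data[r.rp&r.mask], nil
}

func (r *Ring) GetAdv() { r.rp++ }

func main() {
	r := NewRing(4)
	p, _ := r.Set() // reserve a slot, fill it, then commit
	p.Op = 8
	r.SetAdv()
	q, _ := r.Get()
	fmt.Println(q.Op) // the same slot is reused, never reallocated
	r.GetAdv()
}
```

The key point is that slots are reused in place: no proto is ever allocated on the read path once the ring exists.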

type Round struct {
	readers []bytes.Pool
	writers []bytes.Pool
	timers  []time.Timer
	options RoundOptions
}

func NewRound(c *conf.Config) (r *Round) {
	var i int
	// reader
	r.readers = make([]bytes.Pool, r.options.Reader) // create N read buffer pools
	for i = 0; i < r.options.Reader; i++ {
		r.readers[i].Init(r.options.ReadBuf, r.options.ReadBufSize)
	}
	// writer
	r.writers = make([]bytes.Pool, r.options.Writer)
	for i = 0; i < r.options.Writer; i++ {
		r.writers[i].Init(r.options.WriteBuf, r.options.WriteBufSize)
	}
	// timer
	r.timers = make([]time.Timer, r.options.Timer)
	for i = 0; i < r.options.Timer; i++ {
		// ...
	}
	return
}
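The free-list idea behind these pools can be sketched in a few lines. The `Pool`/`Buffer` names loosely mirror goim's pkg/bytes package, but this is a simplified illustration: the real pool also guards the list with a mutex for concurrent use.

```go
package main

import "fmt"

// Buffer is one fixed-size chunk carved out of a single large allocation,
// linked into a free list.
type Buffer struct {
	buf  []byte
	next *Buffer
}

// Pool hands out Buffers from a free list; when the list is empty it grows
// by allocating one big slab and slicing it, instead of allocating per buffer.
type Pool struct {
	free *Buffer
	num  int // buffers created per grow
	size int // bytes per buffer
}

func NewPool(num, size int) *Pool {
	p := &Pool{num: num, size: size}
	p.grow()
	return p
}

func (p *Pool) grow() {
	slab := make([]byte, p.num*p.size) // one allocation covers num buffers
	bufs := make([]Buffer, p.num)
	for i := 0; i < p.num; i++ {
		bufs[i].buf = slab[i*p.size : (i+1)*p.size]
		bufs[i].next = p.free
		p.free = &bufs[i]
	}
}

// Get pops a buffer off the free list, growing only when it runs dry.
func (p *Pool) Get() *Buffer {
	if p.free == nil {
		p.grow()
	}
	b := p.free
	p.free = b.next
	return b
}

// Put pushes the buffer back onto the free list for reuse.
func (p *Pool) Put(b *Buffer) {
	b.next = p.free
	p.free = b
}

func main() {
	p := NewPool(2, 1024)
	b := p.Get()
	fmt.Println(len(b.buf)) // 1024
	p.Put(b) // returned buffers are reused by the next Get
}
```

Because allocation happens in slabs and buffers cycle through the free list, the garbage collector sees a handful of long-lived objects instead of one short-lived allocation per read or write.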

Using goroutines and channels for high concurrency

Take comet's room-message push as an example: each bucket splits the push work across 32 channels, each with a capacity of 1024, and each channel is consumed by its own goroutine. When pushing room messages, they are distributed to these 32 channels in turn. This raises the concurrency inside the bucket, so one blocked channel does not make everything else wait.

// Each bucket creates RoutineAmount channels, each consumed by its own roomproc goroutine.
func NewBucket(c *conf.Bucket) (b *Bucket) {
	b.routines = make([]chan *pb.BroadcastRoomReq, c.RoutineAmount)
	for i := uint64(0); i < c.RoutineAmount; i++ {
		c := make(chan *pb.BroadcastRoomReq, c.RoutineSize)
		b.routines[i] = c
		go b.roomproc(c)
	}
	// ...
}

func (b *Bucket) roomproc(c chan *pb.BroadcastRoomReq) {
	for {
		arg := <-c
		if room := b.Room(arg.RoomID); room != nil {
			// ... push to the room
		}
	}
}

// BroadcastRoom dispatches messages to the routines in round-robin order.
func (b *Bucket) BroadcastRoom(arg *pb.BroadcastRoomReq) {
	num := atomic.AddUint64(&b.routinesNum, 1) % b.c.RoutineAmount
	b.routines[num] <- arg
}

Goroutines and channels are also used heavily in job, where each comet connection maintains several distinct message push channels:

1. pushChan: channels for single-chat messages, split into N groups; messages are pushed to the groups in turn, and each group has its own goroutine, improving concurrency.

2. roomChan: channels for group-chat (room) messages, likewise split into N groups, each with its own goroutine.

3. broadcastChan: the channel for broadcast messages.

4. N goroutines are started, and each goroutine receives single-chat, group-chat, and broadcast messages.

Using buffering to improve read/write performance

When job pushes a room message, it does not forward it to comet the moment it is received. Instead, it batches pushes according to certain strategies to improve read/write performance.

For each room, a goroutine is started to process its messages through a write buffer and send them in batches. A received message is not sent immediately but buffered while waiting briefly for more. A flush is triggered in either of two cases: the maximum batch size has been reached, or a timeout fires. If no message arrives for a long time, the room's goroutine is destroyed.

For details, see the pushproc implementation in internal/job/room.go.


After reading the goim source code, I think designing a highly concurrent service comes down to a few things. First, split the system by function so each module can scale independently, and split lock granularity to reduce contention and its performance cost. Second, use memory cleverly to avoid the overhead of frequent allocation and deallocation. Third, make full use of the language's concurrency features.