Knowledge consolidation source code 6: C string splicing and string splitting (strsep)

Posted by knashash on Tue, 15 Feb 2022 10:49:19 +0100

Network data is transmitted as a byte stream, so the client and server must agree on a specific format when constructing the data they send.

Receiving each packet completely and reliably (TCP's reliable delivery, and the half-packet / sticky-packet problem) is not covered here; see the previous articles.

The purpose here is:

===Keep a record of C string-processing techniques

===Keep a reusable string-splitting helper written for a real business scenario (related functions: strstr, strcasestr, strsep, strtok, strdup)

There are several details to pay attention to:

===1: how to splice data of different types into one string (3.1)

===2: how to build a char * inside a function and return it through a char ** parameter (3.2.4.2)

===3: several ways to parse strings, especially splitting a string (a strsep demo) (3.2.4)

1: Background description

In network transmission, for example when a TCP client communicates with a server, the data actually passed to send() is described by a character array plus a length, that is, as a byte stream.

Delivering one packet completely in one transmission relies on TCP's reliable stream transport underneath and also requires handling half packets and sticky packets; please see the previous articles.

However, the structure of a packet itself has to be designed for our business scenario, and then constructed and parsed according to that specific structure.

Here we focus on schemes for constructing and parsing one packet per send, and keep the code implementations for future reference.

2: Logical description

2.1: general description

In network transmission, processing the character stream is very important, and it follows the format we define.

===In addition to ensuring that each packet is received completely (the sticky-packet and half-packet problem), we also need to handle the data itself:

===1: like the TCP/IP protocol stack, define a fixed layout in which specific bytes carry specific meanings, and parse accordingly (fixed byte sizes carry the meaning)

===2: define our own convention, such as using specific characters or strings as separators between fields with different meanings (e.g. msg_type|other_type|msg_len|msg_data)

Below, I organize and test code for constructing and parsing data under both of the schemes above.

2.2: data construction (goal: build the char * to send in a specific format and get its length)

Generally, the sender and receiver agree on the data format in advance: the sender constructs data in that format, and the receiver parses it in the same format.

In essence, data construction means writing the agreed fields into a char buffer in the agreed layout and obtaining the length of the data actually sent.

===1: like a TCP/IP protocol header: fill in a structure, cast the structure pointer to char *, and send it with its actual length

===2: construct the char * to be sent and obtain its length: use memcpy

===3: construct the char * to be sent and obtain its length: use C library functions (strcpy, strcat)

===4: construct the char * to be sent and obtain its length: use sprintf / sprintf_s (in my view, the best option)

2.3: data parsing (goal: parse the received char * according to the agreed format)

The received stream (in practice a char * pointer plus a data length) is parsed according to the agreed format.

In essence, parsing means either reading fields at specific byte offsets, or scanning the whole stream to extract each field in turn.

===1: pointer casting: like TCP/IP protocol stack processing, cast the received message directly to a structure pointer and access its fields

===2: first read a fixed-size length field, then read that many bytes of actual data (the sender sends length + data first; you can define the exact format yourself)

===3: scan the received stream and split it (for example, on "|")

One goal of this article is to keep a practical implementation of splitting on "|".

3: Code sorting

3.1: demos of constructing the data to be sent

3.1.1: cast a structure pointer to char * (construction and parsing logic)

Build the structure and calculate its actual length; when sending, cast it directly to char * and pass that to send().

I compiled and tested this with gcc on Linux, and the code runs correctly.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//1: Like the TCP/IP protocol stack: data is parsed by casting it to a specific structure type
void parse_struct_format_data()
{
	printf("parse_struct_format_data test: \n");
	//Define a structure with a flexible array member; data is constructed and parsed as follows
	struct my_data_t
	{
		int msg_type;
		int data_len;
		char data[];	//C99 flexible array member (char data[0] is a GNU extension)
	};
	//Suppose the data to send is "client send data example. \n" and its type is 1
	const char* data = "client send data example. \n";
	//The sender builds the final packet as follows
	struct my_data_t *send_data = NULL;
	send_data = (struct my_data_t*)malloc(sizeof(struct my_data_t)+strlen(data)+1);
	if(send_data == NULL)
	{
		return ;
	}
	memset(send_data, '\0', sizeof(struct my_data_t)+strlen(data)+1);//One extra byte for '\0' so data can be printed with %s
	send_data->msg_type = 1;
	send_data->data_len = strlen(data);
	memcpy(send_data->data, data, strlen(data));
	//send_data is now the finished network packet and can be sent
	//send() only takes a char * and the length of the data to send
	//Casting only works if sender and receiver share the same struct layout, padding and byte order
	char *send_para_data = (char *)send_data; //The receiver interprets the stream with the same structure
	int send_para_data_len = sizeof(struct my_data_t)+strlen(data);
	//Suppose the client sends send_para_data with length send_para_data_len...
	//recv() delivers a character stream whose content is send_para_data, of length send_para_data_len
	//A complete packet must be received first (for the half-packet / sticky-packet problem, see the previous articles)
	//Then simply parse in reverse
	struct my_data_t *recv_data = (struct my_data_t*)send_para_data;
	//The payload is read by length, following the flexible-array layout
	printf("\trecv_data type is [%d] \n",recv_data->msg_type);
	printf("\trecv_data len is [%d] \n",recv_data->data_len);
	//Test only: if the data can contain special characters such as '\0', do not print it with %s; print it by length in hexadecimal
	printf("\trecv_data data is [%s] \n", recv_data->data);
	printf("\trecv_data hex is [");
	for(int i=0; i<recv_data->data_len; i++)
	{
		printf("%02x ", (unsigned char)recv_data->data[i]); //Cast so bytes >= 0x80 are not sign-extended
	}
	printf("]\n");
	if(send_data!= NULL)
	{
		free(send_data);
		send_data = NULL;
	}
}
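Casting a structure pointer only works when both ends share the same compiler, struct padding and byte order. The sketch below is a more portable variant. It is not taken from the original code. It writes each header field explicitly in network byte order with htonl() and reads it back with ntohl(). The names pack_msg and unpack_msg and the 8-byte header layout are hypothetical.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wire layout: 4-byte msg_type, 4-byte data_len (both big-endian), then data */
#define MSG_HDR_LEN 8

/* Serialize into a newly allocated buffer; returns the total length, or -1 on error.
 * The caller frees *out. */
int pack_msg(uint32_t msg_type, const char *data, uint32_t data_len, char **out)
{
    char *buf = malloc(MSG_HDR_LEN + data_len);
    if (buf == NULL)
        return -1;
    uint32_t net_type = htonl(msg_type);
    uint32_t net_len = htonl(data_len);
    memcpy(buf, &net_type, 4);          /* memcpy avoids alignment and padding issues */
    memcpy(buf + 4, &net_len, 4);
    memcpy(buf + MSG_HDR_LEN, data, data_len);
    *out = buf;
    return (int)(MSG_HDR_LEN + data_len);
}

/* Parse a received buffer; on success returns 0 and points *data into buf.
 * Returns -1 if buf is shorter than the header or than the declared data length. */
int unpack_msg(const char *buf, size_t buf_len, uint32_t *msg_type,
               const char **data, uint32_t *data_len)
{
    uint32_t net_type, net_len;
    if (buf_len < MSG_HDR_LEN)
        return -1;
    memcpy(&net_type, buf, 4);
    memcpy(&net_len, buf + 4, 4);
    *msg_type = ntohl(net_type);
    *data_len = ntohl(net_len);
    if (buf_len - MSG_HDR_LEN < *data_len)  /* header claims more data than arrived */
        return -1;
    *data = buf + MSG_HDR_LEN;
    return 0;
}
```

The receiver would call unpack_msg() only after a complete packet has arrived. The length check rejects a header that claims more data than was actually received.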

3.1.2: splice strings with memcpy to construct a char *

The goal is to obtain the final char * and the actual length of the data to send.

Out of laziness, C++11 std::to_string() is used here to convert int to string for testing, so compile as C++11 (e.g. g++ -std=c++11). In pure C, replace it with snprintf (itoa is non-standard).

//1: Splice "splicing test of string1 and string2" with memcpy
//Compile as C++11 (g++ -std=c++11) because of std::to_string
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <string>
using std::to_string;

int use_memcpy_splic_string()
{
	printf("use_memcpy_splic_string test:\n");
	const char* str1 = "splicing test of ";
	const char* str2 = "string";
	const char* and1 = " and ";
	int one = 1;
	int two = 2;
	//C++ to_string() converts int here; in C use snprintf (itoa is non-standard). The focus is string splicing; sprintf is the usual way to convert and splice ints
	int size = strlen(str1) + strlen(str2) * 2 + (strlen(to_string(one).c_str())) + strlen(and1) + (strlen(to_string(two).c_str())) + 1;
	printf("\tget the len is : %d %zu\n", size, strlen("splicing test of string1 and string2"));

	int pos = 0;
	char* result = (char*)malloc(size);
	if (result == NULL)
	{
		return -1;
	}
	memset(result, '\0', size);
	memcpy(result, str1, strlen(str1));
	pos += strlen(str1);
	memcpy(result + pos, str2, strlen(str2));
	pos += strlen(str2);
	memcpy(result + pos, to_string(one).c_str(), strlen(to_string(one).c_str()));
	pos += strlen(to_string(one).c_str());
	memcpy(result + pos, and1, strlen(and1));
	pos += strlen(and1);
	memcpy(result + pos, str2, strlen(str2));
	pos += strlen(str2);
	memcpy(result + pos, to_string(two).c_str(), strlen(to_string(two).c_str()));
	pos += strlen(to_string(two).c_str());
	printf("\tthe result is [%zu][%s]  \n", strlen(result), result);
	printf("\tpos is [%d] \n", pos);
	if (result != NULL)
	{
		free(result);
		result = NULL;
	}
	return 0;
}
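itoa() is not part of standard C, so in a pure C build the to_string() calls above can be replaced with snprintf(). Below is a minimal sketch of the same memcpy splicing under that assumption. The names memcpy_splice_c and put_bytes are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of src to dst + *pos and advance *pos (caller guarantees capacity). */
static void put_bytes(char *dst, size_t *pos, const char *src, size_t len)
{
    memcpy(dst + *pos, src, len);
    *pos += len;
}

/* Build "splicing test of string1 and string2" with memcpy; returns a malloc'd string or NULL. */
char *memcpy_splice_c(void)
{
    const char *str1 = "splicing test of ";
    const char *str2 = "string";
    const char *and1 = " and ";
    char one[12], two[12];                  /* enough for any 32-bit int plus '\0' */
    int one_len = snprintf(one, sizeof one, "%d", 1);
    int two_len = snprintf(two, sizeof two, "%d", 2);

    size_t size = strlen(str1) + 2 * strlen(str2) + strlen(and1)
                + (size_t)one_len + (size_t)two_len + 1;
    char *result = malloc(size);
    if (result == NULL)
        return NULL;

    size_t pos = 0;
    put_bytes(result, &pos, str1, strlen(str1));
    put_bytes(result, &pos, str2, strlen(str2));
    put_bytes(result, &pos, one, (size_t)one_len);
    put_bytes(result, &pos, and1, strlen(and1));
    put_bytes(result, &pos, str2, strlen(str2));
    put_bytes(result, &pos, two, (size_t)two_len);
    result[pos] = '\0';                     /* memcpy does not terminate the string */
    return result;
}
```

Here pos ends up equal to the length of the spliced data, which is the value a sender would pass to send().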

3.1.3: use C library functions (strcpy, strcat)

Again, to_string() is used for convenience, so compile and test as C++11.

//2: Splice string1 and string2 with C library functions
//When writing code like this, pay attention to the capacity of the target string
int use_clibrary_strcat_splic_string()
{
	printf("use_clibrary_strcat test: \n");
	const char* str1 = "splicing test of ";
	const char* str2 = "string";
	const char* and1 = " and ";
	int one = 1;
	int two = 2;
	//C++ to_string() converts int here; in C use snprintf (itoa is non-standard). sprintf is the usual way to convert and splice ints
	int size = strlen(str1) + strlen(str2) * 2 + (strlen(to_string(one).c_str())) + strlen(and1) + (strlen(to_string(two).c_str())) + 1;
	printf("\tget the len is : %d %zu\n", size, strlen("splicing test of string1 and string2"));
	char* result = (char*)malloc(size);
	if (result == NULL)
	{
		return -1;
	}
	memset(result, '\0', size); 
	//When testing on Visual Studio, use strcpy_s / strcat_s instead:
	// strcpy_s(result, size, str1);
	// strcat_s(result, size, str2);
	// strcat_s(result, size, to_string(one).c_str());
	// strcat_s(result, size, and1);
	// strcat_s(result, size, str2);
	// strcat_s(result, size, to_string(two).c_str());
	//strcpy/strcat do no bounds checking: size must already cover every piece plus the '\0'!
	strcpy(result,  str1);
	strcat(result,  str2);
	strcat(result,  to_string(one).c_str());
	strcat(result,  and1);
	strcat(result,  str2);
	strcat(result,  to_string(two).c_str());
	printf("\tthe result is [%zu][%s]  \n", strlen(result), result);
	if (result != NULL)
	{
		free(result);
		result = NULL;
	}
	return 0;
}
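strcpy() and strcat() trust the caller to have sized the buffer correctly. One option, sketched here, is a checked append that refuses to write past the buffer. append_checked is a hypothetical helper, not a C library function.

```c
#include <string.h>

/* Hypothetical helper: append src to the '\0'-terminated string in dst (capacity cap bytes).
 * Returns 0 on success, or -1 without modifying dst if the result would not fit. */
int append_checked(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst);
    size_t add = strlen(src);
    if (used + add + 1 > cap)           /* +1 for the terminating '\0' */
        return -1;
    memcpy(dst + used, src, add + 1);   /* copies the terminating '\0' too */
    return 0;
}
```

Like strcat(), it rescans dst with strlen() on every call. For long sequences of appends, tracking the write position as in the memcpy version avoids that cost.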

3.1.4: use sprintf (sprintf_s): the most practical and convenient option

When testing on Visual Studio, use sprintf_s instead.

int use_sprintf_splic_string()
{
	printf("use_sprintf_splic_string test: \n");
	const char* str1 = "splicing test of ";
	const char* str2 = "string";
	const char* and1 = " and ";
	int one = 1;
	int two = 2;
	//It is necessary to define the target string and apply for memory for it
	int size = strlen(str1) + strlen(str2) * 2 + (strlen(to_string(one).c_str())) + strlen(and1) + (strlen(to_string(two).c_str())) + 1;
	printf("\tget the len is : %d %zu\n", size, strlen("splicing test of string1 and string2"));
	char* result = (char*)malloc(size);
	if (result == NULL)
	{
		return -1;
	}
	memset(result, '\0', size);
	//Build the whole string in one step; snprintf never writes more than size bytes
	snprintf(result, size, "%s%s%d%s%s%d", str1, str2, one, and1, str2, two);
	//sprintf_s(result, size, "%s%s%d%s%s%d", str1, str2, one, and1, str2, two);
	printf("\tthe result is [%zu][%s]  \n", strlen(result), result);
	if (result != NULL)
	{
		free(result);
		result = NULL;
	}
	return 0;
}
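A common refinement of the sprintf approach is to let the formatting function compute the size. C99 vsnprintf(), like snprintf(), returns the number of characters it would have written, so a first call with a NULL buffer and size 0 measures the output. Below is a sketch; format_alloc is a hypothetical helper name.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a newly malloc'd buffer of exactly the right size.
 * Returns the string (caller frees) and stores its length in *out_len, or returns NULL. */
char *format_alloc(int *out_len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(NULL, 0, fmt, ap);    /* first pass: measure only */
    va_end(ap);
    if (n < 0)
        return NULL;

    char *buf = malloc((size_t)n + 1);
    if (buf == NULL)
        return NULL;

    va_start(ap, fmt);                      /* a consumed va_list must be restarted */
    vsnprintf(buf, (size_t)n + 1, fmt, ap);
    va_end(ap);
    if (out_len != NULL)
        *out_len = n;
    return buf;
}
```

For example, format_alloc(&n, "%s%s%d%s%s%d", str1, str2, one, and1, str2, two) returns the same string as the function above. It does this without the hand-written size calculation, which is easy to get wrong when the format changes.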

3.2: demos of parsing received data

3.2.1: cast char * directly to a structure (like TCP/IP protocol stack processing)

Refer to 3.1.1

3.2.2: use memcpy to extract fields byte by byte according to the agreed layout

Refer to 3.1.2

3.2.3: receive the length first, then receive and parse the data

A common approach for network data is to send the length of the actual data first. On the receiving side, recv() first reads the fixed-size length field and then reads exactly that many bytes of data, which ensures the message is received completely.

//2: A fixed number of bytes holds the length, followed by the data; this is effectively the struct-with-flexible-array layout
void parse_len_and_data_networkdata()
{
	printf("parse_len_and_data_networkdata test: \n");
	//In network transmission, the format can be: length in specific bytes + actual data
	const char *send_data = "msg_type | msg_len |msg_data ...\n"; //If the data may contain special characters such as '\0', carry the length in a header (as here, or in a structure) instead of relying on strlen
	//A simple way to do this without defining a structure:
	int send_len = strlen(send_data);
	printf("\tsend_len [%d][%s] \n",send_len, send_data);
	//You could also use a structure like the previous function's: struct data_t{int len; char data[];};
	//Here it is built by hand:
	char * real_send_data = (char *)malloc(send_len + sizeof(int) + 1);
	if(real_send_data == NULL)
	{
		return;
	}
	memset(real_send_data, 0, send_len + sizeof(int) + 1);
	memcpy(real_send_data, &send_len, sizeof(int)); //The first four bytes hold the length in host byte order (it could also be stored as a string, e.g. to_string(send_len).c_str())
	memcpy(real_send_data + sizeof(int), send_data, send_len);
	//real_send_data is the stream actually sent: the receiver reads the first four bytes to get the data length, then reads the following data
	//On the receiving side, recv() the first four bytes and convert them to an int length
	int recv_len = 0;
	memcpy(&recv_len, real_send_data, sizeof(int)); //memcpy avoids alignment problems when reading the int
	char* recv_data = real_send_data + sizeof(int);  //In real code, recv() would read exactly recv_len bytes here
	printf("\trecv_len[%d] [%.*s] \n", recv_len, recv_len, recv_data);
	free(real_send_data);
	real_send_data = NULL;
}
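The function above reads the length back from the same buffer. Across a real socket, send() and recv() may transfer fewer bytes than requested, and an int written with memcpy is in host byte order. The sketch below works under those assumptions, using POSIX sockets; send_all, recv_all, send_frame and recv_frame are hypothetical names. It sends a 4-byte big-endian length and loops until each part has fully arrived.

```c
#include <arpa/inet.h>    /* htonl, ntohl */
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send exactly len bytes, looping because send() may write fewer bytes than requested. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes; returns -1 on error or if the peer closes early. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Frame = 4-byte big-endian length, then the payload. */
int send_frame(int fd, const char *data, uint32_t len)
{
    uint32_t net_len = htonl(len);
    if (send_all(fd, &net_len, sizeof net_len) < 0)
        return -1;
    return send_all(fd, data, len);
}

/* Read one frame into buf (capacity cap); returns the payload length, or -1 on error/overflow. */
int recv_frame(int fd, char *buf, uint32_t cap)
{
    uint32_t net_len;
    if (recv_all(fd, &net_len, sizeof net_len) < 0)
        return -1;
    uint32_t len = ntohl(net_len);
    if (len > cap)                          /* never trust a length read from the network */
        return -1;
    if (recv_all(fd, buf, len) < 0)
        return -1;
    return (int)len;
}
```

recv_frame() rejects a length larger than the caller's buffer, because the length field comes from the network.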

3.2.4: parsing data whose fields are separated by a delimiter string

This is just one way to split strings.

3.2.4.1: constructing the data (in the delimited format "msg_type|other_type|msg_len|msg_data"):

//Simulate a complete packet: return packet data spliced in the agreed format
//Format: msg_type|other_type|msg_len|msg_data
int get_concatenate_strings(char ** result_data, int* len)
{
	const char * data = "mytest of spilt of send data ... \n\t test";//Real network data may be more complex (e.g. binary), so msg_data is copied with memcpy
	int msg_type = 1;
	int other_type = 2;
	int msg_len = strlen(data); //Length of the data that follows the header
	//Header: three ints (at most 11 chars each, including sign) and three '|' fit in 64 bytes; +1 for '\0'
	int cap = 64 + msg_len + 1;
	char *send_data = (char*)malloc(cap);
	if(send_data == NULL)
	{
		return -1;
	}
	memset(send_data, 0, cap);
	int head_len = snprintf(send_data, cap, "%d|%d|%d|", msg_type, other_type, msg_len);

	//Return the packet and its length through the pointer parameters
	memcpy(send_data + head_len, data, msg_len);
	*len = head_len + msg_len;
	*result_data = send_data;
	printf("\t result_data is [%d][%s] \n",*len, send_data);
	return 0;
}

3.2.4.2: split and parse the string on a specific character (extract the meaning of each field of "msg_type|other_type|msg_len|msg_data")

All of the following library functions can be used to split strings; the test below uses only one of them (strsep):

char *strstr(const char *haystack, const char *needle); locates a substring, after which the string can be cut at that position
char *strcasestr(const char *haystack, const char *needle); same as strstr, but ignores the case of both arguments (a GNU extension)
char *strtok(char *str, const char *delim); splits str into a sequence of tokens separated by characters in delim; it modifies str, skips empty fields, and keeps internal static state (use strtok_r when reentrancy matters)
char *strsep(char **stringp, const char *delim); an improved replacement for strtok (from BSD): it keeps its state in *stringp and returns the empty fields between adjacent delimiters
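The practical difference between strtok() and strsep() shows up with empty fields. The sketch below counts the fields each function returns. count_fields_strtok and count_fields_strsep are hypothetical names. Copying the input into a fixed 128-byte buffer is a simplification that truncates longer strings.

```c
#define _DEFAULT_SOURCE      /* exposes strsep() in glibc's <string.h> */
#include <string.h>

/* Count the fields strtok() returns: runs of delimiters count as one, so empty fields vanish. */
int count_fields_strtok(const char *data, const char *delim)
{
    char buf[128];
    strncpy(buf, data, sizeof buf - 1);     /* strtok modifies its input, so work on a copy */
    buf[sizeof buf - 1] = '\0';
    int n = 0;
    for (char *tok = strtok(buf, delim); tok != NULL; tok = strtok(NULL, delim))
        n++;
    return n;
}

/* Count the fields strsep() returns: every delimiter splits, so empty fields are kept. */
int count_fields_strsep(const char *data, const char *delim)
{
    char buf[128];
    strncpy(buf, data, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    char *p = buf;                          /* strsep advances p; buf itself stays valid */
    int n = 0;
    while (strsep(&p, delim) != NULL)
        n++;
    return n;
}
```

For "1||40|data", strtok() yields 3 fields and strsep() yields 4, the second one empty. strsep() therefore keeps field positions intact when a field is missing.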

//Split the data of length len on delim (e.g. "|"); the parsed data is returned through the parameter result
int check_recv_data_by_spilit(const char * data, int len, char **result, const char* delim)
{
	//Split the string on delim and verify the number of fields obtained
	char *src = strdup(data); //Work on a copy, because strsep modifies the string it splits
	if(src == NULL)
	{
		*result = NULL;
		return -1;
	}
	char * src_free = src; //strsep advances src, so keep the original pointer for free()

	//msg_type|other_type|msg_len|msg_data: one temporary pointer per field of the agreed format
	char * delim_buff[4] = {0};
	char* token = NULL;  //Field returned by each strsep call
	int i = 0;			 //Number of fields cut so far
	//strsep returns the field before the delimiter and moves src past it (src becomes NULL after the last field).
	//Only the three header fields are cut, so a '|' inside msg_data is not treated as a separator
	while(i < 3 && (token = strsep(&src, delim)) != NULL)
	{
		delim_buff[i++] = token;
		printf("\tspilt data [%d:%zu:%s] \n", i, strlen(token), token);
		printf("\t\t src:[%s] \n", src != NULL ? src : "(null)");
	}
	if(i == 3 && src != NULL)
	{
		delim_buff[i++] = src; //Everything after the third '|' is msg_data
	}

	if(i != 4) //The data must follow the agreed format
	{
		printf("\tvps spilit data error \n");
		free(src_free);
		*result = NULL;
		return -1;
	}

	int msg_type = atoi(delim_buff[0]);
	int dev_type = atoi(delim_buff[1]);
	int data_len = atoi(delim_buff[2]);
	char * cli_data = delim_buff[3];
	printf("\nmsg_type:%d, dev_type:%d, data_len:%d:%zu:[%s] \n", msg_type, dev_type, data_len, strlen(cli_data), cli_data);

	//Never trust a length read from the network: it must not exceed the data actually present
	if(data_len < 0 || (size_t)data_len > strlen(cli_data))
	{
		printf("\tdata_len error \n");
		free(src_free);
		*result = NULL;
		return -1;
	}

	//Copy the parsed data into a new structure and return it through the parameter
	int ret = 0;
	struct client_recv_t *result_t = NULL;
	result_t = (struct client_recv_t *)malloc(sizeof(struct client_recv_t) + data_len+1);
	if(result_t == NULL)
	{
		printf("malloc error \n");
		*result = NULL;
		ret = -1;
	}else
	{
		memset(result_t, 0, sizeof(struct client_recv_t) + data_len+1); //Zero the whole block, including the '\0' after data
		result_t->msg_type = msg_type;
		result_t->dev_type = dev_type;
		result_t->data_len = data_len;
		memcpy(result_t->data, cli_data, data_len);

		*result = (char *)result_t;
	}
	memset(src_free, 0, strlen(data) + 1); //Wipe the copy (strdup allocated strlen(data)+1 bytes) before freeing
	free(src_free);
	src_free = NULL;
	return ret;
}

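3.2.4.3: test code

I compiled and tested this directly with gcc on Linux, and it works.

//Define a structure to save the parsed data
//In a single source file, this definition must appear before check_recv_data_by_spilit(), which uses it
struct client_recv_t
{
	int msg_type;
	int dev_type;
	int data_len;
	char data[];	//C99 flexible array member
};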

int parse_spilt_string_and_getdata(const char * data, int len)
{
	printf("\tneed parse data is [%d][%s] \n", len, data);
	//Split the string on a specific delimiter; "|" is used here, but any other string works too
	//The agreed format is: msg_type|other_type|msg_len|msg_data
	//The actual data is: [47][1|2|40|mytest of spilt of send data ... \n\t test]

	char * result_data_t = NULL; //Holds the parsed data (other schemes are possible; this is just an example)
	//Split the data of length len on "|"; the result is returned through result_data_t
	if(check_recv_data_by_spilit(data, len, &result_data_t, "|") != 0)
	{
		printf("vps parse spilit error \n");
		return -1;
	}
	//Print the parsed data
	struct client_recv_t *result_data = (struct client_recv_t *)result_data_t;
	printf("\t parse test data is [%d][%d][%d][%s] \n", result_data->msg_type, result_data->dev_type, result_data->data_len, result_data->data);
	if(result_data != NULL)
	{
		free(result_data);
		result_data = NULL;
		printf("free success \n");
	}
	return 0;
}

//Actual entry point
void parse_string_spilt_data()
{
	printf("parse_string_spilt_data test: \n");
	//Suppose the negotiated protocol is msg_type|other_type|msg_len|msg_data 
	//Construct a data to obtain the final sent data and data length
	char * send_data = NULL;
	int send_len = 0;
	if(get_concatenate_strings(&send_data, &send_len) < 0)
	{
		printf("\t make send_data error \n");
		return;
	}
	printf("\t last_result_data is [%d][%s] \n",send_len, send_data);
	//Assume this data has been sent; for message integrity, see the previous articles
	//The receiver gets data in this format and parses it; recv() must first ensure a complete packet has been received
	parse_spilt_string_and_getdata(send_data, send_len);
	if(send_data != NULL)
	{
		free(send_data);
		send_data = NULL;
	}
}
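strsep() and strtok() operate on '\0'-terminated strings and treat every '|' as a separator, so splitting works only while msg_data contains neither character. The format already carries msg_len, which allows an alternative, sketched here under that assumption. It parses only the three header fields and then takes msg_data by length, which keeps it binary-safe. parse_by_length, parse_field and struct parsed_msg are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for the "msg_type|other_type|msg_len|msg_data" format */
struct parsed_msg {
    long msg_type;
    long other_type;
    long msg_len;
    const char *msg_data;   /* points into the input buffer; not '\0'-terminated */
};

/* Parse one integer header field terminated by '|' and advance *p past the '|'.
 * Returns -1 if no '|' remains, the field is empty or too long, or it is not a number. */
static int parse_field(const char **p, const char *end, long *out)
{
    const char *bar = memchr(*p, '|', (size_t)(end - *p));
    if (bar == NULL)
        return -1;
    char tmp[24];
    size_t n = (size_t)(bar - *p);
    if (n == 0 || n >= sizeof tmp)
        return -1;
    memcpy(tmp, *p, n);             /* copy so strtol stops at the field boundary */
    tmp[n] = '\0';
    char *stop;
    *out = strtol(tmp, &stop, 10);
    if (*stop != '\0')              /* reject fields such as "12x" */
        return -1;
    *p = bar + 1;
    return 0;
}

/* Parse a buffer of exactly len bytes; msg_data is taken by length, so it may contain '|' or '\0'. */
int parse_by_length(const char *buf, size_t len, struct parsed_msg *m)
{
    const char *p = buf, *end = buf + len;
    if (parse_field(&p, end, &m->msg_type) < 0 ||
        parse_field(&p, end, &m->other_type) < 0 ||
        parse_field(&p, end, &m->msg_len) < 0)
        return -1;
    if (m->msg_len < 0 || (size_t)m->msg_len != (size_t)(end - p))
        return -1;                  /* declared length must match the bytes that remain */
    m->msg_data = p;
    return 0;
}
```

Requiring msg_len to equal the remaining byte count assumes buf holds exactly one complete packet, as in the demo above.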

I have started collecting commonly used code to keep in my own code base.

