Embedded basics -- parsing strings

Posted by Nommy on Sat, 01 Jan 2022 20:28:01 +0100

Hello, everyone. Let's talk about strings today. Strings are used in many scenarios, including human-computer interaction and communication between two machines. For example:

  • Sending instructions to an MCU through a serial port to perform operations or configure parameters.
  • An MCU reading sensor data that is formatted as strings. GPS data, for example, generally comes in character format.
  • Multiple processors working together, such as an MCU plus OpenMV. They need to communicate with each other, and the messages can be encoded as characters.

Manipulating strings comes down to two things: generating strings and parsing strings, and parsing is often the more complex of the two. High-level programming languages such as Java and Python come with powerful string processing libraries that provide a very rich set of operations. The following figure shows the methods of Java's String class; the list is overwhelmingly dense, and this is only part of it.

By comparison, the standard C library provides only a limited set of functions. The well-known ones include:

  • String copy and append (strcpy, strcat)
  • String search and comparison (strstr, strcmp)
  • String to number conversion (atoi, strtol)

There are also two very useful functions that are easy to overlook. Let me introduce them to you.

Task: parse latitude and longitude

Let's take parsing the GPS RMC sentence as an example. The data is as follows:

$GNRMC,122921.000,A,3204.862246,N,11845.911047,E,0.099,191.76,280521,,E,A*00

The fields of a GPS sentence are separated by commas. We need to extract the latitude and longitude information, focusing on: A,3204.862246,N,11845.911047,E. A indicates that the position is valid, 3204.862246 is the latitude, and 11845.911047 is the longitude. See the following figure for a detailed explanation:

Splitting strings with strtok_r

Since the fields in a GPS sentence are separated by commas, the first idea may be to use strstr or strchr to find the positions of the commas and process the fields from there. It would be much more convenient if a function could do the splitting for us, with the effect shown in the figure below.
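For comparison, the manual strchr approach mentioned above might look roughly like the sketch below. The helper name get_field and its return convention are made up for this illustration and do not come from the author's project. It leaves the input untouched and copies one field into a caller-supplied buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy field number `index` (0-based) of a comma-separated string into `out`.
 * The input string is not modified.
 * Returns the field length, or -1 if the arguments are invalid, the field
 * does not exist, or it does not fit into `out`.
 * NOTE: get_field is a hypothetical helper written for this illustration.
 */
int get_field(const char *str, int index, char *out, size_t out_size)
{
    const char *start = str;
    const char *end;
    size_t len;

    if (str == NULL || out == NULL || out_size == 0 || index < 0)
    {
        return -1;
    }

    /* Skip `index` commas to reach the start of the requested field. */
    while (index-- > 0)
    {
        start = strchr(start, ',');
        if (start == NULL)
        {
            return -1;          /* fewer fields than requested */
        }
        start++;                /* step over the comma itself */
    }

    /* The field ends at the next comma or at the end of the string. */
    end = strchr(start, ',');
    len = (end != NULL) ? (size_t)(end - start) : strlen(start);

    if (len + 1 > out_size)
    {
        return -1;              /* output buffer too small */
    }

    memcpy(out, start, len);
    out[len] = '\0';
    return (int)len;
}
```

Calling get_field(sentence, 3, out, sizeof(out)) on the RMC sentence would copy the latitude field into out. This works, but every lookup rescans the string from the beginning, which is part of why a ready-made splitting function is attractive.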

Such a function exists, and it is in the C standard library: strtok.

char *strtok(char *source, const char *delimiters);

It splits source according to the set of delimiter characters given in delimiters.

  • source: the string to be split.
  • delimiters: the delimiter set, which can contain multiple characters. For example, "\r\n\t" splits at carriage returns, line feeds, and tabs.
  • Return value: a pointer to a substring.

Splitting a string requires calling this function multiple times:

  • On the first call, source is the string to be split and delimiters is the delimiter set. The function returns the address of the first substring.
  • On subsequent calls, source is NULL and delimiters is the delimiter set, which does not have to be the same as in the previous call. The function returns the address of the next substring.
  • When a call returns NULL, the splitting is finished.
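The point that the delimiter set may change between calls is easy to miss, so here is a small sketch of it. It assumes an input of the form "key=value;key=value", which is not taken from the article, and next_pair is a hypothetical helper name. It follows the same calling convention as strtok: pass the buffer on the first call and NULL afterwards.

```c
#include <string.h>

/*
 * Fetch the next "key=value" pair from a "k1=v1;k2=v2" style string.
 * Pass the writable buffer on the first call and NULL on later calls.
 * Returns 1 if a pair was found, 0 otherwise.
 * NOTE: next_pair and the input format are assumptions for this sketch.
 */
int next_pair(char *str, char **key, char **value)
{
    /* Split at '=' to get the key. */
    *key = strtok(str, "=");
    if (*key == NULL)
    {
        return 0;
    }

    /* Continue in the same string, but now split at ';' to get the value. */
    *value = strtok(NULL, ";");
    return (*value != NULL) ? 1 : 0;
}
```

With the buffer "TEMP=23.5;HUM=40", the first call yields TEMP and 23.5, the second call (with NULL) yields HUM and 40, and the third call returns 0.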

In fact, the process is not complicated. Please see the code for splitting GPS:

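#include <stdio.h>   /* snprintf */
#include <string.h>  /* strtok_r */

#define GPS_RMC "$GNRMC,122921.000,A,3204.862246,N,11845.911047,E,0.099,191.76,280521,,E,A*00"

void split_string_example(void)
{
    char buf[128];
    int buf_len;
    char *token = NULL;
    char *saveptr = NULL;       /* split state kept between strtok_r calls */
    const char *delim = ",*";   /* split at ',' and '*' */

    LOG_I("test split string");

    /* Work on a writable copy of the sentence. */
    buf_len = snprintf(buf, sizeof(buf), "%s", GPS_RMC);

    /* The first call passes buf; subsequent calls pass NULL. */
    token = strtok_r(buf, delim, &saveptr);
    while (token)
    {
        LOG_D("%s", token);
        token = strtok_r(NULL, delim, &saveptr);
    }

    /* Dump the raw buffer contents after splitting. */
    LOG_HEX_V(buf, buf_len, "finally, buf:");
}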

The result was already shown above; here is a more complete version.

Why does the example define the GPS content with the macro GPS_RMC and then use snprintf to copy it into buf?

buf_len = snprintf(buf, sizeof(buf), "%s", GPS_RMC);

This is not a redundant step. It is needed because strtok modifies the contents of the string while splitting it. Keep the following two points in mind:

  • strtok does not allocate memory to store the substrings. The returned substrings point directly to the corresponding positions inside the string being split.
  • Splitting means replacing the delimiters in the string with '\0'; only then can each substring be used as a string on its own. The end of the figure above shows the contents of buf after splitting, and the red boxes mark the '\0' bytes. Therefore, the string to be split must be modifiable: it must be a variable, not a constant.
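The two points above can be checked with a few lines of code. The sketch below uses two hypothetical helpers, count_nul and split_commas, which are not part of the author's project. One counts the '\0' bytes inside a buffer, the other splits the buffer at commas with strtok.

```c
#include <stddef.h>
#include <string.h>

/* Count the '\0' bytes within the first `len` bytes of `buf`. */
size_t count_nul(const char *buf, size_t len)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++)
    {
        if (buf[i] == '\0')
        {
            n++;
        }
    }
    return n;
}

/*
 * Split `buf` in place at commas and return the number of tokens.
 * `buf` must be writable: an array, not a string literal.
 */
size_t split_commas(char *buf)
{
    size_t n = 0;
    char *tok = strtok(buf, ",");

    while (tok != NULL)
    {
        n++;
        tok = strtok(NULL, ",");
    }
    return n;
}
```

Given char buf[] = "A,3204.86,N", count_nul over the original 11 characters reports 0 before the split and 2 afterwards, because both commas have been overwritten. The second token then starts at buf + 2, inside the same array. Passing a string literal directly to split_commas would be undefined behavior, which is why the example copies GPS_RMC into buf first.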

The author uses strtok_r rather than strtok. Many functions in C come in two versions, one without the _r suffix and one with it; _r stands for reentrant. Reentrancy deserves an article of its own and will not be discussed here. strtok_r takes one more parameter than strtok: the address of a char * variable in which the split state is saved. Using it is simple: define a char * variable and pass its address. You don't need to care about its value.
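Without going into reentrancy in general, one practical consequence can be sketched briefly: because each strtok_r call keeps its state in a caller-supplied pointer, two splits can be nested. Plain strtok has only one hidden state, so the inner split would overwrite the outer one. The list format and the helper name count_items below are assumptions made for this illustration. Note also that strtok_r is specified by POSIX rather than ISO C, so the feature-test macro may or may not be needed on a given toolchain.

```c
#define _POSIX_C_SOURCE 200809L  /* ask POSIX C libraries to declare strtok_r */
#include <string.h>

/*
 * Count all items in a list such as "1,2;3,4,5;6", where groups are
 * separated by ';' and items inside a group by ','.
 * The string is modified in place.
 * NOTE: count_items is a hypothetical helper for this sketch.
 */
int count_items(char *str)
{
    char *outer_save = NULL;    /* state of the ';' split */
    char *inner_save = NULL;    /* state of the ',' split */
    char *group;
    char *item;
    int count = 0;

    group = strtok_r(str, ";", &outer_save);
    while (group != NULL)
    {
        /* The inner split uses its own state and leaves outer_save alone. */
        item = strtok_r(group, ",", &inner_save);
        while (item != NULL)
        {
            count++;
            item = strtok_r(NULL, ",", &inner_save);
        }
        group = strtok_r(NULL, ";", &outer_save);
    }

    return count;
}
```

For the buffer "1,2;3,4,5;6" this returns 6.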

Optimize it

Let's look at the GPS data. If you want to extract A, 3204.862246 and 11845.911047, it's not convenient to use strtok directly.

$GNRMC,122921.000,A,3204.862246,N,11845.911047,E,0.099,191.76,280521,,E,A*00

If you use Java, the following lines of code can complete the extraction.

String gps = "$GNRMC,122921.000,A,3204.862246,N,11845.911047,E,0.099,191.76,280521,,E,A*00";
String[] sub = gps.split(",");
if (sub.length < 6) {
	System.out.println("parse fail");
} else {
	System.out.println(String.format(
			"parse succeed, valid:%s, latitude:%s, longitude:%s",
			sub[2], sub[3], sub[5]));
}

Output results:

parse succeed, valid:A, latitude:3204.862246, longitude:11845.911047

Java is convenient here because split returns an array of substrings, so the relevant fields can be picked out directly by index.

C has no such function, so we'll write one ourselves.

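/*
 * Split str in place at any character in delim and store pointers to at most
 * size substrings in sub_ptr. Returns the number of substrings stored.
 */
static int split_string(char *str, const char *delim, char *sub_ptr[], int size)
{
    char *token = NULL;
    char *saveptr = NULL;
    int idx = 0;

    token = strtok_r(str, delim, &saveptr);
    while (token && idx < size)
    {
        sub_ptr[idx++] = token;
        token = strtok_r(NULL, delim, &saveptr);
    }

    return idx;
}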

split_string writes the split result to sub_ptr and returns the number of substrings. With this function, extraction is as convenient as Java.

void split_string_example2(void)
{
    char buf[128];
    char *sub_buf[20];
    int num;

    LOG_I("test split string 2");

    snprintf(buf, sizeof(buf), "%s", GPS_RMC);
    /* Pass the element count of sub_buf, not its size in bytes. */
    num = split_string(buf, ",", sub_buf, sizeof(sub_buf) / sizeof(sub_buf[0]));

    if (num < 7)
    {
        LOG_E("fail");
        return;
    }

    LOG_D("succeed, valid:%s, latitude:%s, longitude:%s", sub_buf[2], sub_buf[3], sub_buf[5]);

}
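One caveat applies to index-based extraction built on strtok or strtok_r: a run of consecutive delimiters is treated as a single delimiter, so empty fields are skipped. The sample sentence contains one empty field (the ",," near the end), so every index after it shifts by one. The sketch below makes the difference visible. count_tokens and count_fields are hypothetical helpers written for this comparison.

```c
#define _POSIX_C_SOURCE 200809L  /* ask POSIX C libraries to declare strtok_r */
#include <string.h>

/* Count tokens as strtok_r sees them; empty fields are skipped. */
int count_tokens(char *str, const char *delim)
{
    char *saveptr = NULL;
    char *token = strtok_r(str, delim, &saveptr);
    int n = 0;

    while (token != NULL)
    {
        n++;
        token = strtok_r(NULL, delim, &saveptr);
    }
    return n;
}

/* Count fields so that every delimiter starts a new field, empty or not. */
int count_fields(const char *str, char delim)
{
    int n = 1;

    for (; *str != '\0'; str++)
    {
        if (*str == delim)
        {
            n++;
        }
    }
    return n;
}
```

For the sample RMC sentence, count_fields reports 13 comma-separated fields while count_tokens reports 12. Indexes 2, 3 and 5 come before the empty field and are therefore unaffected, but fields after it should not be addressed by a fixed index when splitting this way.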

Using strtok or split_string only extracts the target substrings. To get the latitude and longitude as numbers, you need to convert them to floating-point values, for example with the atof function. In fact, there is a simpler way. Let's continue tomorrow.
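Until then, the conversion step could be sketched as follows. This is not the simpler method announced for the next article. It uses strtod instead of atof, because strtod can report where parsing stopped and so lets the caller reject malformed fields. The second helper assumes the usual NMEA layout of ddmm.mmmm for latitude and dddmm.mmmm for longitude; that layout is not stated in this article, and both helper names are hypothetical.

```c
#include <stdlib.h>

/*
 * Convert a numeric field to double.
 * Returns 0 on success, or -1 if the field is empty or contains
 * characters that are not part of a number.
 */
int field_to_double(const char *field, double *out)
{
    char *end = NULL;
    double value = strtod(field, &end);

    if (end == field || *end != '\0')
    {
        return -1;
    }

    *out = value;
    return 0;
}

/*
 * Convert a raw NMEA coordinate to decimal degrees.
 * ASSUMPTION: raw is in (d)ddmm.mmmm form, i.e. the last two integer
 * digits and the fraction are minutes, and everything above is degrees.
 */
double nmea_to_degrees(double raw)
{
    int degrees = (int)(raw / 100.0);
    double minutes = raw - degrees * 100.0;

    return degrees + minutes / 60.0;
}
```

Under that assumption, the latitude field 3204.862246 means 32 degrees and 4.862246 minutes, which is roughly 32.08104 degrees.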

For the complete example code in this article, see the demo project the author created for the stm32f407:

Address: git@gitee.com:wenbodong/mcu_demo.git
Example: examples/05_string/example.c
To run it, enable EXAMPLE_SHOW_STRING in examples/examples.h.

Topics: Single-Chip Microcomputer string