C language: deep understanding of character functions and string functions

Posted by KMC1499 on Sat, 12 Feb 2022 14:40:24 +0100

The reference material for this article comes from cplusplus.com:

cplusplus.com - The C++ Resources Network: http://www.cplusplus.com/

 

Contents

1, strstr string lookup

2, strtok string extraction

3, strerror error message report

4, Character classification function

5, memcpy memory copy

6, memmove memory copy

7, memcmp memory comparison

8, memset memory settings

1, strstr string lookup

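Information about strstr:

char* strstr(const char* str1, const char* str2) returns a pointer to the first occurrence of str2 in str1, or NULL if str2 does not occur in str1.

The following example searches a string with strstr and leads into a simulated implementation of strstr: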

//strstr string lookup
#include <stdio.h>
#include <string.h>
int main()
{
	char arr1[] = "abbbcdef";
	char arr2[] = "bbc";
	char* ret = strstr(arr1, arr2);
	if (ret == NULL)
	{
		printf("can't find\n");
	}
	else
	{
		printf("%s\n", ret);
	}
	return 0;
}

A simulated implementation of strstr:

#include <stdio.h>
#include <string.h>
#include <assert.h>

char* my_strstr(const char* arr1, const char* arr2)
{
	const char* s1 = arr1;
	const char* s2 = arr2;
	const char* s0 = arr1;
	assert(arr1 && arr2);
	if (*arr2 == '\0')
	{
		return (char*)arr1;
	}
	while (*s0)
	{
		s1 = s0;
		s2 = arr2;
		while (*s1 && *s2 && *s1 == *s2)
		{
			s1++;
			s2++;
		}
		if (*s2 == '\0')
		{
			return (char*)s0;
		}
		s0++;
	}
	return NULL;
}

int main()
{
	char arr1[] = "abbbcdef";
	char arr2[] = "bbc";
	char* ret = my_strstr(arr1, arr2);
	if (ret == NULL)
	{
		printf("can't find\n");
	}
	else
	{
		printf("%s", ret);
	}
	return 0;
}

This is the BF (brute-force) algorithm, the plain pattern-matching algorithm. Its idea is to compare the first character of the target string with the first character of the pattern string. If they are equal, the second character of the target string is compared with the second character of the pattern string, and so on. If they are not equal, the comparison starts again from the second character of the target string and the first character of the pattern string. This continues until the final result is obtained. The BF algorithm is relatively inefficient, as tracing my_strstr on the example shows: after matching "bb" from index 1 and failing on the third character, the search restarts at index 2 and compares the same characters again.

Two rules can be read off this conventional method, where i is the index in the main string and j is the index in the substring. On a mismatch, i goes back to i - j + 1 and j goes back to 0. On a complete match, the match starts at index i - j of the main string, and that address is returned. Rewriting the pointer version with these subscripts gives a simpler implementation of strstr:

#include <stdio.h>
#include <string.h>
#include <assert.h>

char* my_strstr(const char* str, const char* sub)
{
	assert(str && sub);
	if (*sub == '\0')
	{
		return (char*)str;
	}
	size_t lenStr = strlen(str);
	size_t lenSub = strlen(sub);
	size_t i = 0; // index into the main string
	size_t j = 0; // index into the substring
	while (i < lenStr && j < lenSub)
	{
		if (str[i] == sub[j])
		{
			i++;
			j++;
		}
		else
		{
			i = i - j + 1;
			j = 0;
		}
	}
	if (j >= lenSub)
	{
		return (char*)(str + i - j);
	}
	return NULL;
}

int main()
{
	char arr1[] = "abbbcdef";
	char arr2[] = "bbc";
	char* ret = my_strstr(arr1, arr2);
	if (ret == NULL)
	{
		printf("can't find\n");
	}
	else
	{
		printf("%s", ret);
	}
	return 0;
}

Because the BF algorithm tries the starting positions one by one, its worst-case time complexity is proportional to the product of the lengths of the main string and the substring, which is expensive. A separate article on the KMP algorithm, which is much more efficient, will follow later.
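To make that cost concrete, the following sketch counts the character comparisons made by the same index-based brute-force search. The name bf_comparisons and its found out-parameter are hypothetical helpers introduced only for this illustration; they are not part of the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: runs the index-based brute-force search and returns
// how many character comparisons it performed.
// *found receives the start of the match, or NULL when there is no match.
size_t bf_comparisons(const char* str, const char* sub, const char** found)
{
	size_t lenStr, lenSub;
	size_t i = 0, j = 0, count = 0;
	assert(str && sub && found);
	lenStr = strlen(str);
	lenSub = strlen(sub);
	while (i < lenStr && j < lenSub)
	{
		count++; // one comparison of str[i] with sub[j]
		if (str[i] == sub[j])
		{
			i++;
			j++;
		}
		else
		{
			i = i - j + 1; // restart one position after the previous start
			j = 0;
		}
	}
	*found = (j >= lenSub) ? str + i - j : NULL;
	return count;
}
```

For "abbbcdef" and "bbc" this reports 7 comparisons. For "aaaaaaab" and "aab" each of the 6 starting positions costs 3 comparisons, 18 in total, which is the product-shaped growth described above.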

2, strtok string extraction

Information about strtok:

char* strtok(char* str, const char* delimiters)

strtok is one of the harder string functions to understand. The key points are:

1. The delimiters parameter is a string that defines the set of characters used as separators

2. The first parameter specifies a string that contains zero or more tokens separated by one or more of the separators in the delimiters string

3. strtok finds the next token in str, terminates it with '\0', and returns a pointer to this token (Note: strtok modifies the string it operates on, so the string passed to strtok is generally a temporary copy that can be modified)

4. If the first parameter is not NULL, strtok finds the first token in str and saves its position in the string. If the first parameter is NULL, strtok starts at the position saved in the same string and finds the next token

5. If there are no more tokens in the string, a NULL pointer is returned

In short, the core of strtok is: to find the first token, call it with a first parameter that is not NULL; to find every later token, call it with NULL as the first parameter.

Next, an example shows how strtok is used:

//strtok string extraction
#include <stdio.h>
#include <string.h>
int main()
{
	const char* p = "@.#";
	char arr[] = "abcd.ef#gh@cxz";
	char buf[50] = { 0 };
	strcpy(buf, arr);
	char* str = NULL;
	for (str = strtok(buf, p); str != NULL; str = strtok(NULL, p))
	{
		printf("%s\n", str);
	}
	return 0;
}

Points to note in this example:

1. strtok modifies the string it is given, so arr is first copied into buf and buf is tokenized instead. The copy is made with strcpy, the string copy function covered in the previous article

2. Calling strtok by hand once per token would be tedious, so a for loop is used. Only the call that finds the first token takes buf as its first parameter; every later call takes NULL. The loop therefore initializes str with strtok(buf, p), continues while str is not NULL, and updates str with strtok(NULL, p) on each pass, until strtok returns a NULL pointer.
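The same loop can be wrapped in a small function to check two of the points above: the buffer really is modified, and runs of separators do not produce empty tokens. This is only a sketch; count_tokens is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: splits buf in place and returns the number of tokens.
// buf must be writable, because strtok overwrites separators with '\0'.
size_t count_tokens(char* buf, const char* delims)
{
	size_t count = 0;
	char* tok = NULL;
	assert(buf && delims);
	for (tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
	{
		count++;
	}
	return count;
}
```

Called on a writable copy of "abcd.ef#gh@cxz" with the separators "@.#", it returns 4. Afterwards the buffer read as a string is just "abcd", because the '.' at index 4 has been replaced by '\0'.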

3, strerror error message report

The C library defines a set of error codes, each with a corresponding error message. Different errors produce different error codes, and strerror translates an error code into its message, which makes the cause of a failure quick to see.

When an error occurs inside a library function, the error variable errno is set to the error code produced by that call. errno is provided by the C library, behaves like a global variable and is declared in the errno.h header, so to pass errno to strerror and print the message, include the header <errno.h>.
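A minimal sketch of this pattern follows. The helper name open_error_message is hypothetical, and the sketch assumes that fopen sets errno when it fails, which POSIX guarantees but ISO C does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: tries to open path for reading.
// Returns NULL on success, otherwise the message for the current errno.
// Assumption: fopen sets errno on failure (true on POSIX systems).
const char* open_error_message(const char* path)
{
	FILE* fp = NULL;
	assert(path);
	errno = 0; // clear any stale error code first
	fp = fopen(path, "r");
	if (fp == NULL)
	{
		return strerror(errno); // translate the error code into text
	}
	fclose(fp);
	return NULL;
}
```

For a file that does not exist, many systems return a message such as "No such file or directory", but the exact wording is implementation-defined.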

4, Character classification function

Function   Returns true if the argument is
iscntrl    Any control character
isspace    A white-space character: space ' ', form feed '\f', line feed '\n', carriage return '\r', horizontal tab '\t' or vertical tab '\v'
isdigit    A decimal digit 0~9
isxdigit   A hexadecimal digit: decimal digits 0~9, lowercase letters a~f and uppercase letters A~F
islower    A lowercase letter a~z
isupper    An uppercase letter A~Z
isalpha    A letter a~z or A~Z
isalnum    A letter or digit: a~z, A~Z, 0~9
ispunct    A punctuation mark: any printable graphic character that is not a letter or digit
isgraph    Any graphic character
isprint    Any printable character: graphic characters and the space character

These character classification functions are simple and commonly used, so they are worth remembering.
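As a short illustration of how these functions are called, the sketch below counts the characters of a string that a given classification function accepts. count_matching is a hypothetical helper, and the results assume the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: counts the characters of s accepted by one of the
// classification functions, which is passed in as a function pointer.
size_t count_matching(const char* s, int (*classify)(int))
{
	size_t count = 0;
	assert(s && classify);
	for (; *s != '\0'; s++)
	{
		// Cast to unsigned char: passing a negative char value is undefined.
		if (classify((unsigned char)*s))
		{
			count++;
		}
	}
	return count;
}
```

For the string "Hello, World 2022!\n", count_matching with isdigit gives 4, with isupper 2, with isspace 3 (two spaces and the line feed) and with ispunct 2.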

5, memcpy memory copy

Information about memcpy:

void* memcpy(void* destination, const void* source, size_t num)

The points worth noting are:

1. memcpy copies num bytes, starting at source, to the memory starting at destination

2. The function does not stop when it encounters '\0'

3. If source and destination overlap in any way, the result of the copy is undefined

4. The addresses are passed as void*, so that data of any type can be copied. The reason was covered in the previous article on the qsort sorting function and is not repeated here. If it is unclear, read that article first:

C language: in-depth understanding of sorting algorithms (Faith_cxz, CSDN blog): https://blog.csdn.net/Faith_cxz/article/details/122607698?spm=1001.2014.3001.5502

Next, an example of memcpy leads into a simulated implementation of memcpy:

//memcpy memory copy
#include <stdio.h>
#include <string.h>
int main()
{
	int arr1[] = { 1,2,3,4,5,6,7,8,9,10 };
	int arr2[10] = { 0 };
	memcpy(arr2, arr1 + 5, 5 * sizeof(arr1[0])); // the third argument is a byte count
	for (int i = 0; i < 5; i++)
	{
		printf("%d ", arr2[i]);
	}
	return 0;
}

Simulated implementation of memcpy:

#include <stdio.h>
#include <string.h>
#include <assert.h>

void* my_memcpy(void* dest, const void* src, size_t num)
{
	void* ret = dest;
	assert(dest && src);
	while (num--)
	{
		*(char*)dest = *(const char*)src; // copy one byte, keeping src const
		dest = (char*)dest + 1;
		src = (const char*)src + 1;
	}
	return ret;
}

int main()
{
	int arr1[] = { 1,2,3,4,5,6,7,8,9,10 };
	int arr2[10] = { 0 };
	my_memcpy(arr2, arr1 + 5, 5*sizeof(arr1[0]));
	for (int i = 0; i < 5; i++)
	{
		printf("%d ", arr2[i]);
	}
	return 0;
}
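The point that memcpy does not stop at '\0' can be checked by copying a buffer with a zero byte in the middle and comparing the result with strncpy. This is a sketch only, and copy_both_ways is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copies the same n bytes twice, once with memcpy and
// once with strncpy, to compare how each treats an embedded '\0'.
// Both destinations must hold at least n bytes and must not overlap src.
void copy_both_ways(char* by_memcpy, char* by_strncpy, const char* src, size_t n)
{
	assert(by_memcpy && by_strncpy && src);
	memcpy(by_memcpy, src, n);   // copies all n bytes, zero bytes included
	strncpy(by_strncpy, src, n); // stops reading src at '\0', pads with '\0'
}
```

With the 5 source bytes 'a', 'b', '\0', 'c', 'd', the memcpy destination ends with 'c' and 'd', while the strncpy destination holds '\0' in its last three bytes.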

6, memmove memory copy

memmove is documented almost identically to memcpy. The one difference is that memcpy is only guaranteed to work when the source and destination regions do not overlap, while memmove also handles overlapping regions correctly, as the C standard requires. In that sense memmove can do everything memcpy can.

Next, an example of memmove leads into a simulated implementation of memmove:

//memmove memory copy
#include <stdio.h>
#include <string.h>
int main()
{
	int arr[] = { 1,2,3,4,5,6,7,8,9,10 };
	memmove(arr, arr + 2, 5 * sizeof(arr[0]));
	for (int i = 0; i < 10; i++)
	{
		printf("%d ", arr[i]);
	}
	return 0;
}

Simulated implementation of memmove:

#include <stdio.h>
#include <string.h>
#include <assert.h>

void* my_memmove(void* dest, const void* src, size_t num)
{
	void* ret = dest;
	assert(dest && src);
	if (dest < src)
	{
		while (num--)
		{
			*(char*)dest = *(const char*)src;
			dest = (char*)dest + 1;
			src = (const char*)src + 1;
		}
	}
	else
	{
		while (num--)
		{
			*((char*)dest + num) = *((const char*)src + num);
		}
	}
	return ret;
}

int main()
{
	int arr[] = { 1,2,3,4,5,6,7,8,9,10 };
	my_memmove(arr + 2, arr, 5 * sizeof(arr[0]));
	for (int i = 0; i < 10; i++)
	{
		printf("%d ", arr[i]);
	}
	return 0;
}

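The idea behind the simulated memmove is as follows: when dest lies before src, the bytes are copied from front to back; otherwise they are copied from back to front. Either way, every source byte is read before the copy overwrites it, so overlapping regions are handled correctly.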
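Why the copy direction matters can be seen by forcing a front-to-back copy on an overlapping region where dest lies after src. The sketch below uses a hypothetical helper, forward_copy, that behaves like the first branch of my_memmove only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: always copies from the first byte to the last,
// like the first branch of my_memmove, regardless of overlap.
void* forward_copy(void* dest, const void* src, size_t num)
{
	unsigned char* d = (unsigned char*)dest;
	const unsigned char* s = (const unsigned char*)src;
	assert(dest && src);
	while (num--)
	{
		*d++ = *s++;
	}
	return dest;
}
```

Starting from the array 1 2 3 4 5 6 7 8 9 10, forward_copy(arr + 2, arr, 5 * sizeof(arr[0])) produces 1 2 1 2 1 2 1 8 9 10, because elements 3, 4 and 5 are overwritten before they are read. memmove with the same arguments produces 1 2 1 2 3 4 5 8 9 10.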

7, memcmp memory comparison

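Information about memcmp:

int memcmp(const void* ptr1, const void* ptr2, size_t num) compares the first num bytes of the two memory blocks. It returns 0 if they are all equal. Otherwise it returns a value less than 0 or greater than 0, depending on whether the first differing byte is smaller or larger in ptr1 than in ptr2.

Next, an example of memcmp leads into a simulated implementation of memcmp: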

//memcmp memory comparison
#include <stdio.h>
#include <string.h>
int main()
{
	int arr1[] = { 1,2,3,4,5 };
	int arr2[] = { 1,2,3,4,6 };
	int ret = memcmp(arr1, arr2, 16); // 16 bytes: with 4-byte ints only the first four elements are compared
	if (ret > 0)
	{
		printf("arr1>arr2");
	}
	else if (ret < 0)
	{
		printf("arr1<arr2");
	}
	else
	{
		printf("arr1=arr2");
	}
	return 0;
}

Simulated implementation of memcmp:

#include <stdio.h>
#include <string.h>
#include <assert.h>

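int my_memcmp(const void* ptr1, const void* ptr2, size_t num)
{
	assert(ptr1 && ptr2);
	while (num--)
	{
		// compare as unsigned char, as the standard memcmp does
		if (*(const unsigned char*)ptr1 == *(const unsigned char*)ptr2)
		{
			if (num == 0)
			{
				return 0;
			}
			ptr1 = (const unsigned char*)ptr1 + 1;
			ptr2 = (const unsigned char*)ptr2 + 1;
		}
		else
		{
			return *(const unsigned char*)ptr1 - *(const unsigned char*)ptr2;
		}
	}
	return 0; // reached only when num is 0 on entry: nothing to compare
}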

int main()
{
	int arr1[] = { 1,2,4 };
	int arr2[] = { 1,2,3 };
	int ret = my_memcmp(arr1, arr2, 9);
	if (ret > 0)
	{
		printf("arr1>arr2");
	}
	else if (ret < 0)
	{
		printf("arr1<arr2");
	}
	else
	{
		printf("arr1=arr2");
	}
	return 0;
}

One point is worth noting in this simulated memcmp: the decision to return 0 can no longer be based on a pointer reaching '\0', as it was for the string functions. memcmp compares memory byte by byte, and a byte with the value '\0' can easily occur in the middle of the data. Stopping there would end the comparison early and give a wrong result. Instead the parameter num is used: when num has been reduced to 0, all bytes have been compared and 0 can be returned.
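The difference from the string functions can be checked directly with two buffers that differ only after a zero byte. The following is a sketch, and differs_only_after_nul is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns 1 when strncmp reports the two buffers equal
// although memcmp finds a difference in the first n bytes, otherwise 0.
int differs_only_after_nul(const char* a, const char* b, size_t n)
{
	assert(a && b);
	return strncmp(a, b, n) == 0 && memcmp(a, b, n) != 0;
}
```

For the 3-byte buffers 'a', '\0', 'b' and 'a', '\0', 'c' the function returns 1: strncmp stops at the zero byte and reports equality, while memcmp compares all three bytes and finds the difference.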

8, memset memory settings

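Information about memset:

void* memset(void* ptr, int value, size_t num) sets the first num bytes of the memory block pointed to by ptr to value, converted to unsigned char.

An example of setting memory with memset: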

//memset memory settings
#include <stdio.h>
#include <string.h>
int main()
{
	int arr[10] = { 1,2,3,4,5,6,7,8,9,10 };
	memset(arr, 0, 10); // 10 is a byte count: only the first 10 bytes are set to 0, not 10 ints
	for (int i = 0; i < 10; i++)
	{
		printf("%d ", arr[i]);
	}
	return 0;
}
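Because memset works byte by byte, filling an integer array with a non-zero value does not store that value in each element. The sketch below uses fixed-width uint32_t elements so that the result does not depend on the size of int. fill_bytes is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: sets every byte of a 32-bit array to value.
// count is the number of elements, so the byte count passed to memset
// is count * sizeof(arr[0]).
void fill_bytes(uint32_t* arr, size_t count, unsigned char value)
{
	assert(arr);
	memset(arr, value, count * sizeof(arr[0]));
}
```

After fill_bytes(arr, 4, 1) every element is 0x01010101 (16843009), not 1, since each of its four bytes is set to 1. Filling with 0 is the common case where the byte-wise and element-wise results coincide.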

 
