C++ wide and narrow character conversion and output

Posted by phpcoding2 on Mon, 28 Feb 2022 02:26:01 +0100

Catalogue of series articles

1. Implement a crawler with C++!

Preface

If you are a C/C++ programmer, you are surely no stranger to VS (Visual Studio). It can be called a sharp weapon in the hands of C/C++ programmers.

However, as you dig deeper you will find that most Windows APIs come in a wide-byte (wide-character) version and a narrow-byte (narrow-character) version, such as the common MessageBoxA and MessageBoxW functions. This leads to many problems, the most common being garbled text.

It should be noted that Windows uses wide characters internally. Even if you use char, the narrow (A) version of an API generally converts the string to wchar_t and then calls the wide (W) version. This means that using narrow bytes costs an extra conversion compared with using wide bytes.

You also need to know that wchar_t can represent text in many languages at once, while char, under a given code page, only covers the languages of that code page.

1, What are wide bytes? What are narrow bytes?

The best way to understand wide and narrow bytes is to experiment.

An experiment shows that the most direct difference is size: char occupies one byte, while wchar_t occupies two bytes on Windows, and a string literal needs the L prefix to mark it as a wide string.
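The size experiment can be sketched roughly as follows. This is a minimal illustration, not the original experiment: the TextSize struct and the measure and printSizes helpers are hypothetical names introduced here. Note that the size of wchar_t is two bytes with VS on Windows, but typically four bytes with compilers on Linux, so the sketch reports sizeof(wchar_t) instead of assuming it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <cwchar>

// Hypothetical helper type for this sketch: sizes measured for one string.
struct TextSize {
    std::size_t unitBytes;   // size of one code unit (char or wchar_t)
    std::size_t units;       // number of code units, terminator excluded
    std::size_t totalBytes;  // storage needed, terminator included
};

// Narrow string: one byte per code unit.
TextSize measure(const char* s) {
    assert(s != nullptr);
    std::size_t n = std::strlen(s);
    return { sizeof(char), n, (n + 1) * sizeof(char) };
}

// Wide string: sizeof(wchar_t) bytes per code unit (2 with VS on Windows).
TextSize measure(const wchar_t* s) {
    assert(s != nullptr);
    std::size_t n = std::wcslen(s);
    return { sizeof(wchar_t), n, (n + 1) * sizeof(wchar_t) };
}

// Prints the sizes of the same text stored as narrow and as wide.
void printSizes() {
    TextSize a = measure("abc");   // narrow literal
    TextSize w = measure(L"abc");  // wide literal, note the L prefix
    std::printf("char:    %zu byte(s) per unit, %zu bytes total\n",
                a.unitBytes, a.totalBytes);
    std::printf("wchar_t: %zu byte(s) per unit, %zu bytes total\n",
                w.unitBytes, w.totalBytes);
}
```

Calling printSizes() from main prints both sizes for the current compiler, which makes the platform difference easy to check.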

Other questions, such as whether Chinese characters work correctly in each type, are worth trying out yourself. Problems only become your own knowledge once you have run into them and solved them yourself.

2, Conversion method between wide and narrow bytes

1. Conversion with the Windows API

Header file:

#include<Windows.h>

Functions used

Narrow byte to wide byte:

int MultiByteToWideChar(
UINT    CodePage,       //Code page of the source string; usually just CP_ACP, the code page used by the current system
DWORD   dwFlags,        //Conversion flags; usually just 0
LPCCH   lpMultiByteStr, //Narrow byte string to convert
int     cbMultiByte,    //Length of the narrow byte string, in bytes; -1 if the string is null-terminated
LPWSTR  lpWideCharStr,  //Wide character buffer that receives the converted string
int     cchWideChar     //Size of the wide character buffer, in characters; 0 to query the required size
);

Wide byte to narrow byte:

int WideCharToMultiByte(
UINT    CodePage,          //Code page to convert to; usually just CP_ACP, the code page used by the current system
DWORD   dwFlags,           //Conversion flags; usually just 0
LPCWCH  lpWideCharStr,     //Wide byte string to convert
int     cchWideChar,       //Length of the wide byte string, in characters; -1 if the string is null-terminated
LPSTR   lpMultiByteStr,    //Narrow character buffer that receives the converted string
int     cbMultiByte,       //Size of the narrow character buffer, in bytes; 0 to query the required size
LPCCH   lpDefaultChar,     //Character used when a character cannot be converted; usually NULL, which uses the system default
LPBOOL  lpUsedDefaultChar  //Set to TRUE if any character could not be converted; pass NULL if you don't need it
);

Generally, a wide string takes twice as many bytes as the narrow string, but there are exceptions. If you don't want to work out how large a buffer you need, you can call the function twice: the first call, with a buffer size of 0, returns the required size, and the second call performs the conversion.

The following are two functions I wrapped around these APIs. They can be used directly, but the caller has to delete[] the returned memory. They could also be rewritten to return wstring and string.

Narrow byte to wide byte:

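//str: narrow string to convert (null-terminated)
//len: receives the length of the converted wide string, in characters, including the terminating null. Pass NULL if you don't need it
//Returns a buffer allocated with new[]; the caller must release it with delete[]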
wchar_t* AtoW(const char* str, int* len)
{
	int wcLen = MultiByteToWideChar(CP_ACP, 0, str, -1, NULL, 0);
	wchar_t* newBuf = new wchar_t[wcLen + 1]{};
	MultiByteToWideChar(CP_ACP, 0, str, -1, newBuf, wcLen);
	if (len != NULL) {
		*len = wcLen;
	}
	return newBuf;
}

Wide byte to narrow byte:

//str: wide string to convert (null-terminated)
//len: receives the length of the converted narrow string, in bytes, including the terminating null. Pass NULL if you don't need it
//Returns a buffer allocated with new[]; the caller must release it with delete[]
char* WtoA(const wchar_t* str, int* len)
{
	//First call: a buffer size of 0 only queries the required size
	int cLen = WideCharToMultiByte(CP_ACP, 0, str, -1, NULL, 0, NULL, NULL);
	char* newBuf = new char[cLen + 1]{};
	//Second call: perform the conversion
	WideCharToMultiByte(CP_ACP, 0, str, -1, newBuf, cLen, NULL, NULL);
	if (len != NULL) {
		*len = cLen;
	}
	return newBuf;
}
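Because both wrappers return memory allocated with new[], every caller has to remember the matching delete[]. One possible way to make that automatic is std::unique_ptr<wchar_t[]>. The sketch below only shows the ownership pattern: makeWideCopy is a hypothetical stand-in for AtoW that widens plain ASCII so the example compiles without Windows.h, and makeOwnedWide is likewise a name invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for AtoW: returns a new[]-allocated wide copy.
// It only handles ASCII and exists to keep the sketch free of Windows.h.
wchar_t* makeWideCopy(const char* str, int* len) {
    assert(str != nullptr);
    std::size_t n = std::strlen(str);
    wchar_t* buf = new wchar_t[n + 1]{};  // zero-initialized, so already terminated
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<wchar_t>(static_cast<unsigned char>(str[i]));
    }
    if (len != nullptr) {
        *len = static_cast<int>(n + 1);  // includes the terminator, like wcLen in AtoW
    }
    return buf;
}

// Wraps the raw buffer so that delete[] runs automatically when the
// returned unique_ptr goes out of scope.
std::unique_ptr<wchar_t[]> makeOwnedWide(const char* str, int* len = nullptr) {
    return std::unique_ptr<wchar_t[]>(makeWideCopy(str, len));
}
```

With the real AtoW the same wrapper shape should apply: pass its return value straight into the unique_ptr constructor, and use get() when an API needs the raw const wchar_t*.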

2. Conversion with C/C++ library functions

Header file used:

#include <cstdlib> //conversion functions
#include <clocale> //setlocale, for setting the locale

Functions used:

Set the locale. This is needed when converting Chinese; otherwise the result is garbled.

char* setlocale(
int  _Category, //Which part of the locale this call affects; usually just LC_ALL, i.e. everything
char const* _Locale //Usually an empty string "", which selects the locale of the local system
);

Standard narrow to wide:

size_t mbstowcs(
wchar_t*     _Dest,    //Buffer that receives the converted wide string
const char*  _Source,  //Narrow string to convert
size_t       _MaxCount //Size of the destination buffer, in wide characters
);

use:

#define _CRT_SECURE_NO_WARNINGS //This macro must be defined, otherwise VS reports an error
#include<iostream>
#include<cstdlib>
#include<clocale>
using namespace std;
int main() {
	setlocale(LC_ALL, ""); //Use the local locale; otherwise Chinese comes out garbled after conversion
	wchar_t buf[0xFF];
	mbstowcs(buf, "Ha ha ha ha", 0xFF);
}

Standard wide to narrow:

size_t wcstombs(
char*           _Dest,    //Buffer that receives the converted narrow string
const wchar_t*  _Source,  //Wide string to convert
size_t          _MaxCount //Size of the destination buffer, in bytes
);

use:

#define _CRT_SECURE_NO_WARNINGS //This macro must be defined, otherwise VS reports an error
#include<iostream>
#include<cstdlib>
#include<clocale>
using namespace std;
int main(){
	setlocale(LC_ALL,""); //Use the local locale; otherwise Chinese comes out garbled after conversion
	char buf[0xFF];
	wcstombs(buf,L"Ha ha ha ha ha ha ha",sizeof(buf));
}
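The fixed 0xFF buffers above can be avoided with the same "call twice" idea used for the Windows API. The following is a minimal sketch, assuming the C library accepts a null destination as a size query: that is a POSIX extension which the VS runtime also documents, not something the C standard guarantees for these two functions. The names toWide and toNarrow are hypothetical helpers, and they return wstring and string so nothing has to be deleted by hand.

```cpp
#define _CRT_SECURE_NO_WARNINGS  // only matters for VS; harmless elsewhere
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: narrow -> wide, using the current C locale.
std::wstring toWide(const std::string& src) {
    // First call: a null destination only asks for the required length
    // (in wide characters, terminator excluded).
    std::size_t needed = std::mbstowcs(nullptr, src.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for this locale");
    }
    std::wstring dst(needed, L'\0');
    if (needed > 0) {
        // Second call: perform the conversion into the sized string.
        std::size_t written = std::mbstowcs(&dst[0], src.c_str(), needed);
        assert(written == needed);
        (void)written;
    }
    return dst;
}

// Hypothetical helper: wide -> narrow, using the current C locale.
std::string toNarrow(const std::wstring& src) {
    // First call: required length in bytes, terminator excluded.
    std::size_t needed = std::wcstombs(nullptr, src.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in this locale");
    }
    std::string dst(needed, '\0');
    if (needed > 0) {
        std::size_t written = std::wcstombs(&dst[0], src.c_str(), needed);
        assert(written == needed);
        (void)written;
    }
    return dst;
}
```

As with the examples above, setlocale(LC_ALL, "") should be called once before converting anything other than plain ASCII, since both helpers rely on the current locale.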

Secure function, narrow to wide:

errno_t mbstowcs_s(
size_t*      _PtNumOfCharConverted, //Receives the number of characters converted, including the terminating null
wchar_t*     _DstBuf,               //Buffer that receives the converted wide string
size_t       _SizeInWords,          //Size of the _DstBuf buffer, in wide characters
char const*  _SrcBuf,               //Narrow string to convert
size_t       _MaxCount              //Maximum number of wide characters to convert
);

use:

#include<iostream>
#include<cstdlib>
using namespace std;
int main() {
	wchar_t buf[0xFF];
	mbstowcs_s(NULL,buf,0xFF, "Ha ha ha ha", 0xFF);
}

Secure function, wide to narrow:

errno_t wcstombs_s(
size_t*         _PtNumOfCharConverted, //Receives the number of bytes converted, including the terminating null
char*           _Dst,                  //Buffer that receives the converted narrow string
size_t          _DstSizeInBytes,       //Size of the _Dst buffer, in bytes
wchar_t const*  _Src,                  //Wide string to convert
size_t          _MaxCountInBytes       //Maximum number of bytes to store in _Dst
);

use:

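#include<iostream>
#include<cstdio>
#include<cstdlib>
#include<clocale>
using namespace std;
int main() {
	setlocale(LC_ALL,""); //Use the local locale so that Chinese converts correctly
	char buf[0xFF];
	wcstombs_s(NULL, buf, 0xFF, L"Ha ha ha ha ha ha ha", sizeof(buf));
	printf("%s",buf); //Print the converted narrow string
}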

You may have noticed that I sometimes call setlocale and sometimes don't. This depends on the situation: plain ASCII converts without it, but if Chinese cannot be converted, you should call this function first.

Also, I did not receive the number of converted characters, that is, I passed NULL as the first parameter. If you need an accurate count of the successfully converted characters, pass a pointer to a size_t there, and call setlocale first.
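One detail worth checking in practice is that setlocale itself can fail: it returns a null pointer when the requested locale is not available, and the conversion then runs under the old locale. The sketch below shows one way to test both the locale switch and the conversion result. It uses the standard mbstowcs rather than mbstowcs_s so that it also builds outside VS, it assumes the null-destination size query described in the POSIX and VS documentation, and convertUnderLocale is a hypothetical helper name.

```cpp
#define _CRT_SECURE_NO_WARNINGS  // only matters for VS; harmless elsewhere
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts src under the named locale and restores the
// previous locale afterwards. Returns false if the locale is unavailable or
// the text cannot be converted under it.
bool convertUnderLocale(const char* localeName, const char* src, std::wstring& out) {
    assert(localeName != nullptr && src != nullptr);

    // setlocale with a null name only queries the current setting.
    const char* current = std::setlocale(LC_ALL, nullptr);
    std::string saved = current ? current : "C";

    if (std::setlocale(LC_ALL, localeName) == nullptr) {
        return false;  // requested locale is not installed or not valid
    }

    // Null destination: ask how many wide characters are needed.
    std::size_t needed = std::mbstowcs(nullptr, src, 0);
    bool ok = needed != static_cast<std::size_t>(-1);
    if (ok) {
        out.assign(needed, L'\0');
        if (needed > 0) {
            std::mbstowcs(&out[0], src, needed);
        }
    }

    std::setlocale(LC_ALL, saved.c_str());  // put the old locale back
    return ok;
}
```

Passing "" as the locale name corresponds to the setlocale(LC_ALL, "") calls used in the examples above.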

You may also have seen functions such as _wcstombs_s_l. These have to be used together with the _create_locale and _free_locale functions. That is more trouble than the conversion methods above and no better, so I will not explain them here. If you are interested, see the function description and the locale-setting description in the official documentation.

3, Solve the problem that the VS console cannot output wide characters

Method 1: directly use printf function:

printf("%ls",buf); //Output wide characters with %ls
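The %ls specifier is not limited to printf. The whole narrow printf family converts the wide string to multibyte text using the current locale, which is also why non-ASCII text generally needs setlocale first. As a small illustration, the sketch below formats into a buffer with snprintf instead of writing to the console; formatWide is a hypothetical helper name and the 256-byte buffer is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a wide string through "%ls" into a std::string.
// Returns an empty string if the text cannot be converted in the current
// locale; output longer than the buffer is truncated.
std::string formatWide(const wchar_t* text) {
    assert(text != nullptr);
    char buf[256];
    int n = std::snprintf(buf, sizeof(buf), "%ls", text);
    if (n < 0) {
        return std::string();  // encoding error reported by snprintf
    }
    // n is the length the full output would have had; clamp it to the buffer.
    std::size_t len = static_cast<std::size_t>(n) < sizeof(buf)
                          ? static_cast<std::size_t>(n)
                          : sizeof(buf) - 1;
    return std::string(buf, len);
}
```

Checking the return value this way also gives a quick test for whether a given wide string can be printed with %ls under the current locale.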

Method 2: use setlocale and wcout

#include<iostream>
#include<locale>
using namespace std;
int main() {
	setlocale(LC_ALL,"");
	wcout << L"Ha ha ha";
}

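Method 3: use the WriteConsoleW function

#include<Windows.h>
int main() {
	wchar_t buf[]=L"Ha ha ha ha ha ha ha ha";
	DWORD written = 0; //Receives the number of characters actually written
	//The length is in characters and should not include the terminating null
	WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE),buf,sizeof(buf)/sizeof(buf[0])-1, &written, NULL);
}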

Topics: C++ Back-end