Articles in this series
1. Implementing a web crawler in C++!
Preface
If you are a C/C++ programmer, you are surely familiar with Visual Studio (VS). It is arguably the most powerful tool a C/C++ programmer has.
However, as you dig deeper you will find that most Windows APIs come in two versions, a wide-character one and a narrow-character one, such as the familiar MessageBoxA and MessageBoxW functions. This causes many problems, and the most common one is garbled text.
Note that Windows itself works with wide characters internally. Even if you pass char strings, they are converted to wchar_t under the hood when the program actually runs, so calling the narrow (A) versions is slightly less efficient than calling the wide (W) versions.
You should also know that wchar_t can represent characters from many languages, whereas a char string is interpreted in the system's current code page and therefore only covers the characters of one language or region.
I. What are wide characters and narrow characters?
The best way to understand the difference is to experiment.
As the experiment shows, the most obvious difference is size: a char occupies one byte, while a wchar_t occupies two bytes on Windows. A wide string literal must also be prefixed with L, as in L"abc".
Other differences, such as whether Chinese characters display correctly, are best discovered by experimenting yourself: the problems you run into and solve on your own are the ones you really learn.
II. Converting between wide and narrow characters
1. Conversion with the Windows API
Header file:
#include <Windows.h>
Functions used:
Narrow to wide:
int MultiByteToWideChar(
    UINT   CodePage,       // code page of the source string; usually CP_ACP, the system's current ANSI code page
    DWORD  dwFlags,        // conversion flags; 0 is fine in most cases
    LPCCH  lpMultiByteStr, // narrow string to convert
    int    cbMultiByte,    // length of the narrow string in bytes; -1 means it is null-terminated
    LPWSTR lpWideCharStr,  // wide-character buffer that receives the result
    int    cchWideChar     // size of that buffer in wide characters; 0 returns the required size
);
Wide to narrow:
int WideCharToMultiByte(
    UINT   CodePage,          // target code page; usually CP_ACP, the system's current ANSI code page
    DWORD  dwFlags,           // conversion flags; 0 is fine in most cases
    LPCWCH lpWideCharStr,     // wide string to convert
    int    cchWideChar,       // length of the wide string in characters; -1 means it is null-terminated
    LPSTR  lpMultiByteStr,    // narrow buffer that receives the result
    int    cbMultiByte,       // size of that buffer in bytes; 0 returns the required size
    LPCCH  lpDefaultChar,     // substitute for characters that cannot be converted; NULL uses the system default
    LPBOOL lpUsedDefaultChar  // set to TRUE if a substitute was used; pass NULL if you do not need it
);
Estimating the buffer size by hand is error-prone, because the ratio between narrow bytes and wide characters depends on the text and the code page. Instead, you can call the function twice: the first call, with a null output buffer and a size of 0, returns the required size; the second call performs the conversion.
Below are two wrapper functions I wrote. They can be used as-is, but the caller must free the returned buffer with delete[]. You could also change them to return std::wstring and std::string instead.
Narrow to wide:
// str: narrow string to convert
// len: receives the length of the converted wide string, in characters,
//      including the terminating L'\0'; pass NULL if you do not need it
// The caller must free the result with delete[]
wchar_t* AtoW(const char* str, int* len)
{
    // First call: get the required size (including the terminator, since cbMultiByte is -1)
    int wcLen = MultiByteToWideChar(CP_ACP, 0, str, -1, NULL, 0);
    wchar_t* newBuf = new wchar_t[wcLen + 1]{};
    // Second call: perform the conversion
    MultiByteToWideChar(CP_ACP, 0, str, -1, newBuf, wcLen);
    if (len != NULL)
    {
        *len = wcLen;
    }
    return newBuf;
}
Wide to narrow:
// str: wide string to convert
// len: receives the length of the converted narrow string, in bytes,
//      including the terminating '\0'; pass NULL if you do not need it
// The caller must free the result with delete[]
char* WtoA(const wchar_t* str, int* len)
{
    // First call: get the required size in bytes (including the terminator)
    int cLen = WideCharToMultiByte(CP_ACP, 0, str, -1, NULL, 0, NULL, NULL);
    char* newBuf = new char[cLen + 1]{};
    // Second call: perform the conversion
    WideCharToMultiByte(CP_ACP, 0, str, -1, newBuf, cLen, NULL, NULL);
    if (len != NULL)
    {
        *len = cLen;
    }
    return newBuf;
}
2. Conversion with the C/C++ library functions
Header files used:
#include <cstdlib>  // conversion functions
#include <clocale>  // setlocale, for setting the locale
Functions used:
Setting the locale: this is required when converting Chinese text; otherwise the result is garbled.
char* setlocale(
    int         _Category, // which categories to change; usually LC_ALL, meaning all of them
    char const* _Locale    // locale name; "" (empty string) means the user's default locale
);
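Standard function, narrow to wide:
size_t mbstowcs(
    wchar_t*    _Dest,     // buffer that receives the converted string
    char const* _Source,   // narrow string to convert
    size_t      _MaxCount  // size of _Dest, in wide characters
);
// Returns the number of wide characters written, not counting the terminator,
// or (size_t)-1 if an invalid multibyte sequence is encountered.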
Usage:
#define _CRT_SECURE_NO_WARNINGS  // this macro must be defined, otherwise VS reports an error
#include <iostream>
#include <cstdlib>
#include <clocale>
using namespace std;
int main()
{
    setlocale(LC_ALL, "");  // use the local locale; otherwise converting Chinese produces garbled text
    wchar_t buf[0xFF];
    mbstowcs(buf, "Ha ha ha ha", 0xFF);
}
Standard function, wide to narrow:
size_t wcstombs(
    char*          _Dest,     // buffer that receives the converted string
    wchar_t const* _Source,   // wide string to convert
    size_t         _MaxCount  // size of _Dest, in bytes
);
Usage:
#define _CRT_SECURE_NO_WARNINGS  // this macro must be defined, otherwise VS reports an error
#include <iostream>
#include <cstdlib>
#include <clocale>
using namespace std;
int main()
{
    setlocale(LC_ALL, "");  // use the local locale; otherwise converting Chinese produces garbled text
    char buf[0xFF];
    wcstombs(buf, L"Ha ha ha ha ha ha ha", sizeof(buf));
}
Secure function, narrow to wide:
errno_t mbstowcs_s(
    size_t*     _PtNumOfCharConverted, // receives the number of wide characters converted, including the terminator; may be NULL
    wchar_t*    _DstBuf,               // buffer that receives the converted string
    size_t      _SizeInWords,          // size of _DstBuf, in wide characters
    char const* _SrcBuf,               // narrow string to convert
    size_t      _MaxCount              // maximum number of wide characters to store, or _TRUNCATE
);
Usage:
#include <iostream>
#include <cstdlib>
using namespace std;
int main()
{
    wchar_t buf[0xFF];
    mbstowcs_s(NULL, buf, 0xFF, "Ha ha ha ha", 0xFF);
}
Secure function, wide to narrow:
errno_t wcstombs_s(
    size_t*        _PtNumOfCharConverted, // receives the number of bytes converted, including the terminator; may be NULL
    char*          _Dst,                  // buffer that receives the converted string
    size_t         _DstSizeInBytes,       // size of _Dst, in bytes
    wchar_t const* _Src,                  // wide string to convert
    size_t         _MaxCountInBytes       // maximum number of bytes to store, or _TRUNCATE
);
Usage:
#include <cstdio>
#include <cstdlib>
#include <clocale>
int main()
{
    setlocale(LC_ALL, "");
    char buf[0xFF];
    wcstombs_s(NULL, buf, 0xFF, L"Ha ha ha ha ha ha ha", sizeof(buf));
    printf("%s", buf);
}
You may have noticed that I call setlocale in some examples and not in others. Whether you need it depends on the situation: if Chinese text is not converted correctly, setlocale is the first thing to check.
I also did not retrieve the number of converted characters (the first parameter). If you need that count to be accurate for Chinese text, you must call setlocale first, because in the default "C" locale such text is not converted correctly.
You may also have come across _wcstombs_s_l and similar functions. They must be used together with _create_locale and _free_locale, which is cumbersome and offers no real advantage over the methods above, so I will not cover them here. If you are interested, see the Microsoft documentation for these functions and for setlocale.
III. Solving the problem that the VS console cannot output wide characters
Method 1: use printf directly:
printf("%ls", buf);  // %ls prints a wide-character (wchar_t*) string
Method 2: use setlocale and wcout
#include <iostream>
#include <clocale>
using namespace std;
int main()
{
    setlocale(LC_ALL, "");  // without this, Chinese text may not be printed
    wcout << L"Ha ha ha";
}
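Method 3: use the WriteConsoleW function
#include <Windows.h>
#include <cwchar>
int main()
{
    wchar_t buf[] = L"Ha ha ha ha ha ha ha ha";
    // The third argument is a count of characters (not bytes) and should not include the terminating L'\0'
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), buf, (DWORD)wcslen(buf), NULL, NULL);
}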