Boost (Part I): String and Text Processing

Posted by suspect_device on Sat, 30 Oct 2021 06:35:08 +0200

1. lexical_cast

The lexical_cast library converts strings to numbers (integers and floating-point values), and numbers back to strings.

The standard form of lexical_cast has two template parameters, template <typename Target, typename Source>. Target must be specified explicitly: it is the type to convert to, usually a numeric type or std::string. The second parameter, Source, does not need to be written because it is deduced from the function argument.

When converting a string to a number, the string may contain only digits, an optional sign and, for floating-point targets, a decimal point and an exponent marker e/E. Letters and other non-numeric characters cannot appear.

lexical_cast can convert the integers 0 and 1, or the strings "0" and "1", to bool. Note that the literals "true" and "false" are not accepted.

When lexical_cast cannot perform a conversion, it throws bad_lexical_cast, a class derived from std::bad_cast, so the code can be protected with a try/catch block. Alternatively, boost::conversion::try_lexical_convert() performs the conversion without throwing: it returns a bool indicating whether the conversion succeeded.

Internally, lexical_cast uses the standard library's stream operations, so it places the following requirements on the types it converts. Standard containers and user-defined types must meet these conditions, otherwise they cannot be used with lexical_cast:

  • The source type must be output-streamable, i.e. operator<< is defined for it.
  • The target type must be input-streamable, i.e. operator>> is defined for it.
  • The target type must be default-constructible and copy-constructible.
#include <boost/lexical_cast.hpp>
#include <iostream>
#include <string>
void TestLexical_Cast()
{
    std::string str = boost::lexical_cast<std::string>(0x96);
    std::cout << str << std::endl;                  //150

    std::string str1 = boost::lexical_cast<std::string>(30);
    std::cout << str1 << std::endl;                 //30

    float pi = boost::lexical_cast<float>("3.141592653");
    std::cout << pi << std::endl;                   //3.14159

    //Only the integers 0/1 and the strings "0"/"1" can be converted to bool
    bool bo = boost::lexical_cast<bool>("0");
    std::cout << bo << std::endl;                   //0

    try {
        //Error, the string to be converted can only have numbers, decimal point and exponential e/E, and cannot have other characters
        int num = boost::lexical_cast<int>("0x96");
        std::cout << num << std::endl;
    }
    catch (const boost::bad_lexical_cast& e)
    {
        std::cout << e.what() << std::endl;         //bad lexical cast: source type value could not be interpreted as target
    }

    //try_lexical_convert reports failure through its return value instead of throwing
    int num = 0;
    bool is_success = boost::conversion::try_lexical_convert("0x96", num);
    std::cout << "is_success:" << is_success << std::endl;   //is_success:0
    //When the conversion fails, num may have been partially modified, so do not use it
}

Comparison with C and C++

The C functions atoi() and atof() are one-directional: they only convert strings to numbers and offer no conversion from numbers back to strings.

C++11 improves interoperability between strings and numbers with the std::stoX() family (std::stoi(), std::stod(), and so on) and std::to_string(), which together convert in both directions. They need no template parameters and tolerate trailing non-numeric characters: they skip leading whitespace and stop at the first character that cannot be converted. However, if no number can be parsed at the start of the string (after any leading whitespace), they throw std::invalid_argument, and if the value is out of range for the numeric type, they throw std::out_of_range.

    int a = 10;
    std::string a_str = std::to_string(a);
    std::cout << a_str << std::endl;                //10
    int b = std::stoi("12ss");
    std::cout << b << std::endl;                    //12
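std::stoi("12ss") above silently returns 12. If trailing characters should be an error, as they are for lexical_cast, the optional second argument of std::stoi (a std::size_t* that receives the number of characters processed) can be checked. strict_stoi() is a hypothetical helper sketching that idea with only the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: like std::stoi, but rejects trailing characters.
int strict_stoi(const std::string& s)
{
    std::size_t consumed = 0;
    int value = std::stoi(s, &consumed);   // may throw invalid_argument / out_of_range
    if (consumed != s.size())
        throw std::invalid_argument("trailing characters in: " + s);
    return value;
}
```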

2. format

The classic printf() in C relies on C variadic arguments and performs no type checking (which keeps it fast), but its syntax is simple and efficient; it is widely accepted and used and has had a far-reaching influence.

The boost.format library builds on printf() while discarding its weaknesses: it implements a printf-like formatting object that formats its arguments into a string, and the operation is completely type-safe (at some cost in speed). Modeled on the stream operator <<, format overloads the binary operator% to feed in arguments, and any number of arguments can be chained this way. Note that the formatting library added in C++20, std::format, is not Boost.Format: it is derived from the {fmt} library and uses {}-style placeholders.

1. The format class

format is not a real class but a typedef; the actual implementation is the class template basic_format.

The member function str() returns the formatted string held by the format object without clearing it. If not all the arguments required by the format string have been supplied, it throws an exception (boost::io::too_few_args). The format library also provides a free function str() with the same name in the boost namespace, which likewise returns the formatted string of a format object.

The member function size() returns the length of the formatted string; it likewise throws if not all the required arguments have been supplied.

The member function parse() clears the object's internal buffer and replaces the format string with a new one. If you only want to clear the buffer, use clear(), which restores the format object to its initial state while keeping the format string. Calling str() or size() after either function throws an exception (as long as the format string expects arguments), because no arguments have been supplied yet.

2. Formatting syntax

format inherits essentially all of printf's formatting syntax, such as %05d and %-8.3f. In addition to the classic printf directives, format adds two new forms:

  • %|spec|: vertical bars enclose the spec, which separates the formatting options more clearly from the surrounding text
  • %N%: refers to the Nth argument; it is a plain placeholder without any other formatting options
#include <boost/format.hpp>
#include <cstdio>
#include <iostream>
void TestFormat()
{
    std::cout << boost::format("%s:%d+%d=%d\n") % "sum" % 1 % 2 % (1 + 2); //sum:1+2=3
    //Compare printf
    printf("%s:%d+%d=%d\n", "sum", 1, 2, 1 + 2);                           //sum:1+2=3

    //%N% marks the nth parameter, which is equivalent to a placeholder without any other formatting options
    boost::format fmt("(%1%+%2%)*%3%=%4%");                                //Create a format object in advance
    fmt % 4 % 6 % 2;                                                       //Arguments can be fed in several steps
    fmt % ((4 + 6) * 2);
    std::cout << "format string :" << fmt.str() << ",String length:" << fmt.size() << std::endl;//format string :(4+6)*2=20,String length:10
    fmt.clear();
    fmt % 3 % 6 % 4 % ((3 + 6) * 4);
    std::cout << "format string :" << fmt.str() << ",String length:" << fmt.size() << std::endl;//format string :(3+6)*4=36,String length:10
    //Compare printf
    printf("(%d+%d)*%d=%d\n", 4, 6, 2, (4 + 6) * 2);                       //(4+6)*2=20

    //%|spec | add vertical bar segmentation to better distinguish formatting options from ordinary characters
    boost::format fmt1("%|05d|-%|-8.3f|-%|10s|-%|05X|");
    fmt1 % 10 % 3.14 % "hello" % 150;
    std::cout << fmt1.str() << std::endl;//00010-3.140   -     hello-00096
}

3. string_ref

1. Background

The basic tool for processing strings in C++ is the standard std::string, but constructing a std::string is expensive because it must hold a full copy of the string's content. In extreme cases this incurs a high memory-copy cost and hurts performance. Passing const std::string& avoids some of these problems, but it does not help with C strings or substrings: passing a C string still constructs a temporary std::string, and substr() returns a new copy. In short, std::string is somewhat "heavy", and we need a lighter string tool: boost::string_ref.

2. boost::string_ref

boost::string_ref holds only a reference to a string and never copies it, so it is efficient; think of it as a better const std::string&. The same idea was standardized in C++17 under the name std::string_view.
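For comparison, a sketch with the standard std::string_view (requires C++17), which keeps the same pointer-plus-length model and whose observers such as find(), rfind() and substr() are constexpr. middle_word() is a hypothetical helper that extracts the text between the first and last space, here evaluated at compile time on this article's sample sentence.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: text between the first and the last space.
constexpr std::string_view middle_word(std::string_view s)
{
    std::size_t first = s.find(' ');
    std::size_t last = s.rfind(' ');
    if (first == std::string_view::npos || first == last)
        return {};                               // fewer than two spaces
    return s.substr(first + 1, last - first - 1);
}

static_assert(middle_word("Study C++ in library") == "C++ in");
```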

The string_ref library defines the class template basic_string_ref. It never copies the string and therefore never allocates memory; it represents a string with just two member variables, ptr_ and len_, which mark the start of the string and its length. basic_string_ref is a "constant view" of a string: most of its member functions are const, so, just as with const std::string&, we can only observe the string, not modify it.

Because string_ref offers essentially the same read-only interface as std::string, an important use is to replace const std::string& as a function parameter or return type, which avoids the cost of copying strings entirely and makes string processing more efficient. The referenced string must stay alive for as long as the string_ref is used, so avoid holding a string_ref long-term or using it later; when you really need to keep or modify the string, call the member function to_string() to obtain a safe copy.

3. remove_prefix() and remove_suffix()

Although string_ref cannot change the original string directly, its remove_prefix() and remove_suffix() functions adjust the internal pointer and length, changing which part of the string it refers to while leaving the original string untouched.

#include <boost/utility/string_ref.hpp>
#include <cstddef>
#include <iostream>
#include <string>
void TestStringRef()
{
    const char* ch = "Study C++ in library";
    std::string str(ch);                            //Standard string with copy cost

    boost::string_ref str_ref(ch);                  //Zero-copy 
    if (str_ref == str)
    {
        std::cout << "Two strings are equal" << std::endl;   //Two strings are equal
    }
    std::cout << "First character:" << str_ref.front() << std::endl;              //First character: S
    std::cout << "Last character:" << str_ref[str_ref.length()-1] << std::endl;//Last character: y
    std::size_t index = str_ref.find('+');
    std::cout << "Index of '+':" << index << std::endl;     //Index of '+':7

    boost::string_ref substr = str_ref.substr(6, 3);
    std::cout << substr << std::endl;              //C++

    std::string str2 = str_ref.to_string();
    if (str2 == str && str_ref == str2)
    {
        std::cout << "The three strings are equal" << std::endl;  //The three strings are equal
    }

    str_ref.remove_prefix(6);
    std::cout <<"Remove the first 6 characters:"<< str_ref << std::endl;  //Remove the first 6 characters: C++ in library
    str_ref.remove_suffix(8);
    std::cout << "Remove the last 8 characters:" << str_ref << std::endl;//Remove the last 8 characters: C++ in
}
