Deep Understanding of String in C#

Posted by artic on Fri, 28 Jun 2019 05:03:05 +0200

On the Types in C#

Types in C# are divided into value types and reference types. Both ultimately derive from the System.Object class: reference types derive from it directly or through other classes, while value types derive from System.ValueType, which is itself a subclass of System.Object. The String type is a bit special: it is a reference type, but some of its behavior resembles that of a value type.

On C# String

1. Immutability (Invariance)

Let's start with an example:

static void Main(string[] args)
{
    string str1 = "string";
    string str2 = str1;
    Console.WriteLine(object.ReferenceEquals(str1, str2));
    str2 += "change";
    Console.WriteLine(object.ReferenceEquals(str1, str2));
    Console.ReadKey();
}

The output is True and then False. Why? Let's take a look at the IL.

.entrypoint
  // Code size 48 (0x30)
  .maxstack  2
  .locals init ([0] string str1,
           [1] string str2)
  IL_0000:  nop
  IL_0001:  ldstr      "string"
  IL_0006:  stloc.0
  IL_0007:  ldloc.0
  IL_0008:  stloc.1
  IL_0009:  ldloc.0
  IL_000a:  ldloc.1
  IL_000b:  ceq
  IL_000d:  call       void [mscorlib]System.Console::WriteLine(bool)
  IL_0012:  nop
  IL_0013:  ldloc.1
  IL_0014:  ldstr      "change"
  IL_0019:  call       string [mscorlib]System.String::Concat(string,string) 
  IL_001e:  stloc.1
  IL_001f:  ldloc.0
  IL_0020:  ldloc.1
  IL_0021:  ceq
  IL_0023:  call       void [mscorlib]System.Console::WriteLine(bool)
  IL_0028:  nop
  IL_0029:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
  IL_002e:  pop
  IL_002f:  ret

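The += operator calls the Concat function internally. Concat joins str2 and "change" into a brand-new string object, and str2 is then pointed at that new object, while the original string is left untouched. Functions such as Trim and Remove work the same way: they return a new object rather than modifying the string they are called on. Once a string has been created, its contents cannot be changed.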
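A minimal sketch of this behavior, using a made-up sample string (the class and variable names are only for illustration): each call hands back a new string, and the original variable still holds the original text.

```csharp
using System;

class ImmutabilityDemo
{
    static void Main()
    {
        string original = "  hello world  ";

        // Each call returns a new string; `original` itself is never modified.
        string trimmed = original.Trim();
        string removed = trimmed.Remove(5);   // drops everything from index 5 onward

        Console.WriteLine("[" + original + "]"); // [  hello world  ]
        Console.WriteLine("[" + trimmed + "]");  // [hello world]
        Console.WriteLine("[" + removed + "]");  // [hello]

        // The trimmed result is a different object from the original.
        Console.WriteLine(object.ReferenceEquals(original, trimmed)); // False
    }
}
```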

In fact, strings are immutable (invariant): any operation that appears to change the value of a string leaves the original untouched and creates a new string object instead. Real programs use strings heavily, so this leads to many new string objects and memory allocations, which can keep the garbage collector (GC) busy, reduce performance, and increase the risk of running out of memory. For this reason .NET gives strings special treatment: the string intern pool (also called the resident pool).

The string intern pool holds references to strings, keyed by their literal value. When a string literal is loaded, the CLR looks in the pool for a string with the same value. If one exists, the existing reference is returned. If not, a new string is created, added to the pool, and its reference is returned. Strings built at run time are not looked up in the pool automatically; that only happens when String.Intern is called explicitly.
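The pool can also be queried explicitly through String.Intern. The following is a small sketch rather than a full picture of the runtime's behavior: it builds a string at run time from a character array (so the compiler cannot turn it into a literal) and then asks the pool for the pooled instance with the same value.

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        string literal = "string";

        // Built at run time from characters, so it is a separate object.
        string built = new string(new[] { 's', 't', 'r', 'i', 'n', 'g' });

        Console.WriteLine(literal == built);                       // True  (same value)
        Console.WriteLine(object.ReferenceEquals(literal, built)); // False (different objects)

        // String.Intern returns the pooled reference for this value,
        // which is the one the literal already uses.
        string interned = string.Intern(built);
        Console.WriteLine(object.ReferenceEquals(literal, interned)); // True
    }
}
```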

2. Processing as Functional Parameters

When arguments are passed to a function, a value type is copied: the function receives its own copy of the value stored in the variable. For a reference type, what is copied is the reference (the address), so changing a property of the object through the parameter inside the function also changes the object seen outside the function.

static void Main(string[] args)
{
    People people = new People() { Name = "Jack" };
    Console.WriteLine(people.Name);
    Change(people);
    Console.WriteLine(people.Name);
    Console.ReadKey();
}

static void Change(People p)
{
    p.Name = "Eason";
}

class People
{
    public string Name { get; set; }
}

The program outputs Jack first and then Eason. This shows that a reference type is passed as a reference: the parameter p of the Change function and the object passed in from outside are the same object.

So let's look at String as a parameter:

static void Main(string[] args)
{
    string str = "string";
    Console.WriteLine(str);
    Change(str);
    Console.WriteLine(str);
    Console.ReadKey();
}

static void Change(string str)
{
    str = "change";
    Console.WriteLine(str);
}

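The output is string, change, string. After the call to Change, the value of str in Main is still "string". The parameter str in Change holds a copy of the caller's reference, and the assignment str = "change" only points that local copy at a different string object; the caller's variable is not affected. Any reference type behaves this way when the parameter itself is reassigned. What is special about string is its immutability: it has no members that modify the object in place, so a function can never change the caller's string through the parameter. So although string is a reference type, in practice it behaves like a value type when passed as a parameter.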
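To separate the two effects, here is a short sketch (the Reassign and ReassignByRef helpers are hypothetical names introduced for this example). Reassigning a parameter never reaches the caller, whether the type is People or string, unless the parameter is declared with ref.

```csharp
using System;

class People
{
    public string Name { get; set; }
}

class ParameterDemo
{
    // Rebinds only the local copy of the reference; the caller's variable is untouched.
    static void Reassign(People p) { p = new People { Name = "Eason" }; }

    // Same situation for string: only the local copy is pointed at "change".
    static void Reassign(string s) { s = "change"; }

    // With ref, the caller's variable itself is passed, so the assignment is visible outside.
    static void ReassignByRef(ref string s) { s = "change"; }

    static void Main()
    {
        var people = new People { Name = "Jack" };
        Reassign(people);
        Console.WriteLine(people.Name); // Jack

        string str = "string";
        Reassign(str);
        Console.WriteLine(str);         // string

        ReassignByRef(ref str);
        Console.WriteLine(str);         // change
    }
}
```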

3. Equal Comparing Processing

Let's start with an example:

string str1 = "string";
string str2 = "string";
string str3 = "stringstring";
string str4 = "string" + "string";
string str5 = str1 + "string";
Console.WriteLine(ReferenceEquals(str1, str2));
Console.WriteLine(str1 == str2);
Console.WriteLine(ReferenceEquals(str3, str4));
Console.WriteLine(str3 == str4);
Console.WriteLine(ReferenceEquals(str3, str5));
Console.WriteLine(str3 == str5);
Console.ReadKey();

The output is True, True, True, True, False, True. str3 and str5 are not the same object and do not refer to the same address. Why? Looking at the IL code, we find that str5 is produced by calling the Concat function to join str1 and "string". What exactly does the Concat function do?

public static string Concat(string str0, string str1)
{
    if (IsNullOrEmpty(str0))
    {
        if (IsNullOrEmpty(str1))
        {
            return Empty;
        }
        return str1;
    }
    if (IsNullOrEmpty(str1))
    {
        return str0;
    }
    int length = str0.Length;
    string dest = FastAllocateString(length + str1.Length);
    FillStringChecked(dest, 0, str0);
    FillStringChecked(dest, length, str1);
    return dest;
}

The FastAllocateString function allocates a new string dest of length str0.Length + str1.Length. FillStringChecked then copies str0 and str1 into dest in turn, producing the concatenation of str0 and str1. At no point does Concat consult the string resident pool to check whether a string equal to dest already exists; it simply creates a new object. So when a string variable and a string constant are concatenated, a new object is generated directly, bypassing the resident pool check.

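Concatenating string constants is different. The compiler folds "string" + "string" into the single literal "stringstring" at compile time, so no Concat call is made, and a new string is created only if the resident pool does not already contain a string with that value. Let's look at the IL code: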

  IL_0001:  ldstr      "string"
  IL_0006:  stloc.0
  IL_0007:  ldstr      "string"
  IL_000c:  stloc.1
  IL_000d:  ldstr      "stringstring"
  IL_0012:  stloc.2
  IL_0013:  ldstr      "stringstring"
  IL_0018:  stloc.3
  IL_0019:  ldloc.0
  IL_001a:  ldstr      "string"
  IL_001f:  call       string [mscorlib]System.String::Concat(string,string)
  IL_0024:  stloc.s    str5
  IL_0026:  ldloc.0
  IL_0027:  ldloc.1

The literal values of str3 and str4 are equal: both are "stringstring". str3 is initialized before str4, so when str4 is initialized the pool already contains that value and the CLR hands the same reference to str4. The references of str3 and str4 are therefore equal.
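The difference between compile-time and run-time concatenation can be seen side by side in a small sketch (variable names are chosen for this example only). A const string is a compile-time constant, so the compiler can fold the concatenation into one literal; an ordinary local variable forces a Concat call at run time.

```csharp
using System;

class ConstantFoldingDemo
{
    static void Main()
    {
        const string part = "string";   // compile-time constant
        string variable = "string";     // ordinary local variable

        string literal = "stringstring";
        string folded = part + "string";           // folded into the literal "stringstring"
        string concatenated = variable + "string"; // String.Concat at run time

        Console.WriteLine(object.ReferenceEquals(literal, folded));       // True
        Console.WriteLine(object.ReferenceEquals(literal, concatenated)); // False
        Console.WriteLine(literal == concatenated);                       // True
    }
}
```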

As for the "==" operator, the result is True because "==" on strings calls the String.Equals method. The IL code is as follows:

  IL_0032:  call       bool [mscorlib]System.String::op_Equality(string,string)

op_Equality ends up calling the String.Equals function. Equals first checks whether the two references are equal; if they are not, it compares the values, character by character (an ordinal comparison).
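The order of checks described here can be sketched as follows. SketchEquals is a hypothetical helper written for illustration; it mirrors the described steps and is not the actual library source, which may compare lengths and memory in a more optimized way.

```csharp
using System;

class EqualityDemo
{
    // Hypothetical helper: reference check first, then a value comparison.
    static bool SketchEquals(string a, string b)
    {
        if (object.ReferenceEquals(a, b)) return true;  // same object, or both null
        if (a == null || b == null) return false;       // exactly one is null
        if (a.Length != b.Length) return false;         // different lengths cannot match
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;             // ordinal, character by character
        }
        return true;
    }

    static void Main()
    {
        string half = "string";
        string str3 = "stringstring";
        string str5 = half + "string"; // built at run time, so a separate object

        Console.WriteLine(object.ReferenceEquals(str3, str5)); // False
        Console.WriteLine((object)str3 == (object)str5);       // False (reference comparison)
        Console.WriteLine(str3 == str5);                       // True  (op_Equality)
        Console.WriteLine(SketchEquals(str3, str5));           // True
    }
}
```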
