Memory layout of C++ objects

Posted by phpbeginner on Tue, 08 Feb 2022 02:01:34 +0100

Reprinted from CoolShell: https://coolshell.cn/articles/12176.html

Note:

1. The original text includes the experimental principle, the experimental code, and the conclusions; this article keeps only the principle and the conclusions, omitting the experimental code for brevity.

2. This article makes significant changes and deletions to the structure and wording of the original.

3. The original was written some time ago, and its experiments were run on a 32-bit machine; this article leaves that unchanged. This does not affect the conclusions, but some details may differ on other platforms.

 

Factors that influence object layout

The main factors that affect memory layout are member variables, virtual functions, and virtual function overriding.

Common inheritance types:

1. Single inheritance

2. Single virtual inheritance

3. Multiple inheritance (inheriting multiple classes)

4. Repeated multiple inheritance (several of the inherited parent classes share the same grandparent class)

5. Diamond virtual multiple inheritance

 

Experimental principle

How to get the address of the virtual function table through the address of an object?

typedef void(*Fun)(void);

Base b;

Fun pFun = NULL;

// Note: 32-bit machine
cout << "Address of the vtable pointer (start of the object): " << (int*)(&b) << endl;
cout << "vtable address (address of its first slot): " << (int*)*(int*)(&b) << endl;

// Invoke the first virtual function
pFun = (Fun)*((int*)*(int*)(&b));
pFun();

Note:

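On a 64-bit machine, replace the int above with a pointer-sized integer type. long works on 64-bit Linux, but not on 64-bit Windows, where long is still 32 bits; intptr_t is pointer-sized on both.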

The expression (int*)*(int*)(&b) is the hardest part to understand. The pointer to the virtual table is stored at the very front of the object, so (int*)(&b) treats the start of the object as an int, and dereferencing it yields the address of the virtual table.

The address of the first virtual function is the value stored in the first slot of the virtual table. Therefore, casting the virtual table address to (int*) and dereferencing it once more (that is, adding another asterisk on the left) yields the address of the first virtual function.

Then cast this value to a function pointer, and you can call it.

It can also be seen that the first virtual function defined in Base is a function with no parameters and no return value.
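For readers on a 64-bit machine, the same experiment can be sketched with a pointer-sized integer instead of int. This is only a minimal sketch, not the original author's code: the class Base and the helpers vtable_address, vtable_slot and call_first_virtual are hypothetical names, and the code assumes a compiler that stores the virtual table pointer in the first word of the object and plain function addresses in the table. The C++ standard guarantees none of this.

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>

// Hypothetical stand-in for the article's Base class.
class Base {
public:
    virtual void f() { std::cout << "Base::f()" << std::endl; }
    virtual void g() { std::cout << "Base::g()" << std::endl; }
};

typedef void (*Fun)(void);

static_assert(sizeof(std::uintptr_t) == sizeof(void*),
              "this sketch assumes uintptr_t is exactly pointer-sized");

// Copies the first pointer-sized word out of the object.
// Assumption: the compiler stores the vtable pointer there.
std::uintptr_t vtable_address(const Base& b) {
    std::uintptr_t vptr = 0;
    std::memcpy(&vptr, &b, sizeof vptr);
    return vptr;
}

// Returns the raw value stored in slot `index` of the vtable.
// Assumption: each slot holds the address of a virtual function.
std::uintptr_t vtable_slot(const Base& b, int index) {
    const std::uintptr_t* table =
        reinterpret_cast<const std::uintptr_t*>(vtable_address(b));
    return table[index];
}

// Calls the first virtual function through a plain function pointer, as the
// article does. This is undefined behavior in standard C++; it tends to work
// in practice only because Base::f() never touches `this`.
void call_first_virtual(const Base& b) {
    Fun pFun = reinterpret_cast<Fun>(vtable_slot(b, 0));
    pFun();
}
```

With such a compiler, two Base objects should report the same vtable_address, and call_first_virtual(b) would typically print Base::f(). Treat this as an experiment for inspecting a particular compiler, not as a technique for production code.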

Experimental environment:

1. Windows XP + VC++2003

2. Cygwin + G++3.4.4

 

Single inheritance

 

class Parent {
public:
    int iparent;
    Parent ():iparent (10) {}
    virtual void f() { cout << " Parent::f()" << endl; }
    virtual void g() { cout << " Parent::g()" << endl; }
    virtual void h() { cout << " Parent::h()" << endl; }

};

class Child : public Parent {
public:
    int ichild;
    Child():ichild(100) {}
    virtual void f() { cout << "Child::f()" << endl; }
    virtual void g_child() { cout << "Child::g_child()" << endl; }
    virtual void h_child() { cout << "Child::h_child()" << endl; }
};

class GrandChild : public Child{
public:
    int igrandchild;
    GrandChild():igrandchild(1000) {}
    virtual void f() { cout << "GrandChild::f()" << endl; }
    virtual void g_child() { cout << "GrandChild::g_child()" << endl; }
    virtual void h_grandchild() { cout << "GrandChild::h_grandchild()" << endl; }
};

Memory layout:

Note:

Child::h1() in the figure above appears to be a typo by the original author. It should be Child::h_child().

Summary:

1. The virtual function table pointer sits at the front of the object (that is, at relative offset 0).

2. Member variables follow it, ordered first by inheritance and then by declaration order.

3. In single inheritance, an overridden virtual function replaces the parent's entry in the virtual function table.
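The first two points can be checked without touching the virtual table at all, by measuring how far each data member sits from the start of the object. The following is a minimal sketch under stated assumptions: the classes mirror the ones above with empty function bodies, and offset_in is a hypothetical helper, not part of the original experiment.

```cpp
#include <cstddef>

class Parent {
public:
    int iparent;
    Parent() : iparent(10) {}
    virtual void f() {}
    virtual void g() {}
    virtual void h() {}
};

class Child : public Parent {
public:
    int ichild;
    Child() : ichild(100) {}
    virtual void f() {}
    virtual void g_child() {}
    virtual void h_child() {}
};

class GrandChild : public Child {
public:
    int igrandchild;
    GrandChild() : igrandchild(1000) {}
    virtual void f() {}
    virtual void g_child() {}
    virtual void h_grandchild() {}
};

// Byte distance from the start of `object` to one of its data members.
template <typename Object, typename Member>
std::ptrdiff_t offset_in(const Object& object, const Member& member) {
    return reinterpret_cast<const char*>(&member) -
           reinterpret_cast<const char*>(&object);
}
```

For a GrandChild gc, offset_in(gc, gc.iparent) should be at least the size of one pointer (the room taken by the virtual table pointer), followed by increasing offsets for ichild and igrandchild. The exact numbers depend on pointer size and padding, so they will differ between the article's 32-bit setup and a 64-bit build.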

 

Multiple inheritance

 

class Base1 {
public:
    int ibase1;
    Base1():ibase1(10) {}
    virtual void f() { cout << "Base1::f()" << endl; }
    virtual void g() { cout << "Base1::g()" << endl; }
    virtual void h() { cout << "Base1::h()" << endl; }

};

class Base2 {
public:
    int ibase2;
    Base2():ibase2(20) {}
    virtual void f() { cout << "Base2::f()" << endl; }
    virtual void g() { cout << "Base2::g()" << endl; }
    virtual void h() { cout << "Base2::h()" << endl; }
};

class Base3 {
public:
    int ibase3;
    Base3():ibase3(30) {}
    virtual void f() { cout << "Base3::f()" << endl; }
    virtual void g() { cout << "Base3::g()" << endl; }
    virtual void h() { cout << "Base3::h()" << endl; }
};

class Derive : public Base1, public Base2, public Base3 {
public:
    int iderive;
    Derive():iderive(100) {}
    virtual void f() { cout << "Derive::f()" << endl; }
    virtual void g1() { cout << "Derive::g1()" << endl; }
};

The memory layout is as follows:

Note:

The NULL at the end of the virtual tables of Base1 and Base2 in the figure above is the VC++ implementation; g++ 3.4.4 stores 1 there instead, indicating that another virtual table follows. At the end of the virtual table of Base3, both implementations store NULL.

Summary:

1. In the memory layout of the subclass, each parent class in the multiple inheritance has its own virtual table.

2. Virtual functions newly declared by the subclass are placed in the virtual table of the first parent class.

3. In the memory layout, the parent class parts are arranged in the order in which they are declared.

4. The f() entry in the virtual table of each parent class is overridden by the f() of the subclass. This ensures that a pointer of any parent type that points to the same subclass instance calls the actual (subclass) function.
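Points 3 and 4 can be observed with ordinary, well-defined C++. The sketch below is a simplified variant of the classes above, not the original experiment: the virtual functions return their name as a std::string instead of printing it, h() is left out, and base_offset is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>

class Base1 {
public:
    int ibase1;
    Base1() : ibase1(10) {}
    virtual std::string f() { return "Base1::f()"; }
    virtual std::string g() { return "Base1::g()"; }
};

class Base2 {
public:
    int ibase2;
    Base2() : ibase2(20) {}
    virtual std::string f() { return "Base2::f()"; }
    virtual std::string g() { return "Base2::g()"; }
};

class Base3 {
public:
    int ibase3;
    Base3() : ibase3(30) {}
    virtual std::string f() { return "Base3::f()"; }
    virtual std::string g() { return "Base3::g()"; }
};

class Derive : public Base1, public Base2, public Base3 {
public:
    int iderive;
    Derive() : iderive(100) {}
    virtual std::string f() { return "Derive::f()"; }
    virtual std::string g1() { return "Derive::g1()"; }
};

// Byte offset of the BaseT subobject inside a Derive object.
template <typename BaseT>
std::ptrdiff_t base_offset(Derive& d) {
    BaseT* base = &d;  // the derived-to-base conversion adjusts the address
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(&d);
}
```

On the compilers discussed here, base_offset<Base1>(d), base_offset<Base2>(d) and base_offset<Base3>(d) should increase in declaration order. Calling f() through a Base1*, Base2* or Base3* that points at the same Derive object reaches Derive::f() in every case, while g() through a Base2* still reaches Base2::g() because Derive does not override it.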

 

Repeated inheritance

 

class B
{
    public:
        int ib;
        char cb;
    public:
        B():ib(0),cb('B') {}

        virtual void f() { cout << "B::f()" << endl;}
        virtual void Bf() { cout << "B::Bf()" << endl;}
};
class B1 :  public B
{
    public:
        int ib1;
        char cb1;
    public:
        B1():ib1(11),cb1('1') {}

        virtual void f() { cout << "B1::f()" << endl;}
        virtual void f1() { cout << "B1::f1()" << endl;}
        virtual void Bf1() { cout << "B1::Bf1()" << endl;}

};
class B2:  public B
{
    public:
        int ib2;
        char cb2;
    public:
        B2():ib2(12),cb2('2') {}

        virtual void f() { cout << "B2::f()" << endl;}
        virtual void f2() { cout << "B2::f2()" << endl;}
        virtual void Bf2() { cout << "B2::Bf2()" << endl;}

};

class D : public B1, public B2
{
    public:
        int id;
        char cd;
    public:
        D():id(100),cd('D') {}

        virtual void f() { cout << "D::f()" << endl;}
        virtual void f1() { cout << "D::f1()" << endl;}
        virtual void f2() { cout << "D::f2()" << endl;}
        virtual void Df() { cout << "D::Df()" << endl;}

};

The memory layout is as follows:

We can see that the member variables of the topmost parent class B exist in both B1 and B2 and are inherited by D. D contains an instance of B1 and an instance of B2, so an instance of D holds two copies of B's members, one inherited through B1 and the other through B2. Therefore, the first assignment below produces an ambiguity error at compile time:

D d;
d.ib = 0; // Ambiguous error
d.B1::ib = 1; // correct
d.B2::ib = 2; // correct

Note that the last two statements in the code above access two different variables. Although qualifying the name removes the ambiguity error, there are still two instances of class B in D, so this kind of inheritance duplicates data. We call it repeated inheritance. Duplicated base class data members may not be what we want, which is why C++ introduces the concept of a virtual base class.
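The duplication can also be confirmed at run time by comparing the two B subobjects directly. This is a minimal sketch with the classes trimmed to a single virtual function each; has_two_b_subobjects is a hypothetical helper introduced only for illustration.

```cpp
class B {
public:
    int ib;
    char cb;
    B() : ib(0), cb('B') {}
    virtual void f() {}
};

class B1 : public B {
public:
    int ib1;
    B1() : ib1(11) {}
    virtual void f() {}
};

class B2 : public B {
public:
    int ib2;
    B2() : ib2(12) {}
    virtual void f() {}
};

class D : public B1, public B2 {
public:
    int id;
    D() : id(100) {}
    virtual void f() {}
};

// True when the B reached through B1 and the B reached through B2 are
// different subobjects of the same D.
bool has_two_b_subobjects(D& d) {
    B* via_b1 = static_cast<B1*>(&d);  // D -> B1 -> B
    B* via_b2 = static_cast<B2*>(&d);  // D -> B2 -> B
    return via_b1 != via_b2;
}
```

With non-virtual inheritance the helper returns true, and writes to d.B1::ib and d.B2::ib land in separate variables that keep independent values.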

 

Diamond virtual multiple inheritance

Starting from the "repeated inheritance" example above, adding the virtual keyword where B1 and B2 inherit from B turns it into virtual inheritance.

The data members and interfaces of the classes are exactly the same as in the previous "repeated inheritance" example; the only change is that virtual inheritance is used. The abbreviated source code is as follows:

class B {......};
class B1 : virtual public B{......};
class B2: virtual public B{......};
class D : public B1, public B2{ ...... };

The memory layout is as follows:

In the output above, different parts are marked in different colors. You can see the following points:

1. Apart from some differences in details, the object layout is basically the same in GCC and VC++: first B1 (yellow), then B2 (green), then D (gray), with the instance of the superclass B (cyan) placed last.

Note: in ordinary multiple inheritance, the number of virtual tables equals the number of inherited parent classes; in diamond virtual inheritance, there is one more virtual table for the shared grandparent class.

2. Regarding the virtual function tables, especially the first one, GCC is very different from VC++. On close inspection, though, the VC++ virtual table is clear and logical.

3. Both VC++ and GCC put the superclass B last, but VC++ inserts a NULL separator between the layout of B and that of B1 and B2, whereas GCC does not.

4. In GCC's memory layout, B1 and B2 contain no pointer to B. The original author's explanation is that the compiler can compute the offset of B from the sizes of B1 and B2.
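The effect of the virtual keyword can be checked in the same style as the earlier sections. The sketch below is a trimmed variant of the classes with one virtual function each; shares_single_b and subobject_offset are hypothetical helpers. The claim that B comes last is the observation made above for GCC and VC++, not something the C++ standard requires.

```cpp
#include <cstddef>

class B {
public:
    int ib;
    char cb;
    B() : ib(0), cb('B') {}
    virtual void f() {}
};

class B1 : virtual public B {
public:
    int ib1;
    B1() : ib1(11) {}
    virtual void f() {}
};

class B2 : virtual public B {
public:
    int ib2;
    B2() : ib2(12) {}
    virtual void f() {}
};

class D : public B1, public B2 {
public:
    int id;
    D() : id(100) {}
    virtual void f() {}  // D must override f() to settle the final overrider
};

// True when B1 and B2 inside the same D share one B subobject.
bool shares_single_b(D& d) {
    B* via_b1 = static_cast<B1*>(&d);
    B* via_b2 = static_cast<B2*>(&d);
    return via_b1 == via_b2;
}

// Byte offset of the SubT subobject from the start of the D object.
template <typename SubT>
std::ptrdiff_t subobject_offset(D& d) {
    SubT* sub = &d;  // the derived-to-base conversion adjusts the address
    return reinterpret_cast<char*>(sub) - reinterpret_cast<char*>(&d);
}
```

With virtual inheritance, d.ib is no longer ambiguous, and d.B1::ib and d.B2::ib name the same variable. On the compilers discussed here, subobject_offset<B>(d) should be larger than the offsets of B1 and B2, matching the observation that the shared B is placed last.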

 

Note:

This simplified version is being published first. Updates for 64-bit machines and newer versions of g++ may follow.

(end)

Topics: C++