The charm of modern C++ design: thunk technology in virtual function inheritance

Posted by bPHP on Sat, 29 Jan 2022 14:49:08 +0100

Introduction: while debugging a C++ program that uses multiple inheritance with the LLDB debugger at work, I found that the pointer address printed by LLDB's print command (an alias of the expression command) differs from the address I expected from my understanding of the C++ memory model. What is the reason?

Author Yang Fu
Source: Ali technical official account

I. Problem background

1. Practical verification

While debugging a C++ program that uses multiple inheritance with the LLDB debugger at work, I found that the pointer address printed by LLDB's print command (an alias of the expression command) differs from the address I expected from my understanding of the C++ memory model. What is the reason? The program is as follows:

#include <cstdio>

class Base {
public:
    Base(){}
protected:
    float x;
};
class VBase {
public:
    VBase(){}
    virtual void test(){};
    virtual void foo(){};
protected:
    float x;
};
class VBaseA: public VBase {
public:
    VBaseA(){}
    virtual void test(){}
    virtual void foo(){};
protected:
    float x;
};
class VBaseB: public VBase  {
public:
    VBaseB(){}
    virtual void test(){
        printf("test \n");
    }
    virtual void foo(){};
protected:
    float x;
};
class VDerived : public VBaseA, public Base, public VBaseB {
public:
    VDerived(){}
    virtual void test(){}
    virtual void foo(){};
protected:
    float x;
};
int  main(int argc, char *argv[])
{
    VDerived *pDerived = new VDerived();                 // 0x0000000103407f30
    Base *pBase = (Base*)pDerived;                       // 0x0000000103407f40
    VBaseA *pvBaseA = static_cast<VBaseA*>(pDerived);    // 0x0000000103407f30
    VBaseB *pvBaseB = static_cast<VBaseB*>(pDerived);    // LLDB displays 0x0000000103407f30, although 0x0000000103407f48 is expected
    unsigned long pBaseAddressbase = (unsigned long)pBase;
    unsigned long pvBaseAAddressbase = (unsigned long)pvBaseA;
    unsigned long pvBaseBAddressbase = (unsigned long)pvBaseB;
    pvBaseB->test();
}

The address obtained by lldb print command is shown in the following figure:

Normal understanding of the C++ memory model

I am using an x86_64 Mac, so pointers are 8 bytes and 8-byte aligned (align=8).

According to the normal understanding of the C++ memory model:

Converting pDerived to VBaseA leaves the address unchanged at 0x0000000103407f30, because VBaseA shares the first address of the object. That is understandable.

Converting pDerived to Base offsets the address by 16 bytes (sizeof(VBaseA)) to 0x0000000103407f40, which is also expected.

However, converting pDerived to VBaseB should offset the address by 24 bytes, so pvBaseB should be 0x0000000103407f48 rather than 0x0000000103407f30 (the first address of the object). What is the reason for this?
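The offsets expected here can be computed by the program itself. The sketch below uses a cut-down version of the classes above; the helper names offset_of_base and print_offsets are mine, not the article's. The values 16 and 24 are what an x86_64 compiler following the Itanium C++ ABI (Clang, GCC) typically produces; the C++ standard does not guarantee them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Simplified copies of the article's classes (members made public for brevity).
struct Base { float x = 0; };
struct VBase { virtual void test() {} float x = 0; };
struct VBaseA : VBase { void test() override {} float xa = 0; };
struct VBaseB : VBase { void test() override {} float xb = 0; };
struct VDerived : VBaseA, Base, VBaseB { void test() override {} float xd = 0; };

// Byte distance from the start of the complete object to one base subobject.
// offset_of_base is a hypothetical helper written for this sketch.
template <class BasePart>
std::ptrdiff_t offset_of_base(VDerived *d) {
    BasePart *b = static_cast<BasePart *>(d);   // the compiler adjusts the address here
    return reinterpret_cast<char *>(b) - reinterpret_cast<char *>(d);
}

// Print the offset of each base subobject inside a VDerived object.
void print_offsets() {
    VDerived d;
    std::printf("VBaseA at +%td\n", offset_of_base<VBaseA>(&d));
    std::printf("Base   at +%td\n", offset_of_base<Base>(&d));
    std::printf("VBaseB at +%td\n", offset_of_base<VBaseB>(&d));
}
```

Calling print_offsets() from main prints one line per base class. The first base with a virtual table sits at offset 0 and the later bases follow it, which is the layout the paragraph above reasons about.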

2. Guess prompted by the verification

In the code above, Base has no virtual functions, while VBaseB has the virtual functions test and foo. The guess is as follows:

1. For a base class without virtual functions (no virtual table), the compiler offsets the address by the actual offset during type conversion.

2. For a base class with virtual functions (with a virtual table), the compiler does not actually offset the address during type conversion: the pointer still points to the derived object, not to the actual VBaseB subobject.

II. Questions raised by the phenomenon

1. When a derived-class pointer is converted to a base class that has virtual functions (and therefore a virtual table), does the compiler really offset the address behind the scenes?

2. If it does offset the address:

  • When a virtual function overridden by the derived class is called once through the base-class pointer and once through the derived-class pointer, how does the compiler guarantee that this has the same value in both calls, so that the call is correct?
  • Why, then, is the address shown by LLDB expression the first address of the derived object?

3. If it does not offset the address, how are the base class's member variables and functions accessed through the derived-class pointer?

III. Core causes of the phenomenon

  • Just as with ordinary inheritance without virtual functions, the compiler does offset the pointer.
  • Because the pointer is offset, when a virtual function of the derived class is called through the base-class pointer, the compiler uses thunk technology to adjust the this pointer passed to the call and, where necessary, the pointer that is returned.
  • LLDB expression displays the first address of the derived object (0x0000000103407f30) rather than the offset address of the base subobject (0x0000000103407f48) because, before showing the result of an expression to the user, LLDB formats a pointer to a base class with virtual functions through its summary format. During that formatting it looks up the C++ runtime dynamic type and address from the current memory address and shows those to the user.

IV. Confirming the conclusions

1. Does the compiler offset the pointer during type conversion?

Assembly instruction analysis

When a derived-class pointer is converted to a base class that has virtual functions (and therefore a virtual table), does the compiler really offset the address behind the scenes?

The guess above can be checked by disassembling the program at run time.

Before the disassembly, here is a short primer on the assembly knowledge used below. Skip it if you are already familiar with it.

Note: I am working on macOS, where the toolchain prints assembly in AT&T syntax, which differs from Intel syntax.

In AT&T syntax, operands are read from left to right: the first is the source operand and the second is the destination operand. For example:

movl %esp, %ebp  // movl is the instruction name; the % prefix marks esp and ebp as registers; the first operand is the source and the second is the destination

In Intel syntax, operands are read from right to left: the second is the source operand and the first is the destination operand.

mov ebp, esp  // Intel syntax has no % prefix, and its operand order is exactly the opposite

In the x86_64 register calling convention (for integer and pointer arguments):

1. The first parameter is placed in the RDI/EDI register, the second in RSI/ESI, the third in RDX, the fourth in RCX, the fifth in R8, and the sixth in R9;

2. If a function has more than six parameters, the additional parameters are passed on the stack;

3. The return value of a function is generally placed in the EAX or RAX register.

On the Mac used below, the assembly in this article is in AT&T syntax, and the first argument of a function call is passed in the RDI register.

The following is the complete disassembly of the main function above, from the start of execution to program exit:

The assembly shows that during type conversion the compiler really offsets the pointer to the address of the actual base subobject, whether or not the base class has virtual functions. The guess above is therefore wrong: the compiler does not distinguish between base classes with and without virtual functions during type conversion, and always applies the real offset.

Memory analysis

The same conclusion was then verified with the memory read command provided by LLDB (abbreviated x):

(lldb) memory read pDerived
0x103407f30: 40 40 00 00 01 00 00 00 00 00 00 00 00 00 00 00  @@..............
0x103407f40: 10 00 00 00 00 00 00 00 60 40 00 00 01 00 00 00  ........`@......
(lldb) memory read pvBaseB
0x103407f48: 60 40 00 00 01 00 00 00 00 00 00 00 00 00 00 00  `@..............
0x103407f58: de 2d 05 10 00 00 00 00 00 00 00 00 00 00 00 00  .-..............

The two pointers read different memory: pDerived is at 0x103407f30 and pvBaseB is at 0x103407f48. The addresses differ, and both are the actual (offset) addresses.
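What memory read shows can also be reproduced from inside the program by copying the first pointer-sized bytes of each subobject. The helper name read_vptr is made up for this sketch. It relies on the Itanium C++ ABI placing the vtable pointer at offset 0 of a polymorphic subobject, so treat it as an ABI-specific experiment rather than portable C++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified copies of the article's classes.
struct Base { float x = 0; };
struct VBase { virtual void test() {} float x = 0; };
struct VBaseA : VBase { void test() override {} float xa = 0; };
struct VBaseB : VBase { void test() override {} float xb = 0; };
struct VDerived : VBaseA, Base, VBaseB { void test() override {} float xd = 0; };

// Copy the first pointer-sized bytes of a subobject. Under the Itanium ABI
// (an assumption of this sketch) these bytes hold the vtable pointer, i.e.
// the "40 40 00 00 01 00 00 00" and "60 40 00 00 01 00 00 00" in the dump.
std::uintptr_t read_vptr(const void *subobject) {
    std::uintptr_t vptr = 0;
    std::memcpy(&vptr, subobject, sizeof vptr);
    return vptr;
}
```

For a VDerived object d, read_vptr(&d) and read_vptr(static_cast<VBaseB *>(&d)) return two different values: each subobject with virtual functions carries its own vtable pointer, stored at its own (offset) address.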

2. How do virtual function calls keep the value of this consistent?

The real address in memory is offset, and the derived class overrides the virtual function of the base class. So when the overriding function is called once through the base-class pointer and once through the derived-class pointer, how does the compiler guarantee that this has the same value in both calls, so that the call is correct?

According to material found online, when C++ calls such a function the compiler adjusts the this pointer through thunk technology so that it points to the correct memory address. So what is thunk technology, and how does the compiler implement it?
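That the override really sees the same this on both paths can be observed without a debugger. In the sketch below the override records the this it receives; the global g_seen_this and the helper this_seen_via_second_base are illustrative names only and do not appear in the article's program.

```cpp
#include <cassert>

struct VBase { virtual void test() {} float x = 0; };
struct VBaseA : VBase { void test() override {} float xa = 0; };
struct VBaseB : VBase { void test() override {} float xb = 0; };

// Hypothetical probe: remembers the `this` seen by the last call to test().
const void *g_seen_this = nullptr;

struct VDerived : VBaseA, VBaseB {
    void test() override { g_seen_this = this; }   // record the received `this`
    float xd = 0;
};

// Call test() through a pointer to the second base and report which
// `this` the derived-class override actually received.
const void *this_seen_via_second_base(VDerived &d) {
    VBaseB *b = &d;          // offset pointer: does not equal &d
    g_seen_this = nullptr;
    b->test();               // virtual call made with the offset pointer
    return g_seen_this;
}
```

Although the call is made with the offset VBaseB pointer, the value recorded inside VDerived::test is the address of the whole VDerived object. Something between the call site and the function body has moved the pointer back, and that something is the thunk examined next.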

Analysis of assembly instruction of virtual function call

The disassembly of pvBaseB->test() is easy to find in the main function above:

  pvBaseB->test();
    0x100003c84 <+244>: movq   -0x40(%rbp), %rax    // -0x40(%rbp) holds the pvBaseB pointer; load the address it points to into rax
    0x100003c88 <+248>: movq   (%rax), %rcx         // Load the first 8 bytes of the object (the vtable pointer) into rcx
    0x100003c8b <+251>: movq   %rax, %rdi           // Copy rax into rdi: rdi carries the first argument of the call, i.e. this, which here is the address of the base subobject
->  0x100003c8e <+254>: callq  *(%rcx)              // Indirect call through the first entry of the vtable that rcx points to
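These four instructions are ordinary vtable dispatch. A hand-rolled equivalent might look like the sketch below, in which a plain struct stands in for the compiler-generated table. All names in it (Object, VTable, object_test, call_first_slot) are invented for illustration.

```cpp
#include <cassert>

struct Object;                              // hypothetical stand-in for a polymorphic object

struct VTable {
    int (*slot0)(Object *self);             // first virtual function, like test()
};

struct Object {
    const VTable *vptr;                     // first field: what `movq (%rax), %rcx` loads
    int value;
};

// The function stored in slot 0; it receives the object as an explicit argument.
int object_test(Object *self) { return self->value; }

const VTable object_vtable = { &object_test };

// Mirrors the disassembly: load the vtable pointer from the object, pass the
// object pointer as the first argument, and call through the first slot.
int call_first_slot(Object *p) {
    const VTable *vt = p->vptr;             // movq (%rax), %rcx
    return vt->slot0(p);                    // movq %rax, %rdi ; callq *(%rcx)
}
```

With Object o{&object_vtable, 7}, call_first_slot(&o) returns 7. The point to notice is that the caller passes whatever pointer it holds as the first argument; it has no idea which function is stored in the slot.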

Now step into the assembly of VDerived::test. Using the LLDB command register read rdi to view the first argument of the function, that is, this, shows that it is already the address of the derived object, not the base-class address it was before the call.

testCPPVirtualMemeory`VDerived::test:
    0x100003e00 <+0>:  pushq  %rbp                  // Save the caller's frame pointer on the stack
    0x100003e01 <+1>:  movq   %rsp, %rbp            // Set rbp to rsp: the caller's stack top becomes this function's frame base
    0x100003e04 <+4>:  subq   $0x10, %rsp           // Reserve stack frame space for this function
    0x100003e08 <+8>:  movq   %rdi, -0x8(%rbp)      // Store the first argument, the this pointer, on the stack
->  0x100003e0c <+12>: leaq   0x15c(%rip), %rdi         ; "test\n"
    0x100003e13 <+19>: movb   $0x0, %al
    0x100003e15 <+21>: callq  0x100003efc               ; symbol stub for: printf
    0x100003e1a <+26>: addq   $0x10, %rsp           // Release the stack frame space
    0x100003e1e <+30>: popq   %rbp                  // Restore the caller's frame pointer
    0x100003e1f <+31>: retq                         // Return to the caller

From the assembly above, the call goes through the virtual function table using *(%rcx) indirect addressing, and something in between then jumps to the implementation of test. What does the thunk do in this process?

LLVM thunk source code analysis

The IDE I use relies on the LLVM compiler, so I looked for the answer in the LLVM source code. In the AddMethods function of VTableBuilder.cpp, it is described as follows:

  // Now go through all virtual member functions and add them to the current
  // vftable. This is done by
  //  - replacing overridden methods in their existing slots, as long as they
  //    don't require return adjustment; calculating This adjustment if needed.
  //  - adding new slots for methods of the current base not present in any
  //    sub-bases;
  //  - adding new slots for methods that require Return adjustment.
  // We keep track of the methods visited in the sub-bases in MethodInfoMap.

When building the virtual table, the compiler checks whether a virtual function of the base class is overridden by the derived class. If it is, the entry in the existing slot of the virtual function table is replaced with the derived-class implementation. At the same time:

1. It calculates whether the this pointer needs to be adjusted at call time and, if so, records the adjustment for the method;

2. It adds a new slot for methods that require return adjustment;

The code is as follows:

void VFTableBuilder::AddMethods(BaseSubobject Base, unsigned BaseDepth,
                                const CXXRecordDecl *LastVBase,
                                BasesSetVectorTy &VisitedBases) {
  const CXXRecordDecl *RD = Base.getBase();
  if (!RD->isPolymorphic())
    return;

  const ASTRecordLayout &Layout = Context.getASTRecordLayout(RD);

  // See if this class expands a vftable of the base we look at, which is either
  // the one defined by the vfptr base path or the primary base of the current
  // class.
  const CXXRecordDecl *NextBase = nullptr, *NextLastVBase = LastVBase;
  CharUnits NextBaseOffset;
  if (BaseDepth < WhichVFPtr.PathToIntroducingObject.size()) {
    NextBase = WhichVFPtr.PathToIntroducingObject[BaseDepth];
    if (isDirectVBase(NextBase, RD)) {
      NextLastVBase = NextBase;
      NextBaseOffset = MostDerivedClassLayout.getVBaseClassOffset(NextBase);
    } else {
      NextBaseOffset =
          Base.getBaseOffset() + Layout.getBaseClassOffset(NextBase);
    }
  } else if (const CXXRecordDecl *PrimaryBase = Layout.getPrimaryBase()) {
    assert(!Layout.isPrimaryBaseVirtual() &&
           "No primary virtual bases in this ABI");
    NextBase = PrimaryBase;
    NextBaseOffset = Base.getBaseOffset();
  }

  if (NextBase) {
    AddMethods(BaseSubobject(NextBase, NextBaseOffset), BaseDepth + 1,
               NextLastVBase, VisitedBases);
    if (!VisitedBases.insert(NextBase))
      llvm_unreachable("Found a duplicate primary base!");
  }

  SmallVector<const CXXMethodDecl*, 10> VirtualMethods;
  // Put virtual methods in the proper order.
  GroupNewVirtualOverloads(RD, VirtualMethods);

  // Now go through all virtual member functions and add them to the current
  // vftable. This is done by
  //  - replacing overridden methods in their existing slots, as long as they
  //    don't require return adjustment; calculating This adjustment if needed.
  //  - adding new slots for methods of the current base not present in any
  //    sub-bases;
  //  - adding new slots for methods that require Return adjustment.
  // We keep track of the methods visited in the sub-bases in MethodInfoMap.
  for (const CXXMethodDecl *MD : VirtualMethods) {
    FinalOverriders::OverriderInfo FinalOverrider =
        Overriders.getOverrider(MD, Base.getBaseOffset());
    const CXXMethodDecl *FinalOverriderMD = FinalOverrider.Method;
    const CXXMethodDecl *OverriddenMD =
        FindNearestOverriddenMethod(MD, VisitedBases);

    ThisAdjustment ThisAdjustmentOffset;
    bool ReturnAdjustingThunk = false, ForceReturnAdjustmentMangling = false;
    CharUnits ThisOffset = ComputeThisOffset(FinalOverrider);
    ThisAdjustmentOffset.NonVirtual =
        (ThisOffset - WhichVFPtr.FullOffsetInMDC).getQuantity();
    if ((OverriddenMD || FinalOverriderMD != MD) &&
        WhichVFPtr.getVBaseWithVPtr())
      CalculateVtordispAdjustment(FinalOverrider, ThisOffset,
                                  ThisAdjustmentOffset);

    unsigned VBIndex =
        LastVBase ? VTables.getVBTableIndex(MostDerivedClass, LastVBase) : 0;

    if (OverriddenMD) {
      // If MD overrides anything in this vftable, we need to update the
      // entries.
      MethodInfoMapTy::iterator OverriddenMDIterator =
          MethodInfoMap.find(OverriddenMD);

      // If the overridden method went to a different vftable, skip it.
      if (OverriddenMDIterator == MethodInfoMap.end())
        continue;

      MethodInfo &OverriddenMethodInfo = OverriddenMDIterator->second;

      VBIndex = OverriddenMethodInfo.VBTableIndex;

      // Let's check if the overrider requires any return adjustments.
      // We must create a new slot if the MD's return type is not trivially
      // convertible to the OverriddenMD's one.
      // Once a chain of method overrides adds a return adjusting vftable slot,
      // all subsequent overrides will also use an extra method slot.
      ReturnAdjustingThunk = !ComputeReturnAdjustmentBaseOffset(
                                  Context, MD, OverriddenMD).isEmpty() ||
                             OverriddenMethodInfo.UsesExtraSlot;

      if (!ReturnAdjustingThunk) {
        // No return adjustment needed - just replace the overridden method info
        // with the current info.
        MethodInfo MI(VBIndex, OverriddenMethodInfo.VFTableIndex);
        MethodInfoMap.erase(OverriddenMDIterator);

        assert(!MethodInfoMap.count(MD) &&
               "Should not have method info for this method yet!");
        MethodInfoMap.insert(std::make_pair(MD, MI));
        continue;
      }

      // In case we need a return adjustment, we'll add a new slot for
      // the overrider. Mark the overridden method as shadowed by the new slot.
      OverriddenMethodInfo.Shadowed = true;

      // Force a special name mangling for a return-adjusting thunk
      // unless the method is the final overrider without this adjustment.
      ForceReturnAdjustmentMangling =
          !(MD == FinalOverriderMD && ThisAdjustmentOffset.isEmpty());
    } else if (Base.getBaseOffset() != WhichVFPtr.FullOffsetInMDC ||
               MD->size_overridden_methods()) {
      // Skip methods that don't belong to the vftable of the current class,
      // e.g. each method that wasn't seen in any of the visited sub-bases
      // but overrides multiple methods of other sub-bases.
      continue;
    }

    // If we got here, MD is a method not seen in any of the sub-bases or
    // it requires return adjustment. Insert the method info for this method.
    MethodInfo MI(VBIndex,
                  HasRTTIComponent ? Components.size() - 1 : Components.size(),
                  ReturnAdjustingThunk);

    assert(!MethodInfoMap.count(MD) &&
           "Should not have method info for this method yet!");
    MethodInfoMap.insert(std::make_pair(MD, MI));

    // Check if this overrider needs a return adjustment.
    // We don't want to do this for pure virtual member functions.
    BaseOffset ReturnAdjustmentOffset;
    ReturnAdjustment ReturnAdjustment;
    if (!FinalOverriderMD->isPure()) {
      ReturnAdjustmentOffset =
          ComputeReturnAdjustmentBaseOffset(Context, FinalOverriderMD, MD);
    }
    if (!ReturnAdjustmentOffset.isEmpty()) {
      ForceReturnAdjustmentMangling = true;
      ReturnAdjustment.NonVirtual =
          ReturnAdjustmentOffset.NonVirtualOffset.getQuantity();
      if (ReturnAdjustmentOffset.VirtualBase) {
        const ASTRecordLayout &DerivedLayout =
            Context.getASTRecordLayout(ReturnAdjustmentOffset.DerivedClass);
        ReturnAdjustment.Virtual.Microsoft.VBPtrOffset =
            DerivedLayout.getVBPtrOffset().getQuantity();
        ReturnAdjustment.Virtual.Microsoft.VBIndex =
            VTables.getVBTableIndex(ReturnAdjustmentOffset.DerivedClass,
                                    ReturnAdjustmentOffset.VirtualBase);
      }
    }

    AddMethod(FinalOverriderMD,
              ThunkInfo(ThisAdjustmentOffset, ReturnAdjustment,
                        ForceReturnAdjustmentMangling ? MD : nullptr));
  }
}

From the code above, when this needs to be adjusted, a ThunkInfo structure is added through the call AddMethod(FinalOverriderMD, ThunkInfo(ThisAdjustmentOffset, ReturnAdjustment, ForceReturnAdjustmentMangling ? MD : nullptr)). The ThunkInfo structure (defined in ABI.h) is as follows:

struct ThunkInfo {
  /// The \c this pointer adjustment.
  ThisAdjustment This;

  /// The return adjustment.
  ReturnAdjustment Return;

  /// Holds a pointer to the overridden method this thunk is for,
  /// if needed by the ABI to distinguish different thunks with equal
  /// adjustments. Otherwise, null.
  /// CAUTION: In the unlikely event you need to sort ThunkInfos, consider using
  /// an ABI-specific comparator.
  const CXXMethodDecl *Method;

  ThunkInfo() : Method(nullptr) { }

  ThunkInfo(const ThisAdjustment &This, const ReturnAdjustment &Return,
            const CXXMethodDecl *Method = nullptr)
      : This(This), Return(Return), Method(Method) {}

  friend bool operator==(const ThunkInfo &LHS, const ThunkInfo &RHS) {
    return LHS.This == RHS.This && LHS.Return == RHS.Return &&
           LHS.Method == RHS.Method;
  }

  bool isEmpty() const {
    return This.isEmpty() && Return.isEmpty() && Method == nullptr;
  }
};


ThunkInfo has a Method field that points to the overridden method the thunk is for, while This and Return record the adjustments to be applied to this and to the return value. When generating the method, the compiler uses this information to insert the thunk function automatically. This is confirmed by ItaniumMangleContextImpl::mangleThunk(const CXXMethodDecl *MD, const ThunkInfo &Thunk, raw_ostream &Out), shown below:

(mangle and demangle: converting a C++ source identifier into a C++ ABI identifier is called mangling; the reverse process is called demangling.)

void ItaniumMangleContextImpl::mangleThunk(const CXXMethodDecl *MD,
                                           const ThunkInfo &Thunk,
                                           raw_ostream &Out) {
  //  <special-name> ::= T <call-offset> <base encoding>
  //                      # base is the nominal target function of thunk
  //  <special-name> ::= Tc <call-offset> <call-offset> <base encoding>
  //                      # base is the nominal target function of thunk
  //                      # first call-offset is 'this' adjustment
  //                      # second call-offset is result adjustment

  assert(!isa<CXXDestructorDecl>(MD) &&
         "Use mangleCXXDtor for destructor decls!");
  CXXNameMangler Mangler(*this, Out);
  Mangler.getStream() << "_ZT";
  if (!Thunk.Return.isEmpty())
    Mangler.getStream() << 'c';

  // Mangle the 'this' pointer adjustment.
  Mangler.mangleCallOffset(Thunk.This.NonVirtual,
                           Thunk.This.Virtual.Itanium.VCallOffsetOffset);

  // Mangle the return pointer adjustment if there is one.
  if (!Thunk.Return.isEmpty())
    Mangler.mangleCallOffset(Thunk.Return.NonVirtual,
                             Thunk.Return.Virtual.Itanium.VBaseOffsetOffset);

  Mangler.mangleFunctionEncoding(MD);
}
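The names that mangleThunk produces can be inspected with a demangler. The sketch below uses abi::__cxa_demangle from <cxxabi.h>, which the GCC and Clang runtime libraries provide. The mangled string is my own construction for a this adjustment of -24 on VDerived::test(), assembled from the grammar in the comment above; it is not a symbol copied from the article's binary, and the wrapper name demangle is likewise just a helper for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol name; returns an empty string on failure.
std::string demangle(const char *mangled) {
    int status = 0;
    char *buf = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string out = (status == 0 && buf != nullptr) ? buf : "";
    std::free(buf);   // the result buffer is allocated with malloc
    return out;
}

// Built by hand from the grammar (an assumption of this sketch):
//   "_ZT"                 special-name prefix written by mangleThunk
//   "h" "n24" "_"         non-virtual call-offset of minus 24
//   "N8VDerived4testEv"   encoding of VDerived::test()
const char *const kThunkName = "_ZThn24_N8VDerived4testEv";
```

demangle("_ZN8VDerived4testEv") gives VDerived::test(), and on the usual toolchains demangle(kThunkName) gives a string of the form "non-virtual thunk to VDerived::test()". Note that symbol listings on macOS show one extra leading underscore, which has to be removed before demangling.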

Thunk assembly instruction analysis

So far, the LLVM source code has revealed what thunk technology really is. Next we verify it by disassembling the program, using either objdump or the reverse-engineering tool Hopper. I use Hopper. The assembly code is as follows:

1. First, look at the thunk version of the test function generated by the compiler

test function implemented by the derived class

thunk version of the test function generated by the compiler

2. From the two screenshots above, we find that:

The thunk version of test generated by the compiler is at address 0x100003e30

The test function implemented by the derived class is at address 0x100003e00

Now let's see which address is actually stored in the virtual table of the derived class

From the figure above, the address actually stored in the virtual table of the derived class is 0x100003e30, the address of the thunk function added by the compiler.

The *(%rcx) indirect call analyzed earlier therefore calls the thunk, and the thunk in turn calls the function actually overridden by the derived class.

This pins down what thunk technology is:

At compile time, when a call needs this or a returned pointer to be adjusted, the compiler generates a corresponding thunk version of the function. The thunk adjusts this by the required offset and then calls the virtual function implemented by the derived class. The virtual table stores the address of the compiler-generated thunk rather than the address of the derived-class implementation.
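Written out by hand, a non-virtual thunk is just a tiny forwarding function. The sketch below emulates one with plain structs. Every name in it (FirstBase, SecondBase, Derived, test_impl, test_thunk, second_base_slot0) is invented for illustration, and a real thunk is emitted by the compiler as a few machine instructions rather than as C++ source.

```cpp
#include <cassert>
#include <cstddef>

struct FirstBase  { int a; };
struct SecondBase { int b; };

// Plain aggregate standing in for a derived object made of two base parts.
struct Derived {
    FirstBase  first;    // at offset 0
    SecondBase second;   // at offsetof(Derived, second)
    int        tag;
};

// The "real" override: expects a pointer to the whole Derived object.
int test_impl(Derived *self) { return self->tag; }

// The thunk: receives a pointer to the SecondBase part, moves it back to the
// start of the enclosing Derived object, then forwards to the real function.
int test_thunk(SecondBase *base_part) {
    char *raw = reinterpret_cast<char *>(base_part) - offsetof(Derived, second);
    return test_impl(reinterpret_cast<Derived *>(raw));
}

// What slot 0 of a SecondBase vtable would hold: the thunk, not test_impl.
int (*const second_base_slot0)(SecondBase *) = &test_thunk;
```

Given Derived d{{1}, {2}, 42}, calling second_base_slot0(&d.second) returns 42. The caller only ever holds the offset pointer, and the adjustment happens entirely inside the function whose address sits in the table.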

Memory layout of thunk function

The corresponding memory layout can also be determined as follows:

Therefore, for a pointer to a base class with virtual functions that is not the first base in the inheritance chain, the calling order is:

Virtual thunk and non-virtual thunk

Note: as can be seen here, there are two VBase subobjects in memory. Inheritance comes in several forms: ordinary inheritance, virtual function inheritance, and virtual inheritance. Virtual inheritance mainly solves the problem above, namely two copies of VBase in memory at the same time. Changing the code as follows ensures that there is only one instance in memory:

class VBaseA: public VBase changed to class VBaseA: public virtual VBase

class VBaseB: public VBase changed to class VBaseB: public virtual VBase

In this way, there is only one copy of VBase in memory.
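The difference between the two forms of inheritance can be checked by walking up to VBase along both paths and comparing the resulting pointers. The class names prefixed Plain and Shared and the helper same_vbase are hypothetical names chosen for this sketch.

```cpp
#include <cassert>

struct VBase { virtual void test() {} float x = 0; };

// Ordinary inheritance: each path carries its own VBase subobject.
struct PlainA : VBase {};
struct PlainB : VBase {};
struct PlainDerived : PlainA, PlainB {};

// Virtual inheritance: both paths share a single VBase subobject.
struct SharedA : virtual VBase {};
struct SharedB : virtual VBase {};
struct SharedDerived : SharedA, SharedB {};

// True if going up through A and going up through B reach the same VBase.
template <class D, class A, class B>
bool same_vbase(D &d) {
    VBase *via_a = static_cast<A *>(&d);   // D* to A*, then A* to VBase*
    VBase *via_b = static_cast<B *>(&d);   // D* to B*, then B* to VBase*
    return via_a == via_b;
}
```

For a PlainDerived object the two paths end at different addresses, so same_vbase returns false; for a SharedDerived object they end at the same address and it returns true.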

One question is still unanswered: the type of the thunk function in the screenshot above is:

The thunk function is a non-virtual thunk. So what is the corresponding virtual thunk type?

Before answering this question, let's look at the following example:

class A {
public:
    virtual void test() {}
};
class B {
public:
    virtual void test1() {}
};
class C {
public:
    virtual void test2() {}
};
class D : public virtual A, public virtual B, public C {
public:
    // Overrides B::test1, and B is a virtual base:
    // the entry in the B virtual table is of the virtual thunk type
    virtual void test1() {}
    // Overrides C::test2, and C is an ordinary (non-virtual) base:
    // any thunk needed in the C virtual table is of the non-virtual thunk type
    virtual void test2() {}
};

When virtual function inheritance is combined with virtual inheritance, and the class is not the first base class in the inheritance chain of the derived class, the virtual function implemented by the derived class is stored in the virtual table as a virtual thunk.

With virtual function inheritance alone, when the class is not the first base class in the inheritance chain of the derived class, the virtual function implemented by the derived class is stored in the virtual table as a non-virtual thunk.

3. Why does the LLDB debugger display the same address?

If the pointer is offset, why is the address displayed by LLDB expression the first address of the derived object?

Now that we know what thunk technology is, one question remains. The assembly analysis showed that the compiler really offsets the pointer during type conversion, and reading memory confirmed the offset address. Yet during LLDB debugging, the address displayed for the base-class pointer is the address of the derived object. Why? A reasonable guess is that LLDB performs a type conversion when it evaluates the expression command.

Reading the LLDB source code and documentation shows that when LLDB has an address to display to the user in a friendly way, it first formats it through the summary format, and that formatting is based on obtaining the dynamic type (GetDynamicTypeAndAddress). The answer lies in the ItaniumABILanguageRuntime::GetDynamicTypeAndAddress function in the LLDB source code. The code is as follows:

bool ItaniumABILanguageRuntime::GetDynamicTypeAndAddress(
    ValueObject &in_value, lldb::DynamicValueType use_dynamic,
    TypeAndOrName &class_type_or_name, Address &dynamic_address,
    Value::ValueType &value_type) {
  // For Itanium, if the type has a vtable pointer in the object, it will be at
  // offset 0
  // in the object.  That will point to the "address point" within the vtable
  // (not the beginning of the
  // vtable.)  We can then look up the symbol containing this "address point"
  // and that symbol's name
  // demangled will contain the full class name.
  // The second pointer above the "address point" is the "offset_to_top".  We'll
  // use that to get the
  // start of the value object which holds the dynamic type.
  //

  class_type_or_name.Clear();
  value_type = Value::ValueType::eValueTypeScalar;

  // Only a pointer or reference type can have a different dynamic and static
  // type:
  if (CouldHaveDynamicValue(in_value)) {
    // First job, pull out the address at 0 offset from the object.
    AddressType address_type;
    lldb::addr_t original_ptr = in_value.GetPointerValue(&address_type);
    if (original_ptr == LLDB_INVALID_ADDRESS)
      return false;

    ExecutionContext exe_ctx(in_value.GetExecutionContextRef());

    Process *process = exe_ctx.GetProcessPtr();

    if (process == nullptr)
      return false;

    Status error;
    const lldb::addr_t vtable_address_point =
        process->ReadPointerFromMemory(original_ptr, error);

    if (!error.Success() || vtable_address_point == LLDB_INVALID_ADDRESS) {
      return false;
    }

    class_type_or_name = GetTypeInfoFromVTableAddress(in_value, original_ptr,
                                                      vtable_address_point);

    if (class_type_or_name) {
      TypeSP type_sp = class_type_or_name.GetTypeSP();
      // There can only be one type with a given name,
      // so we've just found duplicate definitions, and this
      // one will do as well as any other.
      // We don't consider something to have a dynamic type if
      // it is the same as the static type.  So compare against
      // the value we were handed.
      if (type_sp) {
        if (ClangASTContext::AreTypesSame(in_value.GetCompilerType(),
                                          type_sp->GetForwardCompilerType())) {
          // The dynamic type we found was the same type,
          // so we don't have a dynamic type here...
          return false;
        }

        // The offset_to_top is two pointers above the vtable pointer.
        const uint32_t addr_byte_size = process->GetAddressByteSize();
        const lldb::addr_t offset_to_top_location =
            vtable_address_point - 2 * addr_byte_size;
        // Watch for underflow, offset_to_top_location should be less than
        // vtable_address_point
        if (offset_to_top_location >= vtable_address_point)
          return false;
        const int64_t offset_to_top = process->ReadSignedIntegerFromMemory(
            offset_to_top_location, addr_byte_size, INT64_MIN, error);

        if (offset_to_top == INT64_MIN)
          return false;
        // So the dynamic type is a value that starts at offset_to_top
        // above the original address.
        lldb::addr_t dynamic_addr = original_ptr + offset_to_top;
        if (!process->GetTarget().GetSectionLoadList().ResolveLoadAddress(
                dynamic_addr, dynamic_address)) {
          dynamic_address.SetRawAddress(dynamic_addr);
        }
        return true;
      }
    }
  }

  return class_type_or_name.IsEmpty() == false;
}

The code shows that whenever a pointer is evaluated through the LLDB expression command, LLDB formats the result according to the debugger's default format, and formatting first obtains the dynamic type and the corresponding address. When the C++ object has a virtual table and the pointer is not to the first base class, LLDB uses the offset_to_top stored above the vtable address point to obtain the dynamic type and returns the starting address of the object of that dynamic type.
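The same lookup can be tried from inside a program. The sketch below follows the steps in the LLDB function: read the vtable pointer at offset 0 of the subobject, read offset_to_top two pointer-sized slots before the address point, and add it to the original pointer. The helper names read_offset_to_top and dynamic_start are mine. The code assumes the Itanium C++ ABI vtable layout and is not portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified copies of the article's classes.
struct Base { float x = 0; };
struct VBase { virtual void test() {} float x = 0; };
struct VBaseA : VBase { void test() override {} float xa = 0; };
struct VBaseB : VBase { void test() override {} float xb = 0; };
struct VDerived : VBaseA, Base, VBaseB { void test() override {} float xd = 0; };

// Itanium ABI only (an assumption of this sketch): the vtable pointer sits at
// offset 0 of the subobject, and offset_to_top is stored two pointer-sized
// slots before the address point that the vtable pointer refers to.
std::ptrdiff_t read_offset_to_top(const void *subobject) {
    const char *vtable_address_point = nullptr;
    std::memcpy(&vtable_address_point, subobject, sizeof vtable_address_point);
    std::ptrdiff_t offset_to_top = 0;
    std::memcpy(&offset_to_top,
                vtable_address_point - 2 * sizeof(void *),
                sizeof offset_to_top);
    return offset_to_top;
}

// Same arithmetic as LLDB: original pointer plus offset_to_top gives the
// start of the complete (dynamic-type) object.
const void *dynamic_start(const void *subobject) {
    return static_cast<const char *>(subobject) + read_offset_to_top(subobject);
}
```

For a VDerived object d and VBaseB *b = &d, read_offset_to_top(b) is negative and dynamic_start(b) equals the address of d, which is the address LLDB chooses to display. In portable code, dynamic_cast<void *>(b) yields the same most-derived address without touching the vtable by hand.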

V. Summary

  • The analysis above verifies that the compiler really offsets the address during pointer type conversion;
  • The compiler uses thunk technology to adjust the incoming this pointer and the returned pointer at call time, which keeps this correct during C++ virtual calls;
  • When LLDB expression is used to obtain a pointer to a base class with virtual functions that is not the first base, LLDB formats the result through its summary format, and obtains the dynamic type and address during that formatting.

VI. Tools

1. Getting the assembly

Preprocessing -> assembly

clang++ -E main.cpp -o main.i
clang++ -S main.i

objdump

objdump -S -C executable

Disassembler: hopper

Download hopper and drag in the executable program

Xcode

Xcode->Debug->Debug WorkFlow->Show disassembly

2. Exporting the C++ memory layout

Clang++ compiler

clang++ -cc1 -emit-llvm -fdump-record-layouts -fdump-vtable-layouts main.cpp


This article is the original content of Alibaba cloud and cannot be reproduced without permission.
