Parsing a Swift struct the way HandyJSON does

Posted by garg_vivek on Wed, 12 Jan 2022 18:25:47 +0100

HandyJSON

HandyJSON is a framework developed by Alibaba for converting JSON data into the corresponding models in Swift. Compared with other popular Swift JSON libraries, HandyJSON stands out for its support for pure Swift classes and its ease of use. When deserializing (converting JSON to a model), it does not require the model to inherit from NSObject (because it is not based on the KVC mechanism), nor does it require you to define a mapping function for the model. As long as you define the model class and declare that it conforms to the HandyJSON protocol, HandyJSON can resolve each property's value from the JSON string, using the property name as the key. However, because HandyJSON relies on the layout of Swift metadata, a change to that layout can break HandyJSON until it is updated. Alibaba has been maintaining the framework, though, so when Swift's source changes, the framework can be expected to follow.
HandyJSON's source is available on GitHub.

Resolve Struct from Source Code

Get TargetStructMetadata

Since HandyJSON is based on Swift metadata, you have to understand that metadata before you can parse a struct. Next, we'll work out the metadata layout from the Swift source code.
First, let's start with the metadata itself. Searching for StructMetadata in Metadata.h reveals that its real type is TargetStructMetadata.

using StructMetadata = TargetStructMetadata<InProcess>;

Next, when we look at the structure of TargetStructMetadata, we see that TargetStructMetadata inherits from TargetValueMetadata and TargetValueMetadata inherits from TargetMetadata.

struct TargetStructMetadata : public TargetValueMetadata<Runtime> {
struct TargetValueMetadata : public TargetMetadata<Runtime> {

This inheritance chain lets us reconstruct the layout of TargetStructMetadata.
As the code shows, the first field of TargetStructMetadata is Kind (inherited from TargetMetadata), and besides it there is a Description field (from TargetValueMetadata) that points to the type's descriptor.

struct TargetMetadata {
	......
	private:
	  /// The kind. Only valid for non-class metadata; getKind() must be used to get
	  /// the kind value.
	  StoredPointer Kind;
	......
}

struct TargetValueMetadata : public TargetMetadata<Runtime> {
  using StoredPointer = typename Runtime::StoredPointer;
  TargetValueMetadata(MetadataKind Kind,
                      const TargetTypeContextDescriptor<Runtime> *description)
      : TargetMetadata<Runtime>(Kind), Description(description) {}
  // Description points to the type's descriptor
  /// An out-of-line description of the type.
  TargetSignedPointer<Runtime, const TargetValueTypeDescriptor<Runtime> * __ptrauth_swift_type_descriptor> Description;
  ......
}

This gives us the following layout for TargetStructMetadata:

struct TargetStructMetadata<T> {
    // StoredPointer Kind; StoredPointer is uint64_t on a 64-bit system, so Int is used here
    var kind: Int
    // Declared as UnsafeMutablePointer<T> for now; the generic T is replaced once the descriptor's layout has been worked out
    var typeDescriptor: UnsafeMutablePointer<T>
}
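As a quick check of the first field, the sketch below reads the first word of a struct's metadata. It assumes a 64-bit platform and relies on a metatype value being a pointer to the type's metadata record. `Sample` and `MetadataHeader` are hypothetical names used only for this illustration. For struct metadata, the runtime's MetadataKind value is 0x200.

```swift
// Hypothetical sample type, used only for this illustration.
struct Sample {
    var a: Int = 1
    var b: Double = 2
}

// Mirrors only the first word of TargetMetadata (the Kind field).
struct MetadataHeader {
    var kind: Int
}

// A metatype value is a pointer to the type's metadata record,
// so it can be reinterpreted as a pointer to our mirror struct.
let header = unsafeBitCast(Sample.self as Any.Type,
                           to: UnsafePointer<MetadataHeader>.self)
let kind = header.pointee.kind

// MetadataKind::Struct is 0x200 in the Swift runtime.
print("kind = 0x" + String(kind, radix: 16))
```

Seeing the struct kind here is a cheap sanity check that the bit-cast really landed on the metadata record before reading any further fields.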

Get TargetStructDescriptor

Next, we'll work out what Description points to. In the source, getDescription() casts Description to TargetStructDescriptor, so that is the descriptor's concrete type.

  const TargetStructDescriptor<Runtime> *getDescription() const {
    return llvm::cast<TargetStructDescriptor<Runtime>>(this->Description);
  }

TargetStructDescriptor inherits from TargetValueTypeDescriptor and has two fields of its own: NumFields (the number of stored properties) and FieldOffsetVectorOffset (the offset of the field offset vector within the metadata).

class TargetStructDescriptor final
    : public TargetValueTypeDescriptor<Runtime>,
      public TrailingGenericContextObjects<TargetStructDescriptor<Runtime>,
                            TargetTypeGenericContextDescriptorHeader,
                            /*additional trailing objects*/
                            TargetForeignMetadataInitialization<Runtime>,
                            TargetSingletonMetadataInitialization<Runtime>,
                            TargetCanonicalSpecializedMetadatasListCount<Runtime>,
                            TargetCanonicalSpecializedMetadatasListEntry<Runtime>,
                            TargetCanonicalSpecializedMetadatasCachingOnceToken<Runtime>> {
	......
  /// The number of stored properties in the struct.
  /// If there is a field offset vector, this is its length.
  uint32_t NumFields; // number of stored properties
  /// The offset of the field offset vector for this struct's stored
  /// properties in its metadata, if any. 0 means there is no field offset
  /// vector.
  uint32_t FieldOffsetVectorOffset; // offset of the field offset vector in the metadata
  ......
};

TargetValueTypeDescriptor inherits from TargetTypeContextDescriptor, which has three fields: Name (the name of the type), AccessFunctionPtr (a pointer to the type's metadata access function), and Fields (a pointer to the type's field descriptor).

class TargetValueTypeDescriptor
    : public TargetTypeContextDescriptor<Runtime> {
public:
  static bool classof(const TargetContextDescriptor<Runtime> *cd) {
    return cd->getKind() == ContextDescriptorKind::Struct ||
           cd->getKind() == ContextDescriptorKind::Enum;
  }
};
class TargetTypeContextDescriptor
    : public TargetContextDescriptor<Runtime> {
public:
  /// The name of the type.
  // Name of type
  TargetRelativeDirectPointer<Runtime, const char, /*nullable*/ false> Name;

  /// A pointer to the metadata access function for this type.
  ///
  /// The function type here is a stand-in. You should use getAccessFunction()
  /// to wrap the function pointer in an accessor that uses the proper calling
  /// convention for a given number of arguments.
  // Pointer to this type's metadata access function
  TargetRelativeDirectPointer<Runtime, MetadataResponse(...),
                              /*Nullable*/ true> AccessFunctionPtr;
  
  /// A pointer to the field descriptor for the type, if any.
  // A pointer to a field descriptor of a type
  TargetRelativeDirectPointer<Runtime, const reflection::FieldDescriptor,
                              /*nullable*/ true> Fields;
	......
}

TargetTypeContextDescriptor in turn inherits from the base class TargetContextDescriptor, which has two fields: Flags (flags describing the context, including its kind and format version) and Parent (the parent context, or null if this is a top-level context).

/// Base class for all context descriptors.
template<typename Runtime>
struct TargetContextDescriptor {
  /// Flags describing the context, including its kind and format version.
  // A flag for describing the context, including kind and version
  ContextDescriptorFlags Flags;
  
  /// The parent context, or null if this is a top-level context.
  // The parent context, or NULL for a top-level context
  TargetRelativeContextPointer<Runtime> Parent;
  ......
}

At this point the layout of TargetStructDescriptor is clear, so we can write it out and replace the generic T in TargetStructMetadata.

struct TargetStructMetadata {
    var kind: Int
    var typeDescriptor: UnsafeMutablePointer<TargetStructDescriptor>
}

struct TargetStructDescriptor {
    // Flags describing the context, including its kind and format version
    var flags: Int32 // ContextDescriptorFlags, a 32-bit value
    // The parent context, or null for a top-level context
    var parent: TargetRelativeContextPointer<UnsafeRawPointer> // relative address
    // The name of the type
    var name: TargetRelativeDirectPointer<CChar> // relative address
    // Pointer to the type's metadata access function
    var accessFunctionPointer: TargetRelativeDirectPointer<UnsafeRawPointer> // relative address
    // Pointer to the type's field descriptor
    var fieldDescriptor: TargetRelativeDirectPointer<FieldDescriptor> // relative address
    // Number of stored properties
    var numFields: Int32
    // Offset of the field offset vector in the metadata
    var fieldOffsetVectorOffset: Int32
}

// How some of the field types above were resolved
/// Common flags stored in the first 32-bit word of any context descriptor.
// The flags are a single 32-bit value (uint32_t); Int32 is used above as a same-sized stand-in
struct ContextDescriptorFlags {
	private:
	  uint32_t Value;
}
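To make the flags word concrete, here is a minimal sketch that follows a struct's metadata to its descriptor and extracts the context kind. It assumes a 64-bit platform where the descriptor pointer is stored unsigned (no pointer authentication on that field), and it relies on the runtime keeping the ContextDescriptorKind in the low 5 bits of the flags, with 17 meaning Struct. `Point` is a hypothetical type used only for this illustration.

```swift
// Hypothetical sample type, used only for this illustration.
struct Point {
    var x: Int = 0
    var y: Int = 0
}

// Word 0 of the metadata is Kind; word 1 is the pointer to the type descriptor.
// (Assumes a platform without pointer authentication on this field.)
let metadataPointer = unsafeBitCast(Point.self as Any.Type, to: UnsafeRawPointer.self)
let descriptorPointer = metadataPointer.load(
    fromByteOffset: MemoryLayout<UnsafeRawPointer>.size,
    as: UnsafeRawPointer.self)

// The descriptor starts with the 32-bit ContextDescriptorFlags value.
let flags = descriptorPointer.load(as: UInt32.self)

// The low 5 bits hold the ContextDescriptorKind; 17 is Struct in the runtime's enum.
let contextKind = flags & 0x1F
print("context descriptor kind: \(contextKind)")
```

This is the same check that `TargetValueTypeDescriptor::classof` performs in the C++ source when it compares the kind against Struct or Enum.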

Implement TargetRelativeDirectPointer

For the relative address type TargetRelativeDirectPointer, searching the source shows that it is an alias for RelativeDirectPointer.

template <typename Runtime, typename Pointee, bool Nullable = true>
using TargetRelativeDirectPointer
  = typename Runtime::template RelativeDirectPointer<Pointee, Nullable>;

Then, in RelativePointer.h, we find that RelativeDirectPointer inherits from the base class RelativeDirectPointerImpl, which has a single field, RelativeOffset, along with a get() method that turns the offset into a real memory address.

template <typename T, bool Nullable = true, typename Offset = int32_t,
          typename = void>
class RelativeDirectPointer;

/// A direct relative reference to an object that is not a function pointer.
// Offset defaults to int32_t
template <typename T, bool Nullable, typename Offset>
class RelativeDirectPointer<T, Nullable, Offset,
    typename std::enable_if<!std::is_function<T>::value>::type>
    : private RelativeDirectPointerImpl<T, Nullable, Offset>
{
	......
}

/// A relative reference to a function, intended to reference private metadata
/// functions for the current executable or dynamic library image from
/// position-independent constant data.
template<typename T, bool Nullable, typename Offset>
class RelativeDirectPointerImpl {
	private:
  /// The relative offset of the function's entry point from *this.
  Offset RelativeOffset;
  ......
  // Applies the offset and returns a pointer to the generic type T
  PointerTy get() const & {
    // Check for null.
    if (Nullable && RelativeOffset == 0)
      return nullptr;
    
    // The value is addressed relative to `this`.
    uintptr_t absolute = detail::applyRelativeOffset(this, RelativeOffset);
    return reinterpret_cast<PointerTy>(absolute);
  }
  ......
}

/// Apply a relative offset to a base pointer. The offset is applied to the base
/// pointer using sign-extended, wrapping arithmetic.
// Calculate by offset
template<typename BasePtrTy, typename Offset>
static inline uintptr_t applyRelativeOffset(BasePtrTy *basePtr, Offset offset) {
  static_assert(std::is_integral<Offset>::value &&
                std::is_signed<Offset>::value,
                "offset type should be signed integer");

  auto base = reinterpret_cast<uintptr_t>(basePtr);
  // We want to do wrapping arithmetic, but with a sign-extended
  // offset. To do this in C, we need to do signed promotion to get
  // the sign extension, but we need to perform arithmetic on unsigned values,
  // since signed overflow is undefined behavior.
  auto extendOffset = (uintptr_t)(intptr_t)offset;
  // Address of the pointer itself + the stored offset = the absolute address of the target
  return base + extendOffset;
}

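With that, we can write TargetRelativeDirectPointer in Swift:

// Generic over the pointee type
struct TargetRelativeDirectPointer<Pointee> {
    var offset: Int32
    
    // Resolve the relative offset into an absolute pointer.
    // This must be called on the value stored in the descriptor's memory (for example through `pointee`);
    // on a copy, the address of `self` is no longer the base the offset is relative to.
    mutating func getmeasureRelativeOffset() -> UnsafeMutablePointer<Pointee> {
        let offset = self.offset
        
        return withUnsafePointer(to: &self) { p in
            // Advance from the address of `self` by `offset` bytes and treat the result as Pointee
            return UnsafeMutablePointer(mutating: UnsafeRawPointer(p).advanced(by: numericCast(offset)).assumingMemoryBound(to: Pointee.self))
        }
    }
}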
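The offset arithmetic can be seen in isolation on a hand-built buffer. The following is a minimal sketch, not the runtime's code: `resolveRelative` and `demoRelativePointers` are hypothetical helpers, and the buffer layout is invented for the demonstration. It shows that the base of the calculation is the address of the offset field itself, and that negative offsets work as well as positive ones.

```swift
// Resolve a relative pointer: the target address is the address of the
// Int32 offset field itself plus the (sign-extended) offset it stores.
func resolveRelative(_ field: UnsafeRawPointer) -> UnsafeRawPointer {
    let offset = field.load(as: Int32.self)
    return field.advanced(by: Int(offset))
}

func demoRelativePointers() -> (forward: UInt32, backward: UInt32) {
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: 24, alignment: 4)
    defer { buffer.deallocate() }

    // Layout by byte offset:
    //    0: target A
    //    8: offset field pointing back to A (-8)
    //   12: offset field pointing forward to B (+8)
    //   20: target B
    buffer.storeBytes(of: UInt32(0xAAAA), toByteOffset: 0, as: UInt32.self)
    buffer.storeBytes(of: Int32(-8), toByteOffset: 8, as: Int32.self)
    buffer.storeBytes(of: Int32(8), toByteOffset: 12, as: Int32.self)
    buffer.storeBytes(of: UInt32(0xBBBB), toByteOffset: 20, as: UInt32.self)

    let base = UnsafeRawPointer(buffer)
    let backward = resolveRelative(base.advanced(by: 8)).load(as: UInt32.self)
    let forward = resolveRelative(base.advanced(by: 12)).load(as: UInt32.self)
    return (forward, backward)
}

let resolved = demoRelativePointers()
print(resolved.forward == 0xBBBB, resolved.backward == 0xAAAA)
```

Because the offset is relative to the field's own address, copying the field elsewhere and resolving it there would point at the wrong memory, which is why the descriptor fields must be resolved in place.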

We can also update TargetStructDescriptor accordingly:

struct TargetStructDescriptor {
    // Flags describing the context, including its kind and format version
    var flags: Int32
    // The parent context, or null for a top-level context
    var parent: Int32 // Declared as Int32 for now because it is not resolved here
    // The name of the type
    var name: TargetRelativeDirectPointer<CChar>
    // Pointer to the type's metadata access function
    var accessFunctionPointer: TargetRelativeDirectPointer<UnsafeRawPointer>
    // Pointer to the type's field descriptor
    var fieldDescriptor: TargetRelativeDirectPointer<FieldDescriptor>
    // Number of stored properties
    var numFields: Int32
    // Offset of the field offset vector in the metadata
    var fieldOffsetVectorOffset: Int32
}

// TargetRelativeContextPointer is not resolved here; the source shows it stores an int32_t offset, so Int32 is used as a stand-in
template<typename Runtime,
         template<typename _Runtime> class Context = TargetContextDescriptor>
using TargetRelativeContextPointer =
  RelativeIndirectablePointer<const Context<Runtime>,
                              /*nullable*/ true, int32_t,
                              TargetSignedContextPointer<Runtime, Context>>;

FieldDescriptor and FieldRecord

Next, we start parsing the FieldDescriptor, which is in the source code as follows:

// Field descriptors contain a collection of field records for a single
// class, struct or enum declaration.
class FieldDescriptor {
  const FieldRecord *getFieldRecordBuffer() const {
    return reinterpret_cast<const FieldRecord *>(this + 1);
  }

public:
  const RelativeDirectPointer<const char> MangledTypeName;
  const RelativeDirectPointer<const char> Superclass;

  FieldDescriptor() = delete;

  const FieldDescriptorKind Kind;
  const uint16_t FieldRecordSize;
  const uint32_t NumFields;
  ......
  // Get all properties, each encapsulated in FieldRecord
  llvm::ArrayRef<FieldRecord> getFields() const {
    return {getFieldRecordBuffer(), NumFields};
  }
  ......
}

// FieldDescriptorKind is backed by a uint16_t
enum class FieldDescriptorKind : uint16_t {
	......
}

The structure of FieldRecord in the source code is:

class FieldRecord {
  const FieldRecordFlags Flags;

public:
  const RelativeDirectPointer<const char> MangledTypeName;
  const RelativeDirectPointer<const char> FieldName;
  ......
}

// Field records describe the type of a single stored property or case member
// of a class, struct or enum.
// FieldRecordFlags wraps a uint32_t
class FieldRecordFlags {
  using int_type = uint32_t;
  ......
}
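The `this + 1` in getFieldRecordBuffer() means the records are laid out directly after the fixed-size header. As a rough sketch of that layout, the mirrors below replace each relative pointer with a plain Int32 and compute where record `index` starts. The type and function names are hypothetical and exist only for this illustration; the real runtime also stores the record size in FieldRecordSize rather than hard-coding it.

```swift
// Mirrors of the fixed-size parts, with relative pointers written as plain Int32.
// These names are hypothetical and exist only for this illustration.
struct FieldDescriptorHeader {
    var mangledTypeName: Int32
    var superclass: Int32
    var kind: UInt16
    var fieldRecordSize: UInt16
    var numFields: UInt32
}

struct FieldRecordLayout {
    var flags: UInt32
    var mangledTypeName: Int32
    var fieldName: Int32
}

// Byte offset of record `index`, measured from the start of the descriptor:
// the records begin right after the header (`this + 1` in the C++ source).
func fieldRecordOffset(index: Int) -> Int {
    return MemoryLayout<FieldDescriptorHeader>.stride
        + index * MemoryLayout<FieldRecordLayout>.stride
}

print(MemoryLayout<FieldDescriptorHeader>.size)   // header size in bytes
print(MemoryLayout<FieldRecordLayout>.stride)     // distance between records
print(fieldRecordOffset(index: 2))                // start of the third record
```

With a 16-byte header and 12-byte records, the third record starts 40 bytes into the descriptor.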

Computing field offsets from fieldOffsetVectorOffset

Finally, fieldOffsetVectorOffset records where the field offset vector sits in the metadata; that vector holds the offset of each stored property within an instance. The relevant source is:

// The offset is added to asWords, a pointer to pointer-sized words, so it counts words rather than bytes
  /// Get a pointer to the field offset vector, if present, or null.
  const StoredPointer *getFieldOffsets() const {
    assert(isTypeMetadata());
    auto offset = getDescription()->getFieldOffsetVectorOffset();
    if (offset == 0)
      return nullptr;
    auto asWords = reinterpret_cast<const void * const*>(this);
    return reinterpret_cast<const StoredPointer *>(asWords + offset);
  }

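Used directly as an index into Int32 values, however, this offset yields the wrong data, because it counts pointer-sized words. HandyJSON's source handles this as follows:

// On 64-bit platforms a word is two Int32s wide, so the word offset is multiplied by 2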
return Int(UnsafePointer<Int32>(pointer)[vectorOffset * (is64BitPlatform ? 2 : 1) + $0])
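The factor of 2 follows from the unit mismatch: the vector offset counts pointer-sized words, while the entries are read as 32-bit values. Here is a small sketch of that indexing on a fake metadata buffer. `readFieldOffsets` and `demoFieldOffsets` are hypothetical helpers, and the stored values 0, 8 and 24 are made up for the demonstration.

```swift
// Reads `count` 32-bit field offsets from a metadata-like buffer.
// `vectorOffset` counts pointer-sized words, as in the runtime source.
func readFieldOffsets(from metadata: UnsafeRawPointer,
                      vectorOffset: Int,
                      count: Int) -> [Int32] {
    // 2 on 64-bit platforms, 1 on 32-bit ones.
    let int32sPerWord = MemoryLayout<UnsafeRawPointer>.size / MemoryLayout<Int32>.size
    return (0..<count).map { i -> Int32 in
        metadata.load(
            fromByteOffset: (vectorOffset * int32sPerWord + i) * MemoryLayout<Int32>.size,
            as: Int32.self)
    }
}

func demoFieldOffsets() -> [Int32] {
    let wordSize = MemoryLayout<UnsafeRawPointer>.size
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: 5 * wordSize, alignment: wordSize)
    defer { buffer.deallocate() }

    // Fake metadata: words 0 and 1 stand in for Kind and Description;
    // the field offset vector starts at word 2 and holds three Int32 values.
    let vectorOffset = 2
    for (i, value) in [Int32(0), 8, 24].enumerated() {
        buffer.storeBytes(of: value,
                          toByteOffset: vectorOffset * wordSize + i * MemoryLayout<Int32>.size,
                          as: Int32.self)
    }
    return readFieldOffsets(from: UnsafeRawPointer(buffer), vectorOffset: vectorOffset, count: 3)
}

print(demoFieldOffsets())
```

Deriving the multiplier from MemoryLayout instead of hard-coding 2 keeps the same code correct on 32-bit platforms, which is what HandyJSON's `is64BitPlatform ? 2 : 1` expresses.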

At this point we have a fairly clear picture of the overall layout:

// Resolves a relative offset into the address of a generic Pointee
struct TargetRelativeDirectPointer<Pointee> {
    var offset: Int32
    
    // Resolve the relative offset into an absolute pointer.
    // Call this on the value stored in the descriptor's memory, not on a copy.
    mutating func getmeasureRelativeOffset() -> UnsafeMutablePointer<Pointee> {
        let offset = self.offset
        
        return withUnsafePointer(to: &self) { p in
            // Advance from the address of `self` by `offset` bytes and treat the result as Pointee
            return UnsafeMutablePointer(mutating: UnsafeRawPointer(p).advanced(by: numericCast(offset)).assumingMemoryBound(to: Pointee.self))
        }
    }
}

struct TargetStructMetadata {
    var kind: Int
    var typeDescriptor: UnsafeMutablePointer<TargetStructDescriptor>
}


struct TargetStructDescriptor {
    var flags: Int32
    var parent: Int32
    var name: TargetRelativeDirectPointer<CChar>
    var accessFunctionPointer: TargetRelativeDirectPointer<UnsafeRawPointer>
    var fieldDescriptor: TargetRelativeDirectPointer<FieldDescriptor>
    var numFields: Int32
    var fieldOffsetVectorOffset: Int32
    
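    // Returns the field offset vector. The word offset is doubled because the vector is read as Int32 (assumes a 64-bit platform)
    func getFieldOffsets(_ metadata: UnsafeRawPointer) -> UnsafePointer<Int32> {
        print(metadata)
        return metadata.assumingMemoryBound(to: Int32.self).advanced(by: numericCast(self.fieldOffsetVectorOffset) * 2)
    }
    
    // Word offset of the generic argument vector in the metadata; used when resolving field types
    var genericArgumentOffset: Int {
        return 2
    }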
}

struct FieldDescriptor {
    var mangledTypeName: TargetRelativeDirectPointer<CChar>
    var superclass: TargetRelativeDirectPointer<CChar>
    var kind: UInt16
    var fieldRecordSize: UInt16
    var numFields: UInt32
    var fields: FieldRecordBuffer<FieldRecord>
}

struct FieldRecord {
    var fieldRecordFlags: Int32
    var mangledTypeName: TargetRelativeDirectPointer<CChar>
    var fieldName: TargetRelativeDirectPointer<UInt8>
}

// Accessor for the FieldRecords that trail the FieldDescriptor header
struct FieldRecordBuffer<Element> {
    var element: Element
    
    mutating func buffer(n: Int) -> UnsafeBufferPointer<Element> {
        return withUnsafePointer(to: &self) {
            let ptr = $0.withMemoryRebound(to: Element.self, capacity: 1) { start in
                return start
            }
            return UnsafeBufferPointer(start: ptr, count: n)
        }
    }
    
    mutating func index(of i: Int) -> UnsafeMutablePointer<Element> {
        return withUnsafePointer(to: &self) {
            return UnsafeMutablePointer(mutating: UnsafeRawPointer($0).assumingMemoryBound(to: Element.self).advanced(by: i))
        }
    }
}

Code validation

Here's the code to verify this structure.

protocol BrigeProtocol {}

extension BrigeProtocol {
    // Read the value at the pointer as Self and return it as Any
    static func get(from pointor: UnsafeRawPointer) -> Any {
        // Self is the concrete type carried by the metatype this is called on
        pointor.assumingMemoryBound(to: Self.self).pointee
    }
}

struct BrigeMetadataStruct {
    let type: Any.Type
    let witness: Int
}

func custom(type: Any.Type) -> BrigeProtocol.Type {
    let container = BrigeMetadataStruct(type: type, witness: 0)
    let cast = unsafeBitCast(container, to: BrigeProtocol.Type.self)
    return cast
}
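// Declaration of the Swift runtime entry point used below. The original listing does not show
// how it was declared; this @_silgen_name form is an assumption, modeled on the approach HandyJSON takes.
@_silgen_name("swift_getTypeByMangledNameInContext")
func swift_getTypeByMangledNameInContext(
    _ typeNameStart: UnsafePointer<CChar>,
    _ typeNameLength: Int,
    _ context: UnsafeRawPointer?,
    _ genericArgs: UnsafePointer<UnsafeRawPointer?>?) -> UnsafeRawPointer?

// The LLPerson struct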
struct LLPerson {
    var age: Int = 18
    var name: String = "LL"
    var nameTwo: String = "LLLL"
}
// Create an instance
var p = LLPerson()
// Bit-cast LLPerson's metatype into a pointer to our TargetStructMetadata layout (a metatype value is a pointer to the type's metadata)
let ptr = unsafeBitCast(LLPerson.self as Any.Type, to: UnsafeMutablePointer<TargetStructMetadata>.self)

// Get the structure name
let namePtr = ptr.pointee.typeDescriptor.pointee.name.getmeasureRelativeOffset()
print("current struct name: \(String(cString: namePtr))")
// Get the number of stored properties
let numFields = ptr.pointee.typeDescriptor.pointee.numFields
print("Current number of struct properties: \(numFields)")

// Get the field offset vector from the metadata
let offsets = ptr.pointee.typeDescriptor.pointee.getFieldOffsets(UnsafeRawPointer(ptr).assumingMemoryBound(to: Int.self))

print("----------- start fetch field -------------")

for i in 0..<numFields {
    // Get the property name
    let fieldName = ptr.pointee.typeDescriptor.pointee.fieldDescriptor.getmeasureRelativeOffset().pointee.fields.index(of: Int(i)).pointee.fieldName.getmeasureRelativeOffset()
    print("----- field \(String(cString: fieldName))  -----")

    // Get the property's offset in bytes
    let fieldOffset = offsets[Int(i)]
    print("The offset of \(String(cString: fieldName)) is: \(fieldOffset) bytes")
    // This is the Swift mangled type name; it has to be resolved into a real type
    let typeMangleName = ptr.pointee.typeDescriptor.pointee.fieldDescriptor.getmeasureRelativeOffset().pointee.fields.index(of: Int(i)).pointee.mangledTypeName.getmeasureRelativeOffset()
//    print("\(String(cString: typeMangleName))")
    let genericVector = UnsafeRawPointer(ptr).advanced(by: ptr.pointee.typeDescriptor.pointee.genericArgumentOffset * MemoryLayout<UnsafeRawPointer>.size).assumingMemoryBound(to: Any.Type.self)
    // The runtime function swift_getTypeByMangledNameInContext takes four parameters
    let fieldType = swift_getTypeByMangledNameInContext(
        typeMangleName, // the mangled type name
        256,            // length of the mangled name; it should really be computed, but HandyJSON passes 256 directly
        UnsafeRawPointer(ptr.pointee.typeDescriptor), // the context: the type descriptor
        UnsafeRawPointer(genericVector).assumingMemoryBound(to: Optional<UnsafeRawPointer>.self)) // the generic arguments, used to resolve generic parameters in the name

    // Bit-cast fieldType to Any.Type
    let type = unsafeBitCast(fieldType, to: Any.Type.self)
    // Obtain the concrete type through the protocol bridge
    let value = custom(type: type)

    // Get a pointer to the instance p as an UnsafeRawPointer bound to a 1-byte type (Int8).
    // The field offset is in bytes; without this conversion, advancing the pointer would move in strides of the whole struct.
    let instanceAddress = withUnsafePointer(to: &p){return UnsafeRawPointer($0).assumingMemoryBound(to: Int8.self)}

    print("fieldType: \(type) \nfieldValue: \(value.get(from: instanceAddress.advanced(by: Int(fieldOffset))))")
}

print("----------- end fetch field -------------")

Print information:

From the memory address, we can also see the layout information of the attributes.
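The offsets read from the field offset vector can be cross-checked against the standard library. The sketch below redeclares a struct with the same shape as LLPerson and asks MemoryLayout for each property's offset. On a 64-bit platform, where Int is 8 bytes and String is 16 bytes, the expected offsets are 0, 8 and 24.

```swift
// Same shape as the LLPerson struct in the verification code.
struct LLPerson {
    var age: Int = 18
    var name: String = "LL"
    var nameTwo: String = "LLLL"
}

// MemoryLayout.offset(of:) returns the byte offset of a stored property,
// or nil if the key path does not refer to directly addressable storage.
let expectedOffsets: [Int] = [
    MemoryLayout<LLPerson>.offset(of: \LLPerson.age),
    MemoryLayout<LLPerson>.offset(of: \LLPerson.name),
    MemoryLayout<LLPerson>.offset(of: \LLPerson.nameTwo),
].map { $0 ?? -1 }

print("offsets: \(expectedOffsets), size: \(MemoryLayout<LLPerson>.size)")
```

If the values printed by the metadata walk differ from these, the most likely culprit is the word-versus-Int32 indexing of the field offset vector.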
