Swift Advanced: Protocol

Posted by cloudy243 on Fri, 26 Nov 2021 07:55:26 +0100

1. Preface

This article explains protocols, which are widely used in Swift, and analyzes both how they are used and how they are stored under the hood.

2. Basic usage

Let's start with the basic usage of protocols in Swift (not much different from Objective-C) 👇

2.1 Syntax

Protocol syntax 👇

protocol MyProtocol {
    // body
}
  • Classes, structs, and enums can all conform to protocols. To conform to multiple protocols, separate them with commas, for example 👇
struct LGTeacher: Protocol1, Protocol2 {
    // body
}
  • If a class has a superclass, the superclass must be listed before the protocols it adopts 👇
class LGTeacher: NSObject, Protocol1, Protocol2 {
    // body
}

2.2 Properties in protocols

Let's look at properties in protocols. There are two points to note 👇

  1. A property requirement must state explicitly whether it is readable ({ get }) or readable and writable ({ get set })
  2. A property requirement is always declared with var, never let
protocol LGTestProtocol {
    var age: Int {get set}
}
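To see both rules from the conforming side, here is a minimal sketch. It extends LGTestProtocol with a hypothetical read-only name property, and the conforming types LGStudent and LGTeacher are illustrative. A { get } requirement can be satisfied by a let constant, while a { get set } requirement needs a stored var or a computed property with a setter.

```swift
// Extended version of LGTestProtocol; `name` is added for illustration.
protocol LGTestProtocol {
    var age: Int { get set }
    var name: String { get }
}

// A struct: a stored var satisfies { get set }, and a let satisfies { get }.
struct LGStudent: LGTestProtocol {
    var age: Int
    let name: String
}

// A class: a computed property with a getter and a setter also satisfies { get set }.
final class LGTeacher: LGTestProtocol {
    private var storedAge = 30
    var age: Int {
        get { storedAge }
        set { storedAge = max(0, newValue) }  // clamp negative ages to 0
    }
    var name: String { "Teacher" }  // read-only computed property
}

var student = LGStudent(age: 18, name: "Tom")
student.age += 1
print(student.age)  // 19

var someone: LGTestProtocol = LGTeacher()
someone.age = -5    // goes through the setter required by { get set }
print(someone.age)  // 0
```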

2.3 Methods in protocols

Finally, let's look at methods in protocols. As in Objective-C, you only declare them, without an implementation, for example 👇

protocol MyProtocol {
    func doSomething()
    static func teach()
}

A class that conforms to the protocol must implement all of its methods 👇

class LGTeacher: MyProtocol{
    func doSomething() {
        print("LGTeacher doSomething")
    }
    
    static func teach() {
        print("LGTeacher teach")
    }
}
var t = LGTeacher()
t.doSomething()
LGTeacher.teach()
  • A protocol can also declare initializers. A class that implements such an initializer must mark it with the required keyword, so that its subclasses provide it too (a final class or a struct does not need the keyword; Objective-C has no such rule) 👇
protocol MyProtocol {
    init(age: Int)
}
class LGTeacher: MyProtocol {
    var age: Int
    required init(age: Int) {
        self.age = age
    }
}
  • If a protocol should only be adopted by classes, make it inherit from AnyObject. A struct that then tries to conform to it gets a compile error! 👇
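The last two points can be sketched together. LGMathTeacher, LGStudent, LGTeachable, LGCoach, LGClassroom and the generic make(_:age:) helper are hypothetical names for illustration. The sketch shows why required matters (a subclass created through the protocol's initializer must have it) and one typical reason for a class-only protocol: only class-bound protocols can be used as the type of a weak reference.

```swift
protocol MyProtocol {
    init(age: Int)
}

// A non-final class must mark the initializer `required`.
class LGTeacher: MyProtocol {
    var age: Int
    required init(age: Int) {
        self.age = age
    }
}

// A subclass with no new stored properties inherits the required initializer.
class LGMathTeacher: LGTeacher {}

// A struct (or a final class) does not need `required`.
struct LGStudent: MyProtocol {
    var age: Int
    init(age: Int) { self.age = age }
}

// Hypothetical helper: creates any conforming type through the protocol's initializer.
func make<T: MyProtocol>(_ type: T.Type, age: Int) -> T {
    T(age: age)
}

// Class-only protocol: it inherits from AnyObject.
protocol LGTeachable: AnyObject {
    func teach() -> String
}

final class LGCoach: LGTeachable {
    func teach() -> String { "LGCoach teach" }
}

// struct LGPupil: LGTeachable {}  // does not compile: a struct cannot conform to a class-only protocol

// Because LGTeachable is class-bound, it can be the type of a weak reference.
final class LGClassroom {
    weak var teacher: LGTeachable?
}

let mathTeacher = make(LGMathTeacher.self, age: 40)
let student = make(LGStudent.self, age: 18)
let coach = LGCoach()
let room = LGClassroom()
room.teacher = coach
print(mathTeacher.age, student.age, room.teacher?.teach() ?? "none")
```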

3. Advanced usage

The advanced usage of a protocol 👉 using it as a type. This happens in three main situations 👇

  1. As a parameter type or return type of a function, method, or initializer
  2. As the type of a constant, variable, or property
  3. As the element type of an array, dictionary, or other container
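A minimal sketch of the three situations, assuming a hypothetical Describable protocol with two conforming types, Book (a struct) and Movie (a class):

```swift
// Hypothetical protocol and conforming types, used only for illustration.
protocol Describable {
    var summary: String { get }
}

struct Book: Describable {
    let title: String
    var summary: String { "Book: \(title)" }
}

final class Movie: Describable {
    let name: String
    init(name: String) { self.name = name }
    var summary: String { "Movie: \(name)" }
}

// 1. Protocol as a parameter type and as a return type
func describe(_ item: Describable) -> String {
    item.summary
}
func makeDefault() -> Describable {
    Book(title: "Swift")
}

// 2. Protocol as the type of a constant, variable, or property
struct Shelf {
    var featured: Describable
}
let favorite: Describable = Movie(name: "Inception")
let shelf = Shelf(featured: favorite)

// 3. Protocol as the element type of containers
let items: [Describable] = [makeDefault(), favorite]
let byKey: [String: Describable] = ["book": makeDefault()]

for item in items {
    print(describe(item))  // Book: Swift, then Movie: Inception
}
```

Values of different concrete types (a struct and a class) sit in the same array because the element type is the protocol.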

3.1 The inheritance approach

First, what is the output of the following code? 👇

class Shape{
    var area: Double{
        get{
            return 0
        }
    }
}
class Circle: Shape{
    var radius: Double
   
    init(_ radius: Double) {
        self.radius = radius
    }
    
    override var area: Double{
        get{
            return radius * radius * 3.14
        }
    }
}
class Rectangle: Shape{
    var width, height: Double
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }
    
    override var area: Double{
        get{
            return width * height
        }
    }
}

var circle: Shape = Circle.init(10.0)
var rectangle: Shape = Rectangle.init(10.0, 20.0)

var shapes: [Shape] = [circle, rectangle]
for shape in shapes{
    print(shape.area)
}

The code above is based on inheritance, so the base class must provide a default implementation of area. It prints 314.0 and 200.0. The same design can also be expressed with a protocol 👇

3.2 The protocol approach

protocol Shape {
    var area: Double {get}
}
class Circle: Shape{
    var radius: Double

    init(_ radius: Double) {
        self.radius = radius
    }

    var area: Double{
        get{
            return radius * radius * 3.14
        }
    }
}
class Rectangle: Shape{
    var width, height: Double
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }

    var area: Double{
        get{
            return width * height
        }
    }
}

var circle: Shape = Circle.init(10.0)
var rectangle: Shape = Rectangle.init(10.0, 20.0)

var shapes: [Shape] = [circle, rectangle]
for shape in shapes{
    print(shape.area)
}

Shape is now a protocol that declares a read-only property area, and every class that conforms to it must implement the getter of area. Now look at var shapes 👉 there are two cases for its elements 👇

  1. When the element type Shape is a class, the array stores references (addresses) to the objects. This is easy to understand.
  2. When the element type Shape is a protocol, what does the array store?

What if the protocol no longer declares area, and only a protocol extension provides a default getter for it? 👇

protocol Shape {

}

extension Shape{
    var area: Double {
        get{return 0}
    }
}

Now call it like this. What is the output? 👇

var circle: Shape = Circle.init(10.0)
print(circle.area)

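The output is 0.0. Why not 10 * 10 * 3.14? Because area is not declared in Shape, the getter in the protocol extension is dispatched statically: through a value whose static type is Shape, the call goes to the extension's implementation, whose address is fixed at compile time and cannot change. Circle's own area does not override it. We can verify this with the SIL 👇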

  • First, look at the main function
  • Then look at the getter of area implemented in the Shape protocol extension 👇

The SIL above shows that although 10.0 is passed to Circle.init(10.0), the getter called for area is the one from the Shape extension, and that getter simply returns a literal of type $Builtin.FPIEEE64 whose value is 0. Finally, let's look at the implementation that circle.area actually calls 👇

It too just returns $Builtin.FPIEEE64 👉 0.0, so print(circle.area) naturally prints 0.0.
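The same behavior can be reproduced without reading SIL. This sketch assumes the setup above: Shape declares nothing, and area is provided only by the extension. The two calls differ only in the static type of the variable.

```swift
// Shape declares no requirement; area exists only in the extension.
protocol Shape {}

extension Shape {
    var area: Double { 0 }  // statically dispatched default
}

class Circle: Shape {
    var radius: Double
    init(_ radius: Double) { self.radius = radius }
    // Not a witness for any requirement: it only shadows the extension's area on Circle itself.
    var area: Double { radius * radius * 3.14 }
}

let asProtocol: Shape = Circle(10.0)
let asClass = Circle(10.0)

print(asProtocol.area)  // 0.0: static type Shape, so the extension getter is called
print(asClass.area)     // 314.0: static type Circle, so Circle's getter is called
```

Declaring var area: Double { get } in Shape again turns area into a requirement, and both calls then print 314.0.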

4. Underlying principles

What does the following case print? 👇

protocol MyProtocol {
    func teach()
}
extension MyProtocol{
    func teach(){ print("MyProtocol") }
}
class MyClass: MyProtocol{
    func teach(){ print("MyClass") }
}
let object: MyProtocol = MyClass()
object.teach()
let object1: MyClass = MyClass()
object1.teach()

Both calls print MyClass. Why is the output the same? As usual, let's analyze the SIL 👇

4.1 SIL analysis of the example

  • Part I 👉 Definition of MyProtocol and MyClass
  • Part II 👉 Call in main function

From the SIL above, we can see 👇

  1. object 👉 teach is called through the witness_method instruction
  2. object1 👉 teach is called through the class_method instruction

Next, search the SIL for #MyProtocol.teach and #MyClass.teach 👇

This turns up two method tables: the sil_vtable of MyClass and a sil_witness_table:

  1. sil_vtable is familiar from the previous article on Swift value types, reference types, and method dispatch. It is the function table of the class MyClass.
  2. sil_witness_table corresponds to the protocol witness table (PWT for short). It is an array holding pointers to the implementations of the protocol's methods. Generally, a method call locates its implementation from a base memory address plus the method's offset.

The sil_witness_table entry in turn calls MyClass's teach method 👇

This is why object.teach() outputs MyClass.
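Conceptually, calling a requirement through a protocol-typed value means fetching an entry from the PWT and calling it. The following is only a rough model of that idea in plain Swift: TeachWitnessTable and TeachExistential are hypothetical types, not the runtime's actual structures, and the closure stands in for the compiler-generated witness entry.

```swift
// Hypothetical model of a protocol witness table: one slot per requirement.
struct TeachWitnessTable {
    let teach: (AnyObject) -> String
}

// Roughly what a protocol-typed value carries: the instance plus its PWT.
struct TeachExistential {
    let instance: AnyObject
    let pwt: TeachWitnessTable

    // Like witness_method: fetch the slot from the table, then call it.
    func teach() -> String {
        pwt.teach(instance)
    }
}

class MyClass {
    func teach() -> String { "MyClass" }
}

// The entry for MyClass's conformance forwards to MyClass's own method.
let myClassWitnessTable = TeachWitnessTable(teach: { ($0 as! MyClass).teach() })

let object = TeachExistential(instance: MyClass(), pwt: myClassWitnessTable)
print(object.teach())  // MyClass
```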

Extension: remove the method declaration from the protocol
// What if the declaration in the protocol is removed? What does it print?
protocol MyProtocol {
}
extension MyProtocol{
    func teach(){ print("MyProtocol") }
}
class MyClass: MyProtocol{
    func teach(){ print("MyClass") }
}
let object: MyProtocol = MyClass()
object.teach()

let object1: MyClass = MyClass()
object1.teach()

Continue SIL analysis 👇

  • MyProtocol no longer declares the teach function 👇
  • The call in the main function

As shown in the figure above 👇

  1. The first call prints MyProtocol because it calls the teach method in the protocol extension. That method's address is fixed at compile time, so it is dispatched statically by function address.
  2. The second call prints MyClass; as in the previous example, it goes through the class's function table.
  • Method list

As shown above, the sil_witness_table no longer contains a teach method, because 👇

  1. Methods declared in the protocol are stored in the PWT, and for a class, the PWT entries in turn use class_method to dispatch to the corresponding method in the class's V-Table.
  2. If the protocol declares no such function and only an extension provides a default implementation, its address is fixed at compile time. Conforming classes cannot override it: a method with the same name in the class only shadows it when called on the class type.
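Both points can be checked side by side. In this sketch, Declared, Undeclared, Teacher, MathTeacher and lecture() are hypothetical names. A subclass override is reached through a protocol requirement, because the PWT entry dispatches through the class's V-Table, but not through an extension-only method.

```swift
// Declared: teach() is a requirement, so calls go through the PWT.
protocol Declared {
    func teach() -> String
}
extension Declared {
    func teach() -> String { "Declared default" }
}

// Undeclared: lecture() exists only in the extension, so calls are static.
protocol Undeclared {}
extension Undeclared {
    func lecture() -> String { "Undeclared default" }
}

class Teacher: Declared, Undeclared {
    func teach() -> String { "Teacher" }    // witness for Declared.teach()
    func lecture() -> String { "Teacher" }  // only shadows the extension method
}

class MathTeacher: Teacher {
    override func teach() -> String { "MathTeacher" }
    override func lecture() -> String { "MathTeacher" }
}

let viaDeclared: Declared = MathTeacher()
let viaUndeclared: Undeclared = MathTeacher()

print(viaDeclared.teach())      // MathTeacher: PWT entry -> class_method -> V-Table
print(viaUndeclared.lecture())  // Undeclared default: fixed address, no override reaches it
print(MathTeacher().lecture())  // MathTeacher: called on the class type directly
```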

4.2 Where the PWT is stored

From the earlier analysis of method dispatch, we know that the V-Table is stored in the metadata, and from the analysis above, the protocol's methods are stored in the PWT. So where is the PWT stored? Let's explore, starting with the following example. What does it print? 👇

protocol Shape {
    var area: Double {get}
}
class Circle: Shape{
    var radius: Double

    init(_ radius: Double) {
        self.radius = radius
    }

    var area: Double{
        get{
            return radius * radius * 3.14
        }
    }
}

var circle: Shape = Circle(10.0)
print(MemoryLayout.size(ofValue: circle))
print(MemoryLayout.stride(ofValue: circle))

var circle1: Circle = Circle(10.0)
print(MemoryLayout.size(ofValue: circle1))
print(MemoryLayout.stride(ofValue: circle1))

The static type of circle is the protocol Shape, while that of circle1 is the class Circle. The output is 👇

On a 64-bit platform, the size and stride of circle are both 40, while those of circle1 are both 8. Why?

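  • First, look at it in LLDB 👇

The first 8 bytes of circle hold the address of the Circle instance on the heap (a HeapObject, which begins with the metadata pointer), and the radius value 10 is stored inside that heap object.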

  • Then look at SIL (main function code) 👇

In the SIL, the circle variable is initialized through the init_existential_addr instruction, while circle1 👇

is simply read with a load instruction. What does init_existential_addr mean? We can find it in the official SIL documentation 👇

The existential container mentioned there is a special data type generated by the compiler to hold values of a protocol type: any type that conforms to the protocol can be stored in it. Because conforming types occupy different amounts of memory, managing them through an existential container gives them a uniform storage layout.

So the compiler uses an existential container to hold the Shape value and initializes the circle variable through it, which amounts to wrapping the Circle instance. The focus therefore shifts to the existential container. Next, let's use the IR code to see what data this container stores.

  • Look at the IR code 👇
  • Then look at the main function code 👇

That is, the container's structure is {heapObject, metadata, PWT}, which exactly matches the memory layout we saw in LLDB earlier.

Imitating the layout in Swift

Next, let's imitate in Swift code the memory binding performed by the IR's main function 👇

// 1. HeapObject structure (the layout of a Swift class instance)
struct HeapObject {
    var type: UnsafeRawPointer
    var refCount1: UInt32
    var refCount2: UInt32
}
// %T4main5ShapeP = type { [24 x i8], %swift.type*, i8** }
struct protocolData {
    // [24 x i8]: read in 8-byte words, so modeled as 3 pointers (24 bytes in total)
    var value1: UnsafeRawPointer
    var value2: UnsafeRawPointer
    var value3: UnsafeRawPointer
    // %swift.type*: the metadata, through which the value witness table (VWT) is found
    var type: UnsafeRawPointer
    // i8**: the pwt, i.e. the protocol witness table (the protocol's method list)
    var pwt: UnsafeRawPointer
}
// 2. Define protocol + class
protocol Shape {
    var area: Double {get}
}
class Circle: Shape{
    var radius: Double

    init(_ radius: Double) {
        self.radius = radius
    }

    var area: Double{
        get{
            return radius * radius * 3.14
        }
    }
}
// The variable's static type is the protocol Shape
var circle: Shape = Circle(10.0)

// 3. Reinterpret the memory of circle as protocolData and print it
withUnsafePointer(to: &circle) { ptr in
    ptr.withMemoryRebound(to: protocolData.self, capacity: 1) { pointer in
        print(pointer.pointee)
    }
}

Running it prints 👇

So now we know where the PWT is stored 👇

It is stored in the existential container, whose layout is roughly {heapObject, metadata, PWT}.
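The sizes printed earlier follow directly from this layout. A quick check with MemoryLayout, assuming a 64-bit platform; ClassOnlyShape is a hypothetical class-bound protocol added for comparison. A class-bound existential needs neither the 24-byte buffer nor the metadata slot, because it always holds a single object reference whose metadata can be read from the object itself.

```swift
protocol Shape {
    var area: Double { get }
}

// Hypothetical class-only variant, added for comparison.
protocol ClassOnlyShape: AnyObject {
    var area: Double { get }
}

class Circle: Shape, ClassOnlyShape {
    var radius: Double
    init(_ radius: Double) { self.radius = radius }
    var area: Double { radius * radius * 3.14 }
}

// Sizes in bytes on a 64-bit platform
print(MemoryLayout<Shape>.size)           // 40: 24-byte value buffer + metadata + PWT
print(MemoryLayout<ClassOnlyShape>.size)  // 16: object reference + PWT
print(MemoryLayout<Circle>.size)          // 8: a plain object reference
print(MemoryLayout<Any>.size)             // 32: value buffer + metadata, no PWT
```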

Modification 1: change class to struct

We define a struct Rectangle that also conforms to the Shape protocol 👇

protocol Shape {
    var area: Double {get}
}
struct Rectangle: Shape{
    var width, height: Double
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }

    var area: Double{
        get{
            return width * height
        }
    }
}
// The variable's static type is the protocol Shape
var rectangle: Shape = Rectangle(10.0, 20.0)

struct HeapObject {
    var type: UnsafeRawPointer
    var refCount1: UInt32
    var refCount2: UInt32
}
// %T4main5ShapeP = type { [24 x i8], %swift.type*, i8** }
struct protocolData {
    // [24 x i8]: read in 8-byte words, so modeled as 3 pointers
    var value1: UnsafeRawPointer
    var value2: UnsafeRawPointer
    var value3: UnsafeRawPointer
    // %swift.type*: the metadata, through which the value witness table (VWT) is found
    var type: UnsafeRawPointer
    // i8**: the pwt
    var pwt: UnsafeRawPointer
}

// Reinterpret the memory of rectangle as protocolData and print it
withUnsafePointer(to: &rectangle) { ptr in
    ptr.withMemoryRebound(to: protocolData.self, capacity: 1) { pointer in
        print(pointer.pointee)
    }
}

Rectangle has two stored properties, width and height, so value1 and value2 of protocolData hold their values directly, as raw bit patterns (10.0 appears as 0x4024000000000000 and 20.0 as 0x4034000000000000) 👇

Next, let's look at how it is handled in IR code 👇

As the IR shows, %4 (width) is stored at index 0, occupying bytes 0 to 7, while %5 (height) is stored at index 1, occupying bytes 8 to 15. (If Rectangle were a class, bytes 0 to 7 would hold the HeapObject address instead.)
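The inline storage can also be observed without the protocolData struct, by reading the first two 8-byte words of the container with withUnsafeBytes(of:). This is a minimal sketch assuming a 64-bit platform and the existential layout described above; reading raw memory like this is for exploration only.

```swift
protocol Shape {
    var area: Double { get }
}

struct Rectangle: Shape {
    var width, height: Double
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }
    var area: Double { width * height }
}

var rectangle: Shape = Rectangle(10.0, 20.0)

// Read value1 and value2: the first two 8-byte words of the existential container.
let (value1, value2) = withUnsafeBytes(of: &rectangle) { raw in
    (raw.load(fromByteOffset: 0, as: UInt64.self),
     raw.load(fromByteOffset: 8, as: UInt64.self))
}

print(String(value1, radix: 16))  // 4024000000000000, the bit pattern of 10.0
print(Double(bitPattern: value1), Double(bitPattern: value2))  // 10.0 20.0
```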

Modification 2: a struct with three properties

What if we add one more property, for three in total? 👇

struct Rectangle: Shape{
    var width, height: Double
    var width1 = 30.0
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }

    var area: Double{
        get{
            return width * height
        }
    }
}

As can be seen from the results, width1 is stored in value3.

Modification 3: a struct with four properties

Going further, what about four properties? 👇

struct Rectangle: Shape{
    var width, height: Double
    var width1 = 30.0
    var height1 = 40.0
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }

    var area: Double{
        get{
            return width * height
        }
    }
}

The four Doubles now take 32 bytes, which no longer fit in the 24-byte buffer. Let's look at the address stored in value1 👇

Summary

Therefore, the underlying storage structure of a protocol-typed value (the existential container) is 👇

  1. The first 24 bytes store the value of the class/struct that conforms to the protocol: for a class, the object reference; for a struct, its stored properties. If 24 bytes are not enough, memory is allocated on the heap and its address is stored in the first 8 bytes of the 24. This is decided up front from the type's size: the value is not first stored inline and then moved to the heap once it turns out not to fit.
  2. The last 16 bytes store the type metadata (through which the value witness table, VWT, is found) and the pwt (protocol witness table), respectively.
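Point 1 can be checked with the four-property Rectangle from Modification 3. A minimal sketch, assuming a 64-bit platform: the struct is 32 bytes, so width no longer appears inline in value1. The address printed will vary from run to run.

```swift
protocol Shape {
    var area: Double { get }
}

// Four Doubles: 32 bytes, too large for the 24-byte value buffer.
struct Rectangle: Shape {
    var width, height: Double
    var width1 = 30.0
    var height1 = 40.0
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }
    var area: Double { width * height }
}

print(MemoryLayout<Rectangle>.size)  // 32

var rectangle: Shape = Rectangle(10.0, 20.0)

// value1: the first 8 bytes of the existential container
let value1 = withUnsafeBytes(of: &rectangle) { $0.load(as: UInt64.self) }

// No longer 10.0's bit pattern: it is the address of a heap allocation holding the struct.
print(value1 == (10.0).bitPattern)  // false
print("value1 = 0x" + String(value1, radix: 16))
print(rectangle.area)               // 200.0
```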

4.3 Copy on write

Continuing with the example, change Rectangle back to a class and declare an array that stores the circle and rectangle objects 👇

protocol Shape {
    var area: Double {get}
}
class Circle: Shape{
    var radius: Double

    init(_ radius: Double) {
        self.radius = radius
    }

    var area: Double{
        get{
            return radius * radius * 3.14
        }
    }
}
class Rectangle: Shape{
    var width, height: Double
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }

    var area: Double{
        get{
            return width * height
        }
    }
}

var circle: Shape = Circle.init(10.0)
var rectangle: Shape = Rectangle.init(10.0, 20.0)

var shapes: [Shape] = [circle, rectangle]

for shape in shapes{
    print(shape.area)
}

We know that the pwt is stored in the existential container, and that for a class the pwt entries dispatch through class_method. At run time, the container ties each value to its metadata and pwt, so the corresponding V-Table can be found through the metadata to complete the method call. The outputs 314.0 and 200.0 shown above therefore demonstrate 👉 that the system found the getter of area in each object's own class.

Now look at the following example 👇 (Rectangle becomes a struct again, and we declare a variable rectangle1 = rectangle)

struct Rectangle: Shape{
    var width, height: Double
    var width1 = 30.0
    var height1 = 40.0
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }

    var area: Double{
        get{
            return width * height
        }
    }
}

// The variable's static type is the protocol Shape
var rectangle: Shape = Rectangle(10.0, 20.0)
// Assign it to another protocol-typed variable
var rectangle1: Shape  = rectangle

Then use withMemoryRebound to bind the memory to the protocolData structure and view its contents 👇

// View its memory address
struct HeapObject {
    var type: UnsafeRawPointer
    var refCount1: UInt32
    var refCount2: UInt32
}
// %T4main5ShapeP = type { [24 x i8], %swift.type*, i8** }
struct protocolData {
    // [24 x i8]: read in 8-byte words, so modeled as 3 pointers
    var value1: UnsafeRawPointer
    var value2: UnsafeRawPointer
    var value3: UnsafeRawPointer
    // %swift.type*: the metadata, through which the value witness table (VWT) is found
    var type: UnsafeRawPointer
    // i8**: the pwt
    var pwt: UnsafeRawPointer
}

withUnsafePointer(to: &rectangle) { ptr in
    ptr.withMemoryRebound(to: protocolData.self, capacity: 1) { pointer in
        print(pointer.pointee)
    }
}

withUnsafePointer(to: &rectangle1) { ptr in
    ptr.withMemoryRebound(to: protocolData.self, capacity: 1) { pointer in
        print(pointer.pointee)
    }
}

The output shows that the two protocol variables rectangle and rectangle1 have exactly the same memory contents. Now modify the width property of rectangle1 (this requires declaring width in the protocol) 👇

protocol Shape {
    var width: Double {get set}
    var area: Double {get}
}

Calling code 👇

withUnsafePointer(to: &rectangle) { ptr in
    ptr.withMemoryRebound(to: protocolData.self, capacity: 1) { pointer in
        print(pointer.pointee)
    }
}
withUnsafePointer(to: &rectangle1) { ptr in
    ptr.withMemoryRebound(to: protocolData.self, capacity: 1) { pointer in
        print(pointer.pointee)
    }
}

rectangle1.width = 50.0
withUnsafePointer(to: &rectangle1) { ptr in
    ptr.withMemoryRebound(to: protocolData.self, capacity: 1) { pointer in
        print(pointer.pointee)
    }
}

Before the modification, the heap object (value1) of rectangle and rectangle1 is the same, 0x00000001005421b0. After the modification, the heap object of rectangle1 becomes 0x0000000100611720. This verifies copy on write for the struct value type, even though its data (more than 24 bytes) is stored on the heap 👇

At assignment time nothing has been modified, so both variables point to the same heap memory. When the second variable modifies a property, the contents of the original heap memory are copied to new heap memory, and the modification is made there.
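The same experiment can be written with a small helper that reads value1 directly. A minimal sketch: value1(of:) is a hypothetical helper, and the struct matches the four-property Rectangle above. On the author's run, value1 was identical before the modification and different afterwards; the sketch only prints the addresses, since they are a runtime detail, while the final widths are guaranteed by value semantics.

```swift
protocol Shape {
    var width: Double { get set }
    var area: Double { get }
}

struct Rectangle: Shape {
    var width, height: Double
    var width1 = 30.0
    var height1 = 40.0
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }
    var area: Double { width * height }
}

// Hypothetical helper: reads value1, the first 8 bytes of an existential container.
func value1(of shape: inout Shape) -> UInt64 {
    withUnsafeBytes(of: &shape) { $0.load(as: UInt64.self) }
}

var rectangle: Shape = Rectangle(10.0, 20.0)
var rectangle1: Shape = rectangle

print("before:", String(value1(of: &rectangle), radix: 16), String(value1(of: &rectangle1), radix: 16))
rectangle1.width = 50.0
print("after: ", String(value1(of: &rectangle), radix: 16), String(value1(of: &rectangle1), radix: 16))

// Whatever the addresses show, only rectangle1 changed.
print(rectangle.width, rectangle1.width)  // 10.0 50.0
```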

What happens if Rectangle is changed from a struct (value type) to a class (reference type)?

class Rectangle: Shape{
    var width: Double
    var height: Double
    var width1 = 30.0
    var height1 = 40.0
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }

    var area: Double{
        get{
            return width * height
        }
    }
}

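As shown above, the address does not change before and after the modification: both variables refer to the same object, so the change made through rectangle1 is also visible through rectangle.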
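For comparison, a sketch of the class version (keeping only width and height, for brevity). Because the value buffer holds just a reference, the copy shares the object instead of triggering copy on write:

```swift
protocol Shape {
    var width: Double { get set }
    var area: Double { get }
}

class Rectangle: Shape {
    var width: Double
    var height: Double
    init(_ width: Double, _ height: Double) {
        self.width = width
        self.height = height
    }
    var area: Double { width * height }
}

let rectangle: Shape = Rectangle(10.0, 20.0)
var rectangle1: Shape = rectangle  // copies only the reference stored in the value buffer

rectangle1.width = 50.0

// Both containers reference the same object, so the change is visible through both.
print(rectangle.width)                                           // 50.0
print((rectangle as? Rectangle) === (rectangle1 as? Rectangle))  // true
```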

Value Buffer
  • The official name of the 24-byte area in the existential container is the Value Buffer.
  • The Value Buffer stores the current value. If the value exceeds its capacity, heap space is allocated for it.
  • For a value type stored on the heap, assignment first copies only the heap object address (copy on write). On modification, the reference count is checked first: if it is greater than 1, new heap space is allocated and the content is copied there before being modified. Deferring the copy this way improves performance.
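The check-then-copy strategy in the last point can be written by hand with the standard library's isKnownUniquelyReferenced(_:). A minimal sketch, assuming hypothetical names RectangleStorage and CowRectangle; it mirrors the idea rather than the runtime's own box implementation.

```swift
// Hypothetical reference-counted storage, playing the role of the heap box.
final class RectangleStorage {
    var width: Double
    var height: Double
    init(width: Double, height: Double) {
        self.width = width
        self.height = height
    }
    func copy() -> RectangleStorage {
        RectangleStorage(width: width, height: height)
    }
}

// A value type whose data lives on the heap, with copy on write.
struct CowRectangle {
    private var storage: RectangleStorage

    init(width: Double, height: Double) {
        storage = RectangleStorage(width: width, height: height)
    }

    var width: Double {
        get { storage.width }
        set {
            // Reference count > 1: copy to new heap storage before modifying.
            if !isKnownUniquelyReferenced(&storage) {
                storage = storage.copy()
            }
            storage.width = newValue
        }
    }

    // Identity of the heap storage, for observing sharing.
    var storageID: ObjectIdentifier { ObjectIdentifier(storage) }
}

let rectangle = CowRectangle(width: 10.0, height: 20.0)
var rectangle1 = rectangle                          // assignment copies only the reference
print(rectangle.storageID == rectangle1.storageID)  // true: storage is shared
rectangle1.width = 50.0                             // storage not unique, so it is copied first
print(rectangle.storageID == rectangle1.storageID)  // false
print(rectangle.width, rectangle1.width)            // 10.0 50.0
```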

Position of the Value Buffer in the existential container 👇

Summary

This article explained an important Swift concept 👉 protocols. Following the thread from basic concepts and usage to advanced usage and the underlying implementation, it explained in detail how the PWT and the Value Buffer are laid out in memory when a value type (struct) or a reference type (class) conforms to a protocol. I hope this helps you master the topic and handle related interview questions with confidence.