Deep Understanding of Objective-C: Category

Posted by m7_b5 on Tue, 25 Jun 2019 01:51:07 +0200

Abstract

No matter how well a class is designed, future requirements can change in ways nobody predicted. So how do you extend an existing class? Generally speaking, inheritance and composition are good choices. Objective-C 2.0, however, also provides categories, which can dynamically add new behavior to existing classes. Categories are now found in every corner of Objective-C code: from Apple's official frameworks to open source frameworks, from large, complex apps to simple ones, categories are everywhere. This article gives a fairly comprehensive overview of categories, in the hope that readers will find it useful.

Introduction

The author of this article is from the iOS R&D team of Meituan's Hotel and Travel Group. We are committed to creating value, improving efficiency and pursuing excellence. You are welcome to join us (please send your resume to majia03@meituan.com).
This article is based on a reading of the Objective-C runtime source code. It mainly analyzes how categories are implemented at the runtime level, along with various topics related to categories, including:

Getting Started: An Introduction to category
Comparing Like with Like: category and extension
A Closer Look: What category Really Is
Tracing Back to the Source: How category Is Loaded
Side Branches: category and the +load Method
Drawing Parallels: category and Method Overriding
One Level Further: category and Associated Objects

1. Getting Started: An Introduction to category

Category is a language feature added in Objective-C 2.0. Its main purpose is to add methods to existing classes. In addition, Apple recommends two other usage scenarios for category [1]:

Splitting the implementation of a class across several files. This has several obvious benefits: a) it reduces the size of a single file, b) it groups different functionality into different categories, c) it lets several developers work on one class together, and d) it lets you load only the categories you need.
Declaring private methods.

Beyond the scenarios recommended by Apple, developers have gotten creative and come up with several other uses for category:

Simulating multiple inheritance
Exposing a framework's private methods

This language feature may not seem like much to purely dynamic languages such as JavaScript, where you can add arbitrary methods and instance variables to a "class" or object at any time. But for a language that is not that dynamic, it is a remarkable feature.

2. Comparing Like with Like: category and extension

An extension looks like an anonymous category, but extensions and named categories are almost entirely different things. An extension is resolved at compile time and is part of the class: together with the @interface in the header file and the @implementation in the implementation file, it forms one complete class, and it lives and dies with that class. Extensions are generally used to hide a class's private information. You must have the source code of a class to add an extension to it, so you cannot add an extension to a system class such as NSString (see [2] for details).

Categories are completely different: they are resolved at run time.
From this difference we can deduce an obvious fact: an extension can add instance variables, while a category cannot. By the time a category is attached at run time, the memory layout of the object has already been fixed, and adding instance variables would break the internal layout of the class, which would be disastrous for a compiled language.

3. A Closer Look: What category Really Is

We know that all Objective-C classes and objects are represented by structs at the runtime level, and category is no exception. At the runtime level a category is represented by the struct category_t (defined in objc-runtime-new.h), which contains:
1) The class name (name)
2) The class (cls)
3) The list of all instance methods the category adds to the class (instanceMethods)
4) The list of all class methods the category adds (classMethods)
5) The list of all protocols the category implements (protocols)
6) All the properties the category adds (instanceProperties)

typedef struct category_t {
    const char *name;
    classref_t cls;
    struct method_list_t *instanceMethods;
    struct method_list_t *classMethods;
    struct protocol_list_t *protocols;
    struct property_list_t *instanceProperties;
} category_t;

The definition of category_t also shows what a category can do (add instance methods, class methods, protocol conformances and properties) and what it cannot do (add instance variables).
OK, let's write a category first and see what it really looks like.

MyClass.h:

#import <Foundation/Foundation.h>

@interface MyClass : NSObject

- (void)printName;

@end

@interface MyClass(MyAddition)

@property(nonatomic, copy) NSString *name;

- (void)printName;

@end

MyClass.m:

#import "MyClass.h"

@implementation MyClass

- (void)printName
{
    NSLog(@"%@",@"MyClass");
}

@end

@implementation MyClass(MyAddition)

- (void)printName
{
    NSLog(@"%@",@"MyAddition");
}

@end

Let's use clang to see what the category turns into:

clang -rewrite-objc MyClass.m

Well, we get a .cpp file about 3 MB in size and more than 100,000 lines long (definitely something Apple deserves to be teased about). Ignoring everything that is irrelevant to us, we find the following code snippet at the end of the file:

static struct /*_method_list_t*/ {
unsigned int entsize;  // sizeof(struct _objc_method)
unsigned int method_count;
struct _objc_method method_list[1];
} _OBJC_$_CATEGORY_INSTANCE_METHODS_MyClass_$_MyAddition __attribute__ ((used, section ("__DATA,__objc_const"))) = {
sizeof(_objc_method),
1,
{{(struct objc_selector *)"printName", "v16@0:8", (void *)_I_MyClass_MyAddition_printName}}
};

static struct /*_prop_list_t*/ {
unsigned int entsize;  // sizeof(struct _prop_t)
unsigned int count_of_properties;
struct _prop_t prop_list[1];
} _OBJC_$_PROP_LIST_MyClass_$_MyAddition __attribute__ ((used, section ("__DATA,__objc_const"))) = {
sizeof(_prop_t),
1,
{{"name","T@\"NSString\",C,N"}}
};

extern "C" __declspec(dllexport) struct _class_t OBJC_CLASS_$_MyClass;

static struct _category_t _OBJC_$_CATEGORY_MyClass_$_MyAddition __attribute__ ((used, section ("__DATA,__objc_const"))) =
{
"MyClass",
0, // &OBJC_CLASS_$_MyClass,
(const struct _method_list_t *)&_OBJC_$_CATEGORY_INSTANCE_METHODS_MyClass_$_MyAddition,
0,
0,
(const struct _prop_list_t *)&_OBJC_$_PROP_LIST_MyClass_$_MyAddition,
};
static void OBJC_CATEGORY_SETUP_$_MyClass_$_MyAddition(void ) {
_OBJC_$_CATEGORY_MyClass_$_MyAddition.cls = &OBJC_CLASS_$_MyClass;
}
#pragma section(".objc_inithooks$B", long, read, write)
__declspec(allocate(".objc_inithooks$B")) static void *OBJC_CATEGORY_SETUP[] = {
(void *)&OBJC_CATEGORY_SETUP_$_MyClass_$_MyAddition,
};
static struct _class_t *L_OBJC_LABEL_CLASS_$ [1] __attribute__((used, section ("__DATA, __objc_classlist,regular,no_dead_strip")))= {
&OBJC_CLASS_$_MyClass,
};
static struct _class_t *_OBJC_LABEL_NONLAZY_CLASS_$[] = {
&OBJC_CLASS_$_MyClass,
};
static struct _category_t *L_OBJC_LABEL_CATEGORY_$ [1] __attribute__((used, section ("__DATA, __objc_catlist,regular,no_dead_strip")))= {
&_OBJC_$_CATEGORY_MyClass_$_MyAddition,
};

We can see that:
First, the compiler generates the instance method list _OBJC_$_CATEGORY_INSTANCE_METHODS_MyClass_$_MyAddition and the property list _OBJC_$_PROP_LIST_MyClass_$_MyAddition. Both names follow the pattern common prefix + class name + category name. The instance method list contains the printName method we wrote in MyAddition, and the property list contains the name property we added in MyAddition. Also note that the category name is used to name these lists and the category struct itself, and that they are all declared static, so a category name cannot be repeated within the same compilation unit; otherwise compilation fails.
Second, the compiler generates the category itself, _OBJC_$_CATEGORY_MyClass_$_MyAddition, and initializes it with the lists generated earlier.
Finally, the compiler stores an array of size 1, L_OBJC_LABEL_CATEGORY_$, in the __objc_catlist section of the __DATA segment (if there were several categories, an array of the corresponding length would be generated). The runtime uses this array to load categories.
At this point the compiler's work is nearly done. How categories are loaded at run time is the subject of the next section.

4. Tracing Back to the Source: How category Is Loaded

As we know, Objective-C depends on the Objective-C runtime to run, and the runtime, like other system libraries, is dynamically loaded by OS X and iOS through dyld.
If you want to know more about dyld, see [3].

The entry point of the Objective-C runtime is as follows (in the objc-os.mm file):

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;

    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    lock_init();
    exception_init();

    // Register for unmap first, in case some +load unmaps something
    _dyld_register_func_for_remove_image(&unmap_image);
    dyld_register_image_state_change_handler(dyld_image_state_bound,
                                             1/*batch*/, &map_images);
    dyld_register_image_state_change_handler(dyld_image_state_dependents_initialized, 0/*not batch*/, &load_images);
}

Categories are attached to their classes when map_images runs. Under the new ABI, the map_images handler registered in _objc_init eventually calls the _read_images function in objc-runtime-new.mm. Toward the end of _read_images there is the following code fragment:

// Discover categories. 
    for (EACH_HEADER) {
        category_t **catlist =
            _getObjc2CategoryList(hi, &count);
        for (i = 0; i < count; i++) {
            category_t *cat = catlist[i];
            class_t *cls = remapClass(cat->cls);

            if (!cls) {
                // Category's target class is missing (probably weak-linked).
                // Disavow any knowledge of this category.
                catlist[i] = NULL;
                if (PrintConnecting) {
                    _objc_inform("CLASS: IGNORING category \?\?\?(%s) %p with "
                                 "missing weak-linked target class",
                                 cat->name, cat);
                }
                continue;
            }

            // Process this category. 
            // First, register the category with its target class. 
            // Then, rebuild the class's method lists (etc) if 
            // the class is realized. 
            BOOL classExists = NO;
            if (cat->instanceMethods ||  cat->protocols 
                ||  cat->instanceProperties)
            {
                addUnattachedCategoryForClass(cat, cls, hi);
                if (isRealized(cls)) {
                    remethodizeClass(cls);
                    classExists = YES;
                }
                if (PrintConnecting) {
                    _objc_inform("CLASS: found category -%s(%s) %s",
                                 getName(cls), cat->name,
                                 classExists ? "on existing class" : "");
                }
            }

            if (cat->classMethods  ||  cat->protocols 
                /* ||  cat->classProperties */)
            {
                addUnattachedCategoryForClass(cat, cls->isa, hi);
                if (isRealized(cls->isa)) {
                    remethodizeClass(cls->isa);
                }
                if (PrintConnecting) {
                    _objc_inform("CLASS: found category +%s(%s)",
                                 getName(cls), cat->name);
                }
            }
        }
    }

First of all, the catlist we get here is the array of category_t pointers that the compiler prepared for us in the previous section. We will not go into how the catlist itself is loaded, since that has little to do with categories; interested readers can study Apple's binary format and load mechanism.
Leaving out PrintConnecting, which only controls logging, this code is easy to understand:
1) Add the category's instance methods, protocols and properties to the class.
2) Add the category's class methods and protocols to the class's metaclass.
It's worth noting the small comment in the code, /* || cat->classProperties */. It seems that Apple once had plans to add class properties.
OK, let's go further in and see how a category's various lists are finally added to the class, taking the instance method list as the example.
In the code snippet above, addUnattachedCategoryForClass only records the association between the class and the category; remethodizeClass is what actually attaches the lists.

static void remethodizeClass(class_t *cls)
{
    category_list *cats;
    BOOL isMeta;

    rwlock_assert_writing(&runtimeLock);

    isMeta = isMetaClass(cls);

    // Re-methodizing: check for more categories
    if ((cats = unattachedCategoriesForClass(cls))) {
        chained_property_list *newproperties;
        const protocol_list_t **newprotos;

        if (PrintConnecting) {
            _objc_inform("CLASS: attaching categories to class '%s' %s",
                         getName(cls), isMeta ? "(meta)" : "");
        }

        // Update methods, properties, protocols

        BOOL vtableAffected = NO;
        attachCategoryMethods(cls, cats, &vtableAffected);

        newproperties = buildPropertyList(NULL, cats, isMeta);
        if (newproperties) {
            newproperties->next = cls->data()->properties;
            cls->data()->properties = newproperties;
        }

        newprotos = buildProtocolList(cats, NULL, cls->data()->protocols);
        if (cls->data()->protocols  &&  cls->data()->protocols != newprotos) {
            _free_internal(cls->data()->protocols);
        }
        cls->data()->protocols = newprotos;

        _free_internal(cats);

        // Update method caches and vtables
        flushCaches(cls);
        if (vtableAffected) flushVtables(cls);
    }
}

To add the category's instance methods to the class, remethodizeClass calls attachCategoryMethods. Let's look at attachCategoryMethods:

static void 
attachCategoryMethods(class_t *cls, category_list *cats,
                      BOOL *inoutVtablesAffected)
{
    if (!cats) return;
    if (PrintReplacedMethods) printReplacements(cls, cats);

    BOOL isMeta = isMetaClass(cls);
    method_list_t **mlists = (method_list_t **)
        _malloc_internal(cats->count * sizeof(*mlists));

    // Count backwards through cats to get newest categories first
    int mcount = 0;
    int i = cats->count;
    BOOL fromBundle = NO;
    while (i--) {
        method_list_t *mlist = cat_method_list(cats->list[i].cat, isMeta);
        if (mlist) {
            mlists[mcount++] = mlist;
            fromBundle |= cats->list[i].fromBundle;
        }
    }

    attachMethodLists(cls, mlists, mcount, NO, fromBundle, inoutVtablesAffected);

    _free_internal(mlists);

}

attachCategoryMethods does a relatively simple job. It gathers the instance method lists of all the categories into one array of method lists and passes it on to attachMethodLists (which, I swear, is the last piece of code we will look at in this section). That function is a bit long, so we only look at a small part of it:

for (uint32_t m = 0;
             (scanForCustomRR || scanForCustomAWZ)  &&  m < mlist->count;
             m++)
        {
            SEL sel = method_list_nth(mlist, m)->name;
            if (scanForCustomRR  &&  isRRSelector(sel)) {
                cls->setHasCustomRR();
                scanForCustomRR = false;
            } else if (scanForCustomAWZ  &&  isAWZSelector(sel)) {
                cls->setHasCustomAWZ();
                scanForCustomAWZ = false;
            }
        }

        // Fill method list array
        newLists[newCount++] = mlist;
    .
    .
    .

    // Copy old methods to the method list array
    for (i = 0; i < oldCount; i++) {
        newLists[newCount++] = oldLists[i];
    }

Two points need to be noted:
1) A category's methods do not "completely replace" the existing methods of the original class. That is, if both the category and the original class have methodA, then after the category is attached the class's method list contains two methodA entries.
2) The category's methods are placed at the front of the new method list and the original class's methods at the back. This is why we usually say that a category method "overrides" the original class's method of the same name: at run time, method lookup walks the method list in order and stops as soon as it finds a method with the matching name, unaware that another method with the same name may follow.
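The two points above can be modeled in a few lines. The sketch below is not the runtime's code: Method, attach_category and lookup are hypothetical stand-ins, and a string label replaces the real IMP so the winner is easy to see.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a runtime method entry: a selector name plus
// a label saying which @implementation supplied it.
struct Method {
    std::string selector;
    std::string owner;
};

// Same copy order as the quoted part of attachMethodLists:
// category methods first, then the class's old methods. Nothing is removed.
std::vector<Method> attach_category(const std::vector<Method>& category_methods,
                                    const std::vector<Method>& old_methods) {
    std::vector<Method> merged(category_methods);
    merged.insert(merged.end(), old_methods.begin(), old_methods.end());
    return merged;
}

// Walks the list in order and stops at the first selector match.
const Method* lookup(const std::vector<Method>& methods,
                     const std::string& selector) {
    for (const Method& m : methods) {
        if (m.selector == selector) return &m;
    }
    return nullptr;
}
```

With MyClass and MyAddition both defining printName, the merged list has two printName entries and lookup returns the one owned by MyAddition, which is exactly the "override" effect described above.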

5. Side Branches: category and the +load Method

We know that both classes and categories can have +load methods, which raises two questions:
1) When a class's +load method is called, can it call methods declared in a category?
2) In what order are these +load methods called?
Given how much code we have read in the previous sections, let's answer these two questions with something more intuitive:

Our code contains MyClass and two categories of MyClass (Category1 and Category2). MyClass and both categories implement +load, and both Category1 and Category2 implement MyClass's printName method.
Click Edit Scheme in Xcode and add the following two environment variables, OBJC_PRINT_LOAD_METHODS and OBJC_PRINT_REPLACED_METHODS, both set to YES (they print log information when +load methods are executed and when categories are attached; see objc-private.h for more environment variable options):

Running the project prints a lot of output to the console. Picking out only the information we want, the order is as follows:

objc[1187]: REPLACED: -[MyClass printName] by category Category1
objc[1187]: REPLACED: -[MyClass printName] by category Category2
.
.
.
objc[1187]: LOAD: class 'MyClass' scheduled for +load
objc[1187]: LOAD: category 'MyClass(Category1)' scheduled for +load
objc[1187]: LOAD: category 'MyClass(Category2)' scheduled for +load
objc[1187]: LOAD: +[MyClass load]
.
.
.
objc[1187]: LOAD: +[MyClass(Category1) load]
.
.
.
objc[1187]: LOAD: +[MyClass(Category2) load]

So the answers to the two questions above are clear:
1) Yes, they can be called, because categories are attached to the class before any +load method is executed.
2) +load runs for the class first and then for the categories, and the order among the categories' +load methods is determined by compilation order.
In the current compilation order, Category1 is compiled before Category2.

Now we swap the compilation order of Category1 and Category2 and run again. The console output order changes accordingly:

objc[1187]: REPLACED: -[MyClass printName] by category Category2
objc[1187]: REPLACED: -[MyClass printName] by category Category1
.
.
.
objc[1187]: LOAD: class 'MyClass' scheduled for +load
objc[1187]: LOAD: category 'MyClass(Category2)' scheduled for +load
objc[1187]: LOAD: category 'MyClass(Category1)' scheduled for +load
objc[1187]: LOAD: +[MyClass load]
.
.
.
objc[1187]: LOAD: +[MyClass(Category2) load]
.
.
.
objc[1187]: LOAD: +[MyClass(Category1) load]

That is the execution order for +load. For "overridden" methods, however, the implementation in the last compiled category is the one found first.
In this section we only obtained the answers in a very intuitive way. Interested readers can continue with the runtime source code.
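The difference between the two orderings can be summarized in a small model. This is a sketch under the behavior observed above, not runtime code; load_order and lookup_order are hypothetical helper names, and the input vector lists category names in compilation order.

```cpp
#include <string>
#include <vector>

// Order in which +load runs: the class first, then each category
// in compilation order.
std::vector<std::string> load_order(const std::string& cls,
                                    const std::vector<std::string>& categories) {
    std::vector<std::string> order{cls};
    for (const std::string& cat : categories) {
        order.push_back(cls + "(" + cat + ")");
    }
    return order;
}

// Order in which implementations of a same-named method are found:
// categories are walked backwards (newest first), the class comes last.
std::vector<std::string> lookup_order(const std::string& cls,
                                      const std::vector<std::string>& categories) {
    std::vector<std::string> order(categories.rbegin(), categories.rend());
    order.push_back(cls);
    return order;
}
```

For the compilation order Category1, Category2, load_order yields MyClass, MyClass(Category1), MyClass(Category2), while lookup_order starts with Category2, so Category2's printName is the one that gets called.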

6. Drawing Parallels: category and Method Overriding

Since the principles were already explained in the preceding sections, this section addresses only one question:
How do you call a method of the original class that has been overridden by a category?
We already know that a category does not actually replace the original class's method of the same name; its method merely sits earlier in the method list. So to call the original class's method, we only need to walk the method list and use the last method with the matching name.

// Requires #import <objc/runtime.h> at the top of the file.
Class currentClass = [MyClass class];
MyClass *my = [[MyClass alloc] init];

if (currentClass) {
    unsigned int methodCount = 0;
    // The returned array is owned by the caller and must be released with free().
    Method *methodList = class_copyMethodList(currentClass, &methodCount);
    IMP lastImp = NULL;
    SEL lastSel = NULL;
    // Overwrite on every match so that we end up with the LAST printName
    // in the list, which is the original class's implementation.
    for (unsigned int i = 0; i < methodCount; i++) {
        Method method = methodList[i];
        if (sel_isEqual(method_getName(method), @selector(printName))) {
            lastImp = method_getImplementation(method);
            lastSel = method_getName(method);
        }
    }
    typedef void (*fn)(id, SEL);

    if (lastImp != NULL) {
        // Call the implementation directly, bypassing normal message dispatch.
        fn f = (fn)lastImp;
        f(my, lastSel);
    }
    free(methodList);
}
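The "keep the last match" scan used above does not depend on Objective-C itself. The following sketch shows the same technique on a plain array; Entry, Imp and last_implementation are hypothetical names, and a function returning a label stands in for a real IMP.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical IMP stand-in: a function that reports who implemented it.
using Imp = const char* (*)();

// Hypothetical method entry: selector name plus implementation pointer.
struct Entry {
    const char* selector;
    Imp imp;
};

// Scans the whole list and remembers the last entry whose selector matches.
// Returns nullptr when no entry matches.
Imp last_implementation(const Entry* list, std::size_t count,
                        const char* selector) {
    Imp last = nullptr;
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(list[i].selector, selector) == 0) {
            last = list[i].imp;
        }
    }
    return last;
}
```

Because category entries sit in front of the original class's entries, the last match in such a list is the original implementation, whereas a first-match scan would return the category's.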

7. One Level Further: category and Associated Objects

As we saw above, a category cannot add instance variables. But we often need to attach a value to an object from within a category, and for that we can use associated objects.

MyClass+Category1.h:

#import "MyClass.h"

@interface MyClass (Category1)

@property(nonatomic,copy) NSString *name;

@end

MyClass+Category1.m:

#import "MyClass+Category1.h"
#import <objc/runtime.h>

@implementation MyClass (Category1)

+ (void)load
{
    NSLog(@"%@",@"load in Category1");
}

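// kNameKey is an illustrative name. Its own address serves as a unique,
// stable key, which is safer than passing the literal "name" twice.
static const void *kNameKey = &kNameKey;

- (void)setName:(NSString *)name
{
    objc_setAssociatedObject(self,
                             kNameKey,
                             name,
                             OBJC_ASSOCIATION_COPY_NONATOMIC); // matches (nonatomic, copy)
}

- (NSString *)name
{
    return objc_getAssociatedObject(self, kNameKey);
}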

@end

But where do associated objects live? How are they stored? And what happens to them when the owning object is destroyed?
Digging through the runtime source code, we find the function _object_set_associative_reference in the objc-references.mm file:

void _object_set_associative_reference(id object, void *key, id value, uintptr_t policy) {
    // retain the new value (if any) outside the lock.
    ObjcAssociation old_association(0, nil);
    id new_value = value ? acquireValue(value, policy) : nil;
    {
        AssociationsManager manager;
        AssociationsHashMap &associations(manager.associations());
        disguised_ptr_t disguised_object = DISGUISE(object);
        if (new_value) {
            // break any existing association.
            AssociationsHashMap::iterator i = associations.find(disguised_object);
            if (i != associations.end()) {
                // secondary table exists
                ObjectAssociationMap *refs = i->second;
                ObjectAssociationMap::iterator j = refs->find(key);
                if (j != refs->end()) {
                    old_association = j->second;
                    j->second = ObjcAssociation(policy, new_value);
                } else {
                    (*refs)[key] = ObjcAssociation(policy, new_value);
                }
            } else {
                // create the new association (first time).
                ObjectAssociationMap *refs = new ObjectAssociationMap;
                associations[disguised_object] = refs;
                (*refs)[key] = ObjcAssociation(policy, new_value);
                _class_setInstancesHaveAssociatedObjects(_object_getClass(object));
            }
        } else {
            // setting the association to nil breaks the association.
            AssociationsHashMap::iterator i = associations.find(disguised_object);
            if (i !=  associations.end()) {
                ObjectAssociationMap *refs = i->second;
                ObjectAssociationMap::iterator j = refs->find(key);
                if (j != refs->end()) {
                    old_association = j->second;
                    refs->erase(j);
                }
            }
        }
    }
    // release the old value (outside of the lock).
    if (old_association.hasValue()) ReleaseValue()(old_association);
}

We can see that all associated objects are managed by AssociationsManager, which is defined as follows:

class AssociationsManager {
    static OSSpinLock _lock;
    static AssociationsHashMap *_map;               // associative references:  object pointer -> PtrPtrHashMap.
public:
    AssociationsManager()   { OSSpinLockLock(&_lock); }
    ~AssociationsManager()  { OSSpinLockUnlock(&_lock); }

    AssociationsHashMap &associations() {
        if (_map == NULL)
            _map = new AssociationsHashMap();
        return *_map;
    }
};

AssociationsManager holds a static AssociationsHashMap that stores all associated objects. In effect, the associated objects of every object live in one global map. The key of that map is the object's pointer address (any two distinct live objects have different addresses), and the value is another map, an ObjectAssociationMap, which holds the key-value pairs of that object's associations.
The object destruction logic is in objc-runtime-new.mm:

void *objc_destructInstance(id obj) 
{
    if (obj) {
        Class isa_gen = _object_getClass(obj);
        class_t *isa = newcls(isa_gen);

        // Read all of the flags at once for performance.
        bool cxx = hasCxxStructors(isa);
        bool assoc = !UseGC && _class_instancesHaveAssociatedObjects(isa_gen);

        // This order is important.
        if (cxx) object_cxxDestruct(obj);
        if (assoc) _object_remove_assocations(obj);

        if (!UseGC) objc_clear_deallocating(obj);
    }

    return obj;
}

So the runtime's object destruction function, objc_destructInstance, checks whether the object has any associated objects and, if it does, calls _object_remove_assocations to clean them up.
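The storage and cleanup scheme described in this section can be modeled as a two-level map. This is a simplified sketch, not the runtime's implementation: AssociationStore and its methods are hypothetical, a std::string stands in for the stored object and its policy, and the spin lock and pointer disguising of the real AssociationsManager are left out.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Inner table: association key -> stored value.
using ObjectAssociationMap = std::map<const void*, std::string>;
// Outer table: object address -> that object's inner table.
using AssociationsHashMap =
    std::unordered_map<std::uintptr_t, ObjectAssociationMap>;

class AssociationStore {
public:
    // Creates the inner table on first use, then stores or replaces the value.
    void set(const void* object, const void* key, const std::string& value) {
        table_[address_of(object)][key] = value;
    }

    // Returns nullptr when the object or the key has no association.
    const std::string* get(const void* object, const void* key) const {
        auto outer = table_.find(address_of(object));
        if (outer == table_.end()) return nullptr;
        auto inner = outer->second.find(key);
        return inner == outer->second.end() ? nullptr : &inner->second;
    }

    // What object destruction has to do: drop the object's whole inner table.
    void remove_all(const void* object) { table_.erase(address_of(object)); }

private:
    static std::uintptr_t address_of(const void* p) {
        return reinterpret_cast<std::uintptr_t>(p);
    }
    AssociationsHashMap table_;
};
```

Because the values live outside the object, the store has to be told when an object dies; that is the role the cleanup call in objc_destructInstance plays in the real runtime.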

Epilogue

As Mr. Hou Jie said, "In front of the source code, there are no secrets." Although Apple's Cocoa Touch framework is not open source, the Objective-C runtime and Core Foundation are completely open source (the full source code can be downloaded from http://www.opensource.apple.com/tarballs/).
This series on the runtime source code will continue to be updated. Readers who want more can download the source from the site above and study it themselves. If you find any mistakes, please point them out.
