Implementation of Android function extraction shell

Posted by kjharve on Sat, 15 Jan 2022 15:37:12 +0100

0x0 Preface

I don't know where the term "function extraction shell" originated, but as I understand it, a function extraction shell is a shell that replaces the bytecode of the functions in a dex file with nop instructions and then fills the original bytecode back into the dex at runtime.

Before function extraction:

After function extraction:

I have wanted to write this kind of shell for a long time. I recently finished one and named it dpt. I am sharing the code here; feel free to try it out. Project address: https://github.com/luoyesiqiu/dpt-shell

0x1 Project structure

The dpt code is divided into two parts: one is the proccessor and the other is the shell.

proccessor is the module that turns an ordinary apk into a shelled apk. Its main functions are:

  • Unzip apk

  • Extract the CodeItems from the dex files in the apk and save them

  • Modify the Application class name in AndroidManifest.xml

  • Generate a new apk

The process is as follows:

The dex file and so file produced by the shell module are integrated into the apk being shelled. Its main functions are:

  • Handle App startup

  • Replace dexElements

  • Hook the relevant functions

  • Call the target Application

  • Read the CodeItem file

  • Fill the CodeItems back

The process is as follows:

0x2 proccessor

The two important pieces of logic in proccessor are AndroidManifest.xml processing and CodeItem extraction.

(1) Handle AndroidManifest.xml

Our handling of AndroidManifest.xml consists mainly of backing up the original Application class name and writing in the class name of the shell's proxy Application. The original Application class name is backed up so that the Application of the original APK can be called once the shell's startup work is done. The proxy Application class name is written so that our proxy Application starts as early as possible when the app launches, which lets us do preparatory work such as loading the dex ourselves and hooking functions. Once an APK is built, AndroidManifest.xml is no longer stored as an ordinary XML file but in the binary axml format. Fortunately, many experts have written libraries for parsing and editing axml that we can use directly. The axml library used here is ManifestEditor.

The code that extracts the fully qualified Application class name from the original AndroidManifest.xml is as follows; just call the getApplicationName function:

    public static String getValue(String file,String tag,String ns,String attrName){
        byte[] axmlData = IoUtils.readFile(file);
        AxmlParser axmlParser = new AxmlParser(axmlData);
        try {
            while (axmlParser.next() != AxmlParser.END_FILE) {
                if (axmlParser.getAttrCount() != 0 && !axmlParser.getName().equals(tag)) {
                    continue;
                }
                for (int i = 0; i < axmlParser.getAttrCount(); i++) {
                    if (axmlParser.getNamespacePrefix().equals(ns) && axmlParser.getAttrName(i).equals(attrName)) {
                        return (String) axmlParser.getAttrValue(i);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static String getApplicationName(String file) {
        return getValue(file,"application","android","name");
    }

The code for writing the Application class name is as follows:

    public static void writeApplicationName(String inManifestFile, String outManifestFile, String newApplicationName){
        ModificationProperty property = new ModificationProperty();
        property.addApplicationAttribute(new AttributeItem(NodeValue.Application.NAME,newApplicationName));

        FileProcesser.processManifestFile(inManifestFile, outManifestFile, property);

    }

(2) Extract CodeItem

CodeItem is the structure in a dex file that stores the data related to a function's bytecode. The following figure shows the general appearance of CodeItem.

We speak of extracting the CodeItem, but what we actually extract is insns inside the CodeItem, which holds the real bytecode of the function. To extract insns we use the dx tool from the Android source code, which makes it easy to read every part of a dex file.
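As a rough sketch of the layout described above: the field names below follow the code_item structure of the dex format, while the struct name and the two helpers are only illustrative and are not part of dpt. The fixed header in front of insns is 16 bytes, which is why the code in this post adds 16 to the CodeItem offset.

```cpp
#include <cstddef>
#include <cstdint>

// Fixed-size header of a dex code_item.
// The insns array (insns_size 16-bit code units) follows immediately after it.
struct CodeItemHeader {
    uint16_t registers_size;  // number of registers used by the method
    uint16_t ins_size;        // words of incoming arguments
    uint16_t outs_size;       // words of outgoing argument space
    uint16_t tries_size;      // number of try_items
    uint32_t debug_info_off;  // file offset of the debug info, or 0
    uint32_t insns_size;      // size of insns, in 16-bit code units
};

static_assert(sizeof(CodeItemHeader) == 16, "code_item header must be 16 bytes");

// Hypothetical helper: offset of insns for a code_item that starts at code_off.
inline uint32_t insnsOffset(uint32_t code_off) {
    return code_off + static_cast<uint32_t>(sizeof(CodeItemHeader));
}

// Hypothetical helper: size of insns in bytes (one code unit is 2 bytes).
inline uint32_t insnsByteSize(const CodeItemHeader& header) {
    return header.insns_size * 2;
}
```

For a CodeItem at offset 0x100, insnsOffset returns 0x110; a method with insns_size of 3 occupies 6 bytes of bytecode.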

The following code traverses all classdefs and all functions in them, and then calls extractMethod to process a single function.

    public static List<Instruction> extractAllMethods(File dexFile, File outDexFile) {
        List<Instruction> instructionList = new ArrayList<>();
        Dex dex = null;
        RandomAccessFile randomAccessFile = null;
        byte[] dexData = IoUtils.readFile(dexFile.getAbsolutePath());
        IoUtils.writeFile(outDexFile.getAbsolutePath(),dexData);

        try {
            dex = new Dex(dexFile);
            randomAccessFile = new RandomAccessFile(outDexFile, "rw");
            Iterable<ClassDef> classDefs = dex.classDefs();
            for (ClassDef classDef : classDefs) {
                
                ......
                
                if(classDef.getClassDataOffset() == 0){
                    String log = String.format("class '%s' data offset is zero",classDef.toString());
                    logger.warn(log);
                    continue;
                }

                ClassData classData = dex.readClassData(classDef);
                ClassData.Method[] directMethods = classData.getDirectMethods();
                ClassData.Method[] virtualMethods = classData.getVirtualMethods();
                for (ClassData.Method method : directMethods) {
                    Instruction instruction = extractMethod(dex,randomAccessFile,classDef,method);
                    if(instruction != null) {
                        instructionList.add(instruction);
                    }
                }

                for (ClassData.Method method : virtualMethods) {
                    Instruction instruction = extractMethod(dex, randomAccessFile,classDef, method);
                    if(instruction != null) {
                        instructionList.add(instruction);
                    }
                }
            }
        }
        catch (Exception e){
            e.printStackTrace();
        }
        finally {
            IoUtils.close(randomAccessFile);
        }

        return instructionList;
    }

If, while processing a function, we find that it has no code (usually a native function) or that insns is too small to hold the return statement, we skip it. Below is the extraction step that gives the function extraction shell its name:

    private static Instruction extractMethod(Dex dex ,RandomAccessFile outRandomAccessFile,ClassDef classDef,ClassData.Method method)
            throws Exception{
        String returnTypeName = dex.typeNames().get(dex.protoIds().get(dex.methodIds().get(method.getMethodIndex()).getProtoIndex()).getReturnTypeIndex());
        String methodName = dex.strings().get(dex.methodIds().get(method.getMethodIndex()).getNameIndex());
        String className = dex.typeNames().get(classDef.getTypeIndex());
        //native function
        if(method.getCodeOffset() == 0){
            String log = String.format("method code offset is zero,name =  %s.%s , returnType = %s",
                    TypeUtils.getHumanizeTypeName(className),
                    methodName,
                    TypeUtils.getHumanizeTypeName(returnTypeName));
            logger.warn(log);
            return null;
        }
        Instruction instruction = new Instruction();
        //16 = registers_size + ins_size + outs_size + tries_size + debug_info_off + insns_size
        int insnsOffset = method.getCodeOffset() + 16;
        Code code = dex.readCode(method);
        //fault tolerant 
        if(code.getInstructions().length == 0){
            String log = String.format("method has no code,name =  %s.%s , returnType = %s",
                    TypeUtils.getHumanizeTypeName(className),
                    methodName,
                    TypeUtils.getHumanizeTypeName(returnTypeName));
            logger.warn(log);
            return null;
        }
        int insnsCapacity = code.getInstructions().length;
        //The insns capacity is insufficient to store the return statement. Skip
        byte[] returnByteCodes = getReturnByteCodes(returnTypeName);
        if(insnsCapacity * 2 < returnByteCodes.length){
            logger.warn("The capacity of insns is not enough to store the return statement. {}.{}() -> {} insnsCapacity = {}byte(s),returnByteCodes = {}byte(s)",
                    TypeUtils.getHumanizeTypeName(className),
                    methodName,
                    TypeUtils.getHumanizeTypeName(returnTypeName),
                    insnsCapacity * 2,
                    returnByteCodes.length);

            return null;
        }
        instruction.setOffsetOfDex(insnsOffset);
        //methodIndex here is the index into the method_ids section
        instruction.setMethodIndex(method.getMethodIndex());
        //Note: here is the size of the array
        instruction.setInstructionDataSize(insnsCapacity * 2);
        byte[] byteCode = new byte[insnsCapacity * 2];
        //Write nop instruction
        for (int i = 0; i < insnsCapacity; i++) {
            outRandomAccessFile.seek(insnsOffset + (i * 2));
            byteCode[i * 2] = outRandomAccessFile.readByte();
            byteCode[i * 2 + 1] = outRandomAccessFile.readByte();
            outRandomAccessFile.seek(insnsOffset + (i * 2));
            outRandomAccessFile.writeShort(0);
        }
        instruction.setInstructionsData(byteCode);
        outRandomAccessFile.seek(insnsOffset);
        //Write a return statement
        outRandomAccessFile.write(returnByteCodes);

        return instruction;
    }
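The post does not show getReturnByteCodes, so the following is only a sketch of what such a stub table and the save-then-nop step might look like, written in C++ for brevity. The function names and the exact stub bytes are assumptions; the stubs were chosen so that their first code units match the values the runtime check later in this post looks for (0x0012, 0x0016, 0x000e).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumed stub bytes per return type descriptor (not taken from dpt's source).
std::vector<uint8_t> returnStubFor(const std::string& returnType) {
    if (returnType == "V") {
        return {0x0e, 0x00};                          // return-void
    }
    if (returnType == "J" || returnType == "D") {
        return {0x16, 0x00, 0x00, 0x00, 0x10, 0x00};  // const-wide/16 v0, 0; return-wide v0
    }
    if (returnType[0] == 'L' || returnType[0] == '[') {
        return {0x12, 0x00, 0x11, 0x00};              // const/4 v0, 0; return-object v0
    }
    return {0x12, 0x00, 0x0f, 0x00};                  // const/4 v0, 0; return v0
}

// Saves the original insns bytes, overwrites them with nop (0x0000) code
// units and places the return stub at the start. Returns the saved bytes,
// or an empty vector when the method is skipped.
std::vector<uint8_t> extractInsns(std::vector<uint8_t>& dex,
                                  size_t insnsOffset,
                                  size_t insnsByteSize,
                                  const std::string& returnType) {
    const std::vector<uint8_t> stub = returnStubFor(returnType);
    if (insnsByteSize < stub.size() || insnsOffset + insnsByteSize > dex.size()) {
        return {};  // not enough room for the stub, or out of bounds
    }
    std::vector<uint8_t> saved(dex.data() + insnsOffset,
                               dex.data() + insnsOffset + insnsByteSize);
    std::memset(dex.data() + insnsOffset, 0, insnsByteSize);          // nop everything
    std::memcpy(dex.data() + insnsOffset, stub.data(), stub.size());  // then write the stub
    return saved;
}
```

Zeroing the whole range first and writing the stub afterwards mirrors the order used in extractMethod: any space left after the stub stays filled with nop.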

0x3 Shell module

The shell module contains the main logic of the function extraction shell; its functions have been described above.

(1) Hook function

The hooks should be installed as early as possible. dpt starts its series of hooks in the _init function:

extern "C" void _init(void) {
    dpt_hook();
}

The hook framework used is Dobby. Two functions are hooked: MapFileAtAddress and LoadMethod.

MapFileAtAddress is hooked so that the dex's memory protection is changed while it is being loaded, making the loaded dex writable so that we can fill the bytecode back into it. An expert has already analyzed this in detail; for details, please refer to this article.

void* MapFileAtAddressAddr = DobbySymbolResolver(GetArtLibPath(),MapFileAtAddress_Sym());
DobbyHook(MapFileAtAddressAddr, (void *) MapFileAtAddress28,(void **) &g_originMapFileAtAddress28);

When the hook is triggered, add the PROT_WRITE flag to the prot parameter:

void* MapFileAtAddress28(uint8_t* expected_ptr,
              size_t byte_count,
              int prot,
              int flags,
              int fd,
              off_t start,
              bool low_4gb,
              bool reuse,
              const char* filename,
              std::string* error_msg){
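    // Add PROT_WRITE so that the mapped dex can be patched later.
    int new_prot = (prot | PROT_WRITE);
    if(nullptr != g_originMapFileAtAddress28) {
        return g_originMapFileAtAddress28(expected_ptr,byte_count,new_prot,flags,fd,start,low_4gb,reuse,filename,error_msg);
    }
    // The original function was not resolved; return explicitly instead of falling off the end.
    return nullptr;
}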
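To see why the extra flag matters, here is a small standalone sketch using only POSIX calls. It works on an anonymous page rather than on the dex mapping created by ART, and the function name is made up for this illustration. A page mapped with PROT_READ alone cannot be patched; once PROT_WRITE is OR-ed in, the write succeeds.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Maps one read-only page, then adds PROT_WRITE and patches a byte.
// Returns true when the patched value can be read back.
bool patchReadOnlyPage() {
    const size_t pageSize = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* mem = mmap(nullptr, pageSize, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;
    }
    // Writing to mem at this point would raise SIGSEGV: the page is read-only.
    // OR-ing in PROT_WRITE is the same change the hook applies to prot.
    if (mprotect(mem, pageSize, PROT_READ | PROT_WRITE) != 0) {
        munmap(mem, pageSize);
        return false;
    }
    uint8_t* bytes = static_cast<uint8_t*>(mem);
    bytes[0] = 0x0e;  // patch one byte, as the fill-back step does with insns
    const bool ok = (bytes[0] == 0x0e);
    munmap(mem, pageSize);
    return ok;
}
```

Hooking MapFileAtAddress avoids a separate mprotect call for dex files that go through that path, because the mapping is already created writable.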

Before hooking the LoadMethod function, we need to understand how it is reached. Why LoadMethod? Would other functions work?

When a class is loaded, its call chain is as follows (some processes have been omitted):

ClassLoader.java::loadClass -> DexPathList.java::findClass -> DexFile.java::defineClass -> class_linker.cc::LoadClass -> class_linker.cc::LoadClassMembers -> class_linker.cc::LoadMethod

That is, when a class is loaded, it will call the LoadMethod function. Let's take a look at its function prototype:

void ClassLinker::LoadMethod(const DexFile& dex_file,
                             const ClassDataItemIterator& it,
                             Handle<mirror::Class> klass,
                             ArtMethod* dst);

This function is extremely useful. It has two very valuable parameters, DexFile and ClassDataItemIterator. From them we can get the DexFile structure that the function being loaded belongs to, as well as some information about the function itself. Here is the ClassDataItemIterator structure:

  class ClassDataItemIterator{
  
  ......
  
  // A decoded version of the method of a class_data_item
  struct ClassDataMethod {
    uint32_t method_idx_delta_;  // delta of index into the method_ids array for MethodId
    uint32_t access_flags_;
    uint32_t code_off_;
    ClassDataMethod() : method_idx_delta_(0), access_flags_(0), code_off_(0) {}

   private:
    DISALLOW_COPY_AND_ASSIGN(ClassDataMethod);
  };
  ClassDataMethod method_;

  // Read and decode a method from a class_data_item stream into method
  void ReadClassDataMethod();

  const DexFile& dex_file_;
  size_t pos_;  // integral number of items passed
  const uint8_t* ptr_pos_;  // pointer into stream of class_data_item
  uint32_t last_idx_;  // last read field or method index to apply delta to
  DISALLOW_IMPLICIT_CONSTRUCTORS(ClassDataItemIterator);
};

The most important field is code_off_. Its value is the offset of the currently loading function's CodeItem relative to the start of the DexFile, so when a function is loaded we can access its CodeItem directly. Would other functions work? In the call chain above, no function is better suited to hooking than LoadMethod, so it is the best hook point.

Hooking LoadMethod is a little more complex. The hook itself is simple; it is the code that runs after the hook fires that is complex. We need to adapt to multiple Android versions, and the parameters of LoadMethod may change from version to version. Fortunately, LoadMethod does not change much. So how do we read code_off_ from the ClassDataItemIterator class? A more direct approach is to work out the field's offset and maintain it in the code, but that is hard to read and error-prone. dpt instead copies the ClassDataItemIterator class definition and casts the ClassDataItemIterator reference to a reference to our own copy, so that the field can be read easily.

The following is what happens after LoadMethod is called. The logic is to look up the saved insns in a map and fill them back at the specified location.
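The idea of mirroring a class layout can be sketched in isolation. Both types below are invented for this illustration: OpaqueMethodInfo stands in for a library class whose fields are private, and MirrorMethodInfo is our own copy with the same field order and types. Unlike dpt, which casts the reference directly, this sketch copies the bytes with memcpy, which sidesteps strict-aliasing issues in a standalone program while relying on the same assumption that both layouts are identical.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for a library class whose fields we cannot access directly.
class OpaqueMethodInfo {
public:
    OpaqueMethodInfo(uint32_t idxDelta, uint32_t flags, uint32_t codeOff)
        : method_idx_delta_(idxDelta), access_flags_(flags), code_off_(codeOff) {}
private:
    uint32_t method_idx_delta_;
    uint32_t access_flags_;
    uint32_t code_off_;
};

// Our own copy of the class: same field order and types, but public.
struct MirrorMethodInfo {
    uint32_t method_idx_delta_;
    uint32_t access_flags_;
    uint32_t code_off_;
};

static_assert(sizeof(OpaqueMethodInfo) == sizeof(MirrorMethodInfo),
              "mirror must have the same size as the original");

// Reads code_off_ by viewing the original object's bytes through the mirror.
// This only works while the mirror matches the original's memory layout.
uint32_t readCodeOff(const OpaqueMethodInfo& original) {
    MirrorMethodInfo mirror;
    std::memcpy(&mirror, &original, sizeof(mirror));
    return mirror.code_off_;
}
```

The cost of this approach is that the mirror has to be kept in sync with every Android version whose class layout differs; the benefit is that the code names fields instead of carrying hard-coded offsets.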

void LoadMethod(void *thiz, void *self, const void *dex_file, const void *it, const void *method,
                void *klass, void *dst) {

    if (g_originLoadMethod25 != nullptr
        || g_originLoadMethod28 != nullptr
        || g_originLoadMethod29 != nullptr) {
        uint32_t location_offset = getDexFileLocationOffset();
        uint32_t begin_offset = getDataItemCodeItemOffset();
        callOriginLoadMethod(thiz, self, dex_file, it, method, klass, dst);

        ClassDataItemReader *classDataItemReader = getClassDataItemReader(it,method);


        uint8_t **begin_ptr = (uint8_t **) ((uint8_t *) dex_file + begin_offset);
        uint8_t *begin = *begin_ptr;
        // vtable(4|8) + prev_fields_size
        std::string *location = (reinterpret_cast<std::string *>((uint8_t *) dex_file +
                                                                 location_offset));
        if (location->find("base.apk") != std::string::npos) {

            //code_item_offset == 0 indicates that it is a native method or there is no code
            if (classDataItemReader->GetMethodCodeItemOffset() == 0) {
                DLOGW("native method? = %s code_item_offset = 0x%x",
                      classDataItemReader->MemberIsNative() ? "true" : "false",
                      classDataItemReader->GetMethodCodeItemOffset());
                return;
            }

            uint16_t firstDvmCode = *((uint16_t*)(begin + classDataItemReader->GetMethodCodeItemOffset() + 16));
            if(firstDvmCode != 0x0012 && firstDvmCode != 0x0016 && firstDvmCode != 0x000e){
                NLOG("this method has code no need to patch");
                return;
            }

            uint32_t dexSize = *((uint32_t*)(begin + 0x20));

            int dexIndex = dexNumber(location);
            auto dexIt = dexMap.find(dexIndex - 1);
            if (dexIt != dexMap.end()) {

                auto dexMemIt = dexMemMap.find(dexIndex);
                if(dexMemIt == dexMemMap.end()){
                    changeDexProtect(begin,location->c_str(),dexSize,dexIndex);
                }


                auto codeItemMap = dexIt->second;
                int methodIdx = classDataItemReader->GetMemberIndex();
                auto codeItemIt = codeItemMap->find(methodIdx);

                if (codeItemIt != codeItemMap->end()) {
                    CodeItem* codeItem = codeItemIt->second;
                    uint8_t  *realCodeItemPtr = (uint8_t*)(begin +
                                                classDataItemReader->GetMethodCodeItemOffset() +
                                                16);

                    memcpy(realCodeItemPtr,codeItem->getInsns(),codeItem->getInsnsSize());
                }
            }
        }
    }
}
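Stripped of the version handling and logging, the fill-back step comes down to a map lookup and a memcpy. The sketch below assumes a simplified container (method index to saved bytes) in place of dpt's own maps and CodeItem class, and the names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Saved insns per method index (hypothetical container for this sketch).
using SavedInsnsMap = std::unordered_map<uint32_t, std::vector<uint8_t>>;

// Size of the fixed code_item header that precedes insns.
constexpr uint32_t kCodeItemHeaderSize = 16;

// Copies the saved insns of methodIdx back to begin + codeOff + 16.
// Returns false when the method has no code (codeOff == 0) or nothing was saved for it.
bool fillBack(uint8_t* begin, uint32_t codeOff, uint32_t methodIdx,
              const SavedInsnsMap& saved) {
    if (codeOff == 0) {
        return false;  // native method or no code
    }
    const auto it = saved.find(methodIdx);
    if (it == saved.end()) {
        return false;  // this method was not extracted
    }
    std::memcpy(begin + codeOff + kCodeItemHeaderSize,
                it->second.data(), it->second.size());
    return true;
}
```

The memory at begin must be writable for the memcpy to succeed, which is what the MapFileAtAddress hook and changeDexProtect are there to ensure.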

(2) Load dex

In fact, the dex has already been loaded once when the App starts, so why load it again? Because the dex loaded by the system is mapped read-only, we cannot modify that memory. Moreover, the App's dex is loaded before our Application starts, so our code has no chance to intervene. That is why we need to load the dex again.

    private ClassLoader loadDex(Context context){
        String sourcePath = context.getApplicationInfo().sourceDir;
        String nativePath = context.getApplicationInfo().nativeLibraryDir;

        ShellClassLoader shellClassLoader = new ShellClassLoader(sourcePath,nativePath,ClassLoader.getSystemClassLoader());
        return shellClassLoader;
    }

The custom ClassLoader:

public class ShellClassLoader extends PathClassLoader {

    private final String TAG = ShellClassLoader.class.getSimpleName();

    public ShellClassLoader(String dexPath,ClassLoader classLoader) {
        super(dexPath,classLoader);
    }

    public ShellClassLoader(String dexPath, String librarySearchPath,ClassLoader classLoader) {
        super(dexPath, librarySearchPath, classLoader);
    }
}

(3) Replace dexElements

This step is also very important. The purpose of this step is to enable ClassLoader to load classes from our newly loaded dex file. The code is as follows:

void mergeDexElements(JNIEnv* env,jclass klass,jobject oldClassLoader,jobject newClassLoader){
    jclass BaseDexClassLoaderClass = env->FindClass("dalvik/system/BaseDexClassLoader");
    jfieldID  pathList = env->GetFieldID(BaseDexClassLoaderClass,"pathList","Ldalvik/system/DexPathList;");
    jobject oldDexPathListObj = env->GetObjectField(oldClassLoader,pathList);
    if(env->ExceptionCheck() || nullptr == oldDexPathListObj ){
        env->ExceptionClear();
        DLOGW("mergeDexElements oldDexPathListObj get fail");
        return;
    }
    jobject newDexPathListObj = env->GetObjectField(newClassLoader,pathList);
    if(env->ExceptionCheck() || nullptr == newDexPathListObj){
        env->ExceptionClear();
        DLOGW("mergeDexElements newDexPathListObj get fail");
        return;
    }

    jclass DexPathListClass = env->FindClass("dalvik/system/DexPathList");
    jfieldID  dexElementField = env->GetFieldID(DexPathListClass,"dexElements","[Ldalvik/system/DexPathList$Element;");


    jobjectArray newClassLoaderDexElements = static_cast<jobjectArray>(env->GetObjectField(
            newDexPathListObj, dexElementField));
    if(env->ExceptionCheck() || nullptr == newClassLoaderDexElements){
        env->ExceptionClear();
        DLOGW("mergeDexElements new dexElements get fail");
        return;
    }

    jobjectArray oldClassLoaderDexElements = static_cast<jobjectArray>(env->GetObjectField(
            oldDexPathListObj, dexElementField));
    if(env->ExceptionCheck() || nullptr == oldClassLoaderDexElements){
        env->ExceptionClear();
        DLOGW("mergeDexElements old dexElements get fail");
        return;
    }

    jint oldLen = env->GetArrayLength(oldClassLoaderDexElements);
    jint newLen = env->GetArrayLength(newClassLoaderDexElements);

    DLOGD("mergeDexElements oldlen = %d , newlen = %d",oldLen,newLen);

    jclass ElementClass = env->FindClass("dalvik/system/DexPathList$Element");

    jobjectArray  newElementArray = env->NewObjectArray(oldLen + newLen,ElementClass, nullptr);

    for(int i = 0;i < newLen;i++) {
        jobject elementObj = env->GetObjectArrayElement(newClassLoaderDexElements, i);
        env->SetObjectArrayElement(newElementArray,i,elementObj);
    }


    for(int i = newLen;i < oldLen + newLen;i++) {
        jobject elementObj = env->GetObjectArrayElement(oldClassLoaderDexElements, i - newLen);
        env->SetObjectArrayElement(newElementArray,i,elementObj);
    }

    env->SetObjectField(oldDexPathListObj, dexElementField,newElementArray);

    DLOGD("mergeDexElements success");
}
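The order of the merged array matters because DexPathList searches dexElements from front to back and uses the first element that contains the class. A simplified model of the merge, with strings standing in for Element objects and a made-up function name, might look like this:

```cpp
#include <string>
#include <vector>

// Builds the merged list: elements of the new loader first, then the old ones,
// so that lookups hit the newly loaded (writable) dex before the original one.
std::vector<std::string> mergeElements(const std::vector<std::string>& oldElements,
                                       const std::vector<std::string>& newElements) {
    std::vector<std::string> merged;
    merged.reserve(oldElements.size() + newElements.size());
    merged.insert(merged.end(), newElements.begin(), newElements.end());
    merged.insert(merged.end(), oldElements.begin(), oldElements.end());
    return merged;
}
```

This is the same ordering that mergeDexElements produces with its two loops: indices 0 to newLen - 1 come from the new ClassLoader, and the rest from the old one.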

0x4 Summary

Making this shell really took a lot of time, and only I know how many detours I took along the way, but it feels good to have it working. dpt has not been tested extensively; problems found later will be fixed over time.
