0x0 Preface
I don't know where the term "function extraction shell" originated, but as I understand it, a function extraction shell replaces the function bytecode in the dex file with nop instructions and then fills the real bytecode back into the dex at runtime.
Before function extraction:
After function extraction:
I have wanted to write this kind of shell for a long time. I recently finished one and named it dpt, and I am now sharing the code. Feel free to try it out. Project address: https://github.com/luoyesiqiu/dpt-shell
0x1 project structure
The dpt code is divided into two parts: the proccessor module and the shell module.
proccessor is the module that turns an ordinary APK into a shelled APK. Its main functions are:
- Unzip the APK
- Extract the CodeItems of the dex files in the APK and save them
- Modify the Application class name in AndroidManifest.xml
- Generate a new APK
The process is as follows:
The dex file and so file generated by the shell module are integrated into the APK being shelled. Its main functions are:
- Handle app startup
- Replace dexElements
- Hook the relevant functions
- Call the target Application
- Read the CodeItem file
- Fill the CodeItems back in
The process is as follows:
0x2 proccessor
The two key pieces of logic in proccessor are AndroidManifest.xml processing and CodeItem extraction.
(1) Handle AndroidManifest.xml
Our handling of AndroidManifest.xml consists mainly of backing up the class name of the original Application and writing in the class name of the shell's proxy Application. We back up the original Application class name so that the original APK's Application can be called after the shell has finished its work. We write in the shell's proxy Application class name so that the proxy Application starts as early as possible when the app launches, giving us the chance to do preparatory work such as custom dex loading and function hooking. Once an APK is built, AndroidManifest.xml is no longer stored as an ordinary XML file but in the binary AXML format. Fortunately, several libraries for parsing and editing AXML already exist and can be used directly. The AXML processing library used here is ManifestEditor.
The code that extracts the full Application class name from the original AndroidManifest.xml is as follows; just call the getApplicationName function directly:
public static String getValue(String file, String tag, String ns, String attrName) {
    byte[] axmlData = IoUtils.readFile(file);
    AxmlParser axmlParser = new AxmlParser(axmlData);
    try {
        while (axmlParser.next() != AxmlParser.END_FILE) {
            if (axmlParser.getAttrCount() != 0 && !axmlParser.getName().equals(tag)) {
                continue;
            }
            for (int i = 0; i < axmlParser.getAttrCount(); i++) {
                if (axmlParser.getNamespacePrefix().equals(ns) && axmlParser.getAttrName(i).equals(attrName)) {
                    return (String) axmlParser.getAttrValue(i);
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}

public static String getApplicationName(String file) {
    return getValue(file, "application", "android", "name");
}
The code for writing the Application class name is as follows:
public static void writeApplicationName(String inManifestFile, String outManifestFile, String newApplicationName) {
    ModificationProperty property = new ModificationProperty();
    property.addApplicationAttribute(new AttributeItem(NodeValue.Application.NAME, newApplicationName));
    FileProcesser.processManifestFile(inManifestFile, outManifestFile, property);
}
(2) Extract CodeItem
CodeItem is a structure that stores data related to function bytecode in dex file. The following figure shows the general appearance of CodeItem.
We speak of extracting the CodeItem, but what we actually extract is insns, the field of CodeItem that stores the function's real bytecode. To extract insns we use the dx tool from the Android source code, which can easily read every part of a dex file.
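To make the layout concrete, here is a minimal sketch of reading the fixed code_item header as defined by the dex format: four 16-bit fields followed by two 32-bit fields, 16 bytes in total, with insns starting right after. The class name CodeItemHeader and its methods are hypothetical helpers for illustration only and are not part of dpt or dx.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical read-only view of a dex code_item header (not dpt code). */
final class CodeItemHeader {
    /** Bytes occupied by the six fixed fields that precede insns[]. */
    static final int HEADER_SIZE = 16;

    final int registersSize;  // ushort
    final int insSize;        // ushort
    final int outsSize;       // ushort
    final int triesSize;      // ushort
    final long debugInfoOff;  // uint
    final long insnsSize;     // uint, counted in 16-bit code units
    final int insnsOffset;    // file offset of insns[0]

    private CodeItemHeader(ByteBuffer buf, int codeOff) {
        registersSize = buf.getShort(codeOff) & 0xFFFF;
        insSize = buf.getShort(codeOff + 2) & 0xFFFF;
        outsSize = buf.getShort(codeOff + 4) & 0xFFFF;
        triesSize = buf.getShort(codeOff + 6) & 0xFFFF;
        debugInfoOff = buf.getInt(codeOff + 8) & 0xFFFFFFFFL;
        insnsSize = buf.getInt(codeOff + 12) & 0xFFFFFFFFL;
        insnsOffset = codeOff + HEADER_SIZE;
    }

    /** Reads the header of the code_item located at codeOff in the dex bytes. */
    static CodeItemHeader read(byte[] dex, int codeOff) {
        // dex files are little-endian
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        return new CodeItemHeader(buf, codeOff);
    }

    /** Length of insns[] in bytes (each code unit is 2 bytes). */
    long insnsByteLength() {
        return insnsSize * 2;
    }
}
```

This is why the extraction and fill code in this article adds 16 to the code offset: that is the distance from the start of a code_item to its first instruction.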
The following code traverses all classdefs and all functions in them, and then calls extractMethod to process a single function.
public static List<Instruction> extractAllMethods(File dexFile, File outDexFile) {
    List<Instruction> instructionList = new ArrayList<>();
    Dex dex = null;
    RandomAccessFile randomAccessFile = null;
    byte[] dexData = IoUtils.readFile(dexFile.getAbsolutePath());
    IoUtils.writeFile(outDexFile.getAbsolutePath(), dexData);

    try {
        dex = new Dex(dexFile);
        randomAccessFile = new RandomAccessFile(outDexFile, "rw");
        Iterable<ClassDef> classDefs = dex.classDefs();
        for (ClassDef classDef : classDefs) {
            ......
            if (classDef.getClassDataOffset() == 0) {
                String log = String.format("class '%s' data offset is zero", classDef.toString());
                logger.warn(log);
                continue;
            }

            ClassData classData = dex.readClassData(classDef);
            ClassData.Method[] directMethods = classData.getDirectMethods();
            ClassData.Method[] virtualMethods = classData.getVirtualMethods();
            for (ClassData.Method method : directMethods) {
                Instruction instruction = extractMethod(dex, randomAccessFile, classDef, method);
                if (instruction != null) {
                    instructionList.add(instruction);
                }
            }

            for (ClassData.Method method : virtualMethods) {
                Instruction instruction = extractMethod(dex, randomAccessFile, classDef, method);
                if (instruction != null) {
                    instructionList.add(instruction);
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IoUtils.close(randomAccessFile);
    }

    return instructionList;
}
If, while processing a function, we find that it has no code (usually a native function) or that the capacity of insns is too small to hold the return statement, we skip it. The extraction step of the function extraction shell looks like this:
private static Instruction extractMethod(Dex dex, RandomAccessFile outRandomAccessFile, ClassDef classDef, ClassData.Method method) throws Exception {
    String returnTypeName = dex.typeNames().get(dex.protoIds().get(dex.methodIds().get(method.getMethodIndex()).getProtoIndex()).getReturnTypeIndex());
    String methodName = dex.strings().get(dex.methodIds().get(method.getMethodIndex()).getNameIndex());
    String className = dex.typeNames().get(classDef.getTypeIndex());
    // native function
    if (method.getCodeOffset() == 0) {
        String log = String.format("method code offset is zero,name = %s.%s , returnType = %s",
                TypeUtils.getHumanizeTypeName(className),
                methodName,
                TypeUtils.getHumanizeTypeName(returnTypeName));
        logger.warn(log);
        return null;
    }
    Instruction instruction = new Instruction();
    // 16 = registers_size + ins_size + outs_size + tries_size + debug_info_off + insns_size
    int insnsOffset = method.getCodeOffset() + 16;
    Code code = dex.readCode(method);
    // fault tolerance
    if (code.getInstructions().length == 0) {
        String log = String.format("method has no code,name = %s.%s , returnType = %s",
                TypeUtils.getHumanizeTypeName(className),
                methodName,
                TypeUtils.getHumanizeTypeName(returnTypeName));
        logger.warn(log);
        return null;
    }
    // capacity is counted in 16-bit code units, so the byte size is insnsCapacity * 2
    int insnsCapacity = code.getInstructions().length;
    // the insns capacity is insufficient to store the return statement, skip
    byte[] returnByteCodes = getReturnByteCodes(returnTypeName);
    if (insnsCapacity * 2 < returnByteCodes.length) {
        logger.warn("The capacity of insns is not enough to store the return statement. "
                        + "{}.{}() -> {} insnsCapacity = {}byte(s),returnByteCodes = {}byte(s)",
                TypeUtils.getHumanizeTypeName(className),
                methodName,
                TypeUtils.getHumanizeTypeName(returnTypeName),
                insnsCapacity * 2,
                returnByteCodes.length);
        return null;
    }
    instruction.setOffsetOfDex(insnsOffset);
    // methodIndex here is the index into the method_ids section
    instruction.setMethodIndex(method.getMethodIndex());
    // note: this is the size of the byte array
    instruction.setInstructionDataSize(insnsCapacity * 2);
    byte[] byteCode = new byte[insnsCapacity * 2];
    // save the original bytes, then overwrite each code unit with nop
    for (int i = 0; i < insnsCapacity; i++) {
        outRandomAccessFile.seek(insnsOffset + (i * 2));
        byteCode[i * 2] = outRandomAccessFile.readByte();
        byteCode[i * 2 + 1] = outRandomAccessFile.readByte();
        outRandomAccessFile.seek(insnsOffset + (i * 2));
        outRandomAccessFile.writeShort(0);
    }
    instruction.setInstructionsData(byteCode);
    outRandomAccessFile.seek(insnsOffset);
    // write the return statement
    outRandomAccessFile.write(returnByteCodes);

    return instruction;
}
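The article does not show getReturnByteCodes, so the following is only a sketch of what such a stub generator and the save-then-nop step could look like on an in-memory dex. The class ExtractSketch, its method names and the choice of register v0 are assumptions made for illustration. The opcodes come from the Dalvik bytecode set and match the first code units (0x0012, 0x0016, 0x000e) that the shell checks for at runtime.

```java
import java.util.Arrays;

/** Hypothetical sketch of the stub-and-nop step; not dpt's actual implementation. */
final class ExtractSketch {
    static final int CODE_ITEM_HEADER_SIZE = 16;

    /** Picks a minimal "return a default value" stub from a dex type descriptor. */
    static byte[] returnStub(String returnType) {
        switch (returnType) {
            case "V": // return-void
                return new byte[] {0x0e, 0x00};
            case "J":
            case "D": // const-wide/16 v0, 0 ; return-wide v0
                return new byte[] {0x16, 0x00, 0x00, 0x00, 0x10, 0x00};
            case "Z": case "B": case "S": case "C": case "I": case "F":
                // const/4 v0, 0 ; return v0
                return new byte[] {0x12, 0x00, 0x0f, 0x00};
            default: // object and array types: const/4 v0, 0 ; return-object v0
                return new byte[] {0x12, 0x00, 0x11, 0x00};
        }
    }

    /**
     * Saves the original insns, overwrites them with nop (0x0000) and writes the
     * stub at the start. Returns the saved bytes, or null when insns is too small
     * to hold the stub (the dex is left untouched in that case).
     */
    static byte[] extract(byte[] dex, int codeOff, int insnsSizeInCodeUnits, String returnType) {
        byte[] stub = returnStub(returnType);
        int byteLen = insnsSizeInCodeUnits * 2;
        if (byteLen < stub.length) {
            return null;
        }
        int insnsOff = codeOff + CODE_ITEM_HEADER_SIZE;
        byte[] saved = Arrays.copyOfRange(dex, insnsOff, insnsOff + byteLen);
        Arrays.fill(dex, insnsOff, insnsOff + byteLen, (byte) 0);
        System.arraycopy(stub, 0, dex, insnsOff, stub.length);
        return saved;
    }
}
```

The saved bytes, together with the method index and the insns offset, are what the shell needs later to restore the method body.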
0x3 shell module
The shell module contains the main logic of the function extraction shell; its functions were described above.
(1) Hook function
The hooks must be installed as early as possible. dpt starts its series of hooks in the _init function:
extern "C" void _init(void) { dpt_hook(); }
The hook framework used is Dobby. Two functions are hooked: MapFileAtAddress and LoadMethod.
MapFileAtAddress is hooked so that we can change the mapping protection when the dex is loaded, making the loaded dex writable so that the bytecode can be filled back in. An expert has already analyzed this in detail; for details, please refer to this article.
void* MapFileAtAddressAddr = DobbySymbolResolver(GetArtLibPath(), MapFileAtAddress_Sym());
DobbyHook(MapFileAtAddressAddr, (void *) MapFileAtAddress28, (void **) &g_originMapFileAtAddress28);
When the hook fires, add the PROT_WRITE flag to the prot parameter:
void* MapFileAtAddress28(uint8_t* expected_ptr,
                         size_t byte_count,
                         int prot,
                         int flags,
                         int fd,
                         off_t start,
                         bool low_4gb,
                         bool reuse,
                         const char* filename,
                         std::string* error_msg) {
    // request a writable mapping in addition to the original protection bits
    int new_prot = (prot | PROT_WRITE);
    if (nullptr != g_originMapFileAtAddress28) {
        return g_originMapFileAtAddress28(expected_ptr, byte_count, new_prot, flags, fd, start, low_4gb, reuse, filename, error_msg);
    }
    // the original function was not resolved; return explicitly instead of falling off the end
    return nullptr;
}
Before hooking LoadMethod, we need to understand where it sits in the class loading process. Why LoadMethod? Would other functions work as well?
When a class is loaded, its call chain is as follows (some processes have been omitted):
ClassLoader.java::loadClass -> DexPathList.java::findClass -> DexFile.java::defineClass -> class_linker.cc::LoadClass -> class_linker.cc::LoadClassMembers -> class_linker.cc::LoadMethod
That is, when a class is loaded, it will call the LoadMethod function. Let's take a look at its function prototype:
void ClassLinker::LoadMethod(const DexFile& dex_file, const ClassDataItemIterator& it, Handle<mirror::Class> klass, ArtMethod* dst);
This function is extremely useful to us. It has two very powerful parameters, DexFile and ClassDataItemIterator. From this function we can obtain the DexFile structure that the function being loaded belongs to, as well as some information about the function itself. Here is the ClassDataItemIterator structure:
class ClassDataItemIterator {
    ......
    // A decoded version of the method of a class_data_item
    struct ClassDataMethod {
        uint32_t method_idx_delta_;  // delta of index into the method_ids array for MethodId
        uint32_t access_flags_;
        uint32_t code_off_;
        ClassDataMethod() : method_idx_delta_(0), access_flags_(0), code_off_(0) {}

    private:
        DISALLOW_COPY_AND_ASSIGN(ClassDataMethod);
    };
    ClassDataMethod method_;

    // Read and decode a method from a class_data_item stream into method
    void ReadClassDataMethod();

    const DexFile& dex_file_;
    size_t pos_;  // integral number of items passed
    const uint8_t* ptr_pos_;  // pointer into stream of class_data_item
    uint32_t last_idx_;  // last read field or method index to apply delta to
    DISALLOW_IMPLICIT_CONSTRUCTORS(ClassDataItemIterator);
};
The most important field is code_off_. Its value is the offset of the currently loaded function's CodeItem relative to the start of the DexFile, so when a function is loaded we can access its CodeItem directly. Could other functions serve as the hook point? None of the functions in the call chain above suits our purpose better than LoadMethod, so it is the best hook point.
Hooking LoadMethod is a little more complex. The hook code itself is simple; the complexity lies in the handling after the hook fires. We need to support multiple Android versions, and the parameters of LoadMethod may change between versions, although fortunately they do not change much. So how do we read code_off_ from the ClassDataItemIterator class? A direct approach is to compute the field offset and maintain it in the code, but that is hard to read and error-prone. dpt instead copies the ClassDataItemIterator class definition and converts the ClassDataItemIterator reference directly into a reference to our own copy, so the field value can be read easily.
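For readers unfamiliar with where method_idx_delta_, access_flags_ and code_off_ come from: in the dex format each method entry of a class_data_item is stored as three ULEB128 values, and the method index is a delta from the previous entry in the same list. The sketch below decodes such entries in Java. EncodedMethodReader is a hypothetical name used only for illustration; it mirrors the on-disk format, not ART's C++ implementation.

```java
/** Hypothetical decoder for the encoded_method entries of a class_data_item. */
final class EncodedMethodReader {
    private final byte[] data;
    private int pos;
    // Running method index; the first entry of a list stores its index directly.
    private int lastMethodIdx = 0;

    /** Use one reader per method list (direct or virtual), starting at its first entry. */
    EncodedMethodReader(byte[] data, int start) {
        this.data = data;
        this.pos = start;
    }

    /** Reads one unsigned LEB128 value: 7 payload bits per byte, high bit = "more". */
    private int readUleb128() {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = data[pos++] & 0xFF;
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    /** Returns {methodIdx, accessFlags, codeOff} for the next method in the list. */
    int[] next() {
        lastMethodIdx += readUleb128(); // method_idx_diff
        int accessFlags = readUleb128();
        int codeOff = readUleb128();    // 0 means the method has no code
        return new int[] {lastMethodIdx, accessFlags, codeOff};
    }
}
```

A code offset of 0 is the same condition the shell tests for when it skips native methods and methods without code.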
The following is the operation done after the LoadMethod is called. The logic is to read the insns in the map, and then fill them back to the specified location.
void LoadMethod(void *thiz, void *self, const void *dex_file, const void *it, const void *method,
                void *klass, void *dst) {
    if (g_originLoadMethod25 != nullptr
        || g_originLoadMethod28 != nullptr
        || g_originLoadMethod29 != nullptr) {
        uint32_t location_offset = getDexFileLocationOffset();
        uint32_t begin_offset = getDataItemCodeItemOffset();
        callOriginLoadMethod(thiz, self, dex_file, it, method, klass, dst);

        ClassDataItemReader *classDataItemReader = getClassDataItemReader(it, method);

        uint8_t **begin_ptr = (uint8_t **) ((uint8_t *) dex_file + begin_offset);
        uint8_t *begin = *begin_ptr;
        // vtable(4|8) + prev_fields_size
        std::string *location = (reinterpret_cast<std::string *>((uint8_t *) dex_file +
                                                                 location_offset));
        if (location->find("base.apk") != std::string::npos) {

            // code_item_offset == 0 indicates that it is a native method or there is no code
            if (classDataItemReader->GetMethodCodeItemOffset() == 0) {
                DLOGW("native method? = %s code_item_offset = 0x%x",
                      classDataItemReader->MemberIsNative() ? "true" : "false",
                      classDataItemReader->GetMethodCodeItemOffset());
                return;
            }

            // first code unit of insns; only a method that still starts with one of
            // these three values is treated as extracted and patched
            uint16_t firstDvmCode = *((uint16_t*)(begin + classDataItemReader->GetMethodCodeItemOffset() + 16));
            if (firstDvmCode != 0x0012 && firstDvmCode != 0x0016 && firstDvmCode != 0x000e) {
                NLOG("this method has code no need to patch");
                return;
            }

            // file_size field of the dex header
            uint32_t dexSize = *((uint32_t*)(begin + 0x20));

            int dexIndex = dexNumber(location);
            auto dexIt = dexMap.find(dexIndex - 1);
            if (dexIt != dexMap.end()) {

                auto dexMemIt = dexMemMap.find(dexIndex);
                if (dexMemIt == dexMemMap.end()) {
                    changeDexProtect(begin, location->c_str(), dexSize, dexIndex);
                }

                auto codeItemMap = dexIt->second;
                int methodIdx = classDataItemReader->GetMemberIndex();
                auto codeItemIt = codeItemMap->find(methodIdx);

                if (codeItemIt != codeItemMap->end()) {
                    CodeItem* codeItem = codeItemIt->second;
                    // insns starts 16 bytes after the beginning of the code_item
                    uint8_t *realCodeItemPtr = (uint8_t*)(begin +
                                                classDataItemReader->GetMethodCodeItemOffset() +
                                                16);

                    memcpy(realCodeItemPtr, codeItem->getInsns(), codeItem->getInsnsSize());
                }
            }
        }
    }
}
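Stripped of the ART specifics, the fill-back step amounts to a guarded copy keyed by method index. The Java sketch below restates that logic on a plain byte array so it can be read and run in isolation. FillSketch and its method names are hypothetical, and the map of saved instructions stands in for the per-dex CodeItem map the shell builds from its CodeItem file.

```java
import java.util.Map;

/** Hypothetical restatement of the fill-back logic on an in-memory dex; not dpt code. */
final class FillSketch {
    static final int CODE_ITEM_HEADER_SIZE = 16;

    /** True if the first code unit is one of the three values the hook checks for. */
    static boolean looksLikeStub(byte[] dex, int insnsOff) {
        // code units are little-endian 16-bit values
        int first = (dex[insnsOff] & 0xFF) | ((dex[insnsOff + 1] & 0xFF) << 8);
        return first == 0x0012 || first == 0x0016 || first == 0x000e;
    }

    /**
     * Copies the saved insns for methodIdx back over the stub.
     * Returns false when the method has no code, does not start with a stub
     * code unit, or has no saved instructions.
     */
    static boolean fill(byte[] dex, int codeOff, int methodIdx, Map<Integer, byte[]> saved) {
        if (codeOff == 0) {
            return false; // native or abstract: nothing to patch
        }
        int insnsOff = codeOff + CODE_ITEM_HEADER_SIZE;
        if (!looksLikeStub(dex, insnsOff)) {
            return false;
        }
        byte[] insns = saved.get(methodIdx);
        if (insns == null) {
            return false;
        }
        System.arraycopy(insns, 0, dex, insnsOff, insns.length);
        return true;
    }
}
```

In the native code the same copy is a memcpy into the mapped dex, which is why the mapping has to be writable first.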
(2) Load dex
In fact, the dex has already been loaded once when the app starts, so why load it again? Because the dex loaded by the system is mapped read-only, we cannot modify that memory. Moreover, the system loads the app's dex before our Application starts, so our code has no chance to intervene. We therefore need to load the dex again ourselves.
private ClassLoader loadDex(Context context) {
    String sourcePath = context.getApplicationInfo().sourceDir;
    String nativePath = context.getApplicationInfo().nativeLibraryDir;

    ShellClassLoader shellClassLoader = new ShellClassLoader(sourcePath, nativePath, ClassLoader.getSystemClassLoader());
    return shellClassLoader;
}
Custom ClassLoader
public class ShellClassLoader extends PathClassLoader {

    private final String TAG = ShellClassLoader.class.getSimpleName();

    public ShellClassLoader(String dexPath, ClassLoader classLoader) {
        super(dexPath, classLoader);
    }

    public ShellClassLoader(String dexPath, String librarySearchPath, ClassLoader classLoader) {
        super(dexPath, librarySearchPath, classLoader);
    }
}
(3) Replace dexElements
This step is also very important. The purpose of this step is to enable ClassLoader to load classes from our newly loaded dex file. The code is as follows:
void mergeDexElements(JNIEnv* env, jclass klass, jobject oldClassLoader, jobject newClassLoader) {
    jclass BaseDexClassLoaderClass = env->FindClass("dalvik/system/BaseDexClassLoader");
    jfieldID pathList = env->GetFieldID(BaseDexClassLoaderClass, "pathList", "Ldalvik/system/DexPathList;");
    jobject oldDexPathListObj = env->GetObjectField(oldClassLoader, pathList);
    if (env->ExceptionCheck() || nullptr == oldDexPathListObj) {
        env->ExceptionClear();
        DLOGW("mergeDexElements oldDexPathListObj get fail");
        return;
    }
    jobject newDexPathListObj = env->GetObjectField(newClassLoader, pathList);
    if (env->ExceptionCheck() || nullptr == newDexPathListObj) {
        env->ExceptionClear();
        DLOGW("mergeDexElements newDexPathListObj get fail");
        return;
    }

    jclass DexPathListClass = env->FindClass("dalvik/system/DexPathList");
    jfieldID dexElementField = env->GetFieldID(DexPathListClass, "dexElements", "[Ldalvik/system/DexPathList$Element;");

    jobjectArray newClassLoaderDexElements = static_cast<jobjectArray>(env->GetObjectField(
            newDexPathListObj, dexElementField));
    if (env->ExceptionCheck() || nullptr == newClassLoaderDexElements) {
        env->ExceptionClear();
        DLOGW("mergeDexElements new dexElements get fail");
        return;
    }

    jobjectArray oldClassLoaderDexElements = static_cast<jobjectArray>(env->GetObjectField(
            oldDexPathListObj, dexElementField));
    if (env->ExceptionCheck() || nullptr == oldClassLoaderDexElements) {
        env->ExceptionClear();
        DLOGW("mergeDexElements old dexElements get fail");
        return;
    }

    jint oldLen = env->GetArrayLength(oldClassLoaderDexElements);
    jint newLen = env->GetArrayLength(newClassLoaderDexElements);

    DLOGD("mergeDexElements oldlen = %d , newlen = %d", oldLen, newLen);

    jclass ElementClass = env->FindClass("dalvik/system/DexPathList$Element");

    jobjectArray newElementArray = env->NewObjectArray(oldLen + newLen, ElementClass, nullptr);

    // elements of the newly loaded dex go first
    for (int i = 0; i < newLen; i++) {
        jobject elementObj = env->GetObjectArrayElement(newClassLoaderDexElements, i);
        env->SetObjectArrayElement(newElementArray, i, elementObj);
    }

    // followed by the elements of the original ClassLoader
    for (int i = newLen; i < oldLen + newLen; i++) {
        jobject elementObj = env->GetObjectArrayElement(oldClassLoaderDexElements, i - newLen);
        env->SetObjectArrayElement(newElementArray, i, elementObj);
    }

    env->SetObjectField(oldDexPathListObj, dexElementField, newElementArray);

    DLOGD("mergeDexElements success");
}
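The ordering in the merged array matters: DexPathList searches dexElements from front to back and returns the first match, so placing the new elements first makes classes resolve from the newly loaded dex. The following sketch shows only that ordering on ordinary Java arrays. ElementMerge is a hypothetical helper and involves no reflection on BaseDexClassLoader.

```java
import java.lang.reflect.Array;

/** Hypothetical helper illustrating the merge order; not dpt code. */
final class ElementMerge {
    /** Returns a new array with newElements first, followed by oldElements. */
    static <T> T[] merge(T[] newElements, T[] oldElements) {
        @SuppressWarnings("unchecked")
        T[] merged = (T[]) Array.newInstance(
                oldElements.getClass().getComponentType(),
                newElements.length + oldElements.length);
        // entries at the front are consulted first during class lookup
        System.arraycopy(newElements, 0, merged, 0, newElements.length);
        System.arraycopy(oldElements, 0, merged, newElements.length, oldElements.length);
        return merged;
    }
}
```

For example, merging {"new1"} with {"old1", "old2"} yields an array of three entries with "new1" at index 0.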
0x4 summary
It really took a lot of time to make this shell. Only I know how many detours I took along the way, but I am glad it works. dpt has not been tested extensively, and problems found later will be fixed gradually.