How WatchDog works

Posted by getgray on Fri, 07 Jan 2022 08:46:14 +0100

1, Overview

In the Android system, a hardware watchdog periodically checks whether key hardware is working normally. Similarly, the framework layer has a software Watchdog that periodically checks whether key system services are deadlocked. It mainly analyzes whether the system's core services and important threads are in the Blocked state. Specifically:

  • Monitor reboot broadcast;
  • Monitor whether the key system services of mMonitors are deadlocked.

2, WatchDog initialization

2.1 startOtherServices

[-> SystemServer.java]

private void startOtherServices() {
    ...
    //Create watchdog [see Section 2.2]
    final Watchdog watchdog = Watchdog.getInstance();
    //Register the reboot broadcast [see the init section]
    watchdog.init(context, mActivityManagerService);
    ...
    mSystemServiceManager.startBootPhase(SystemService.PHASE_LOCK_SETTINGS_READY); //480
    ...
    mActivityManagerService.systemReady(new Runnable() {
        @Override
        public void run() {
            mSystemServiceManager.startBootPhase(
                    SystemService.PHASE_ACTIVITY_MANAGER_READY);
            ...
            // watchdog start [see Section 3.1]
            Watchdog.getInstance().start();
            mSystemServiceManager.startBootPhase(
                    SystemService.PHASE_THIRD_PARTY_APPS_CAN_START);
        }
    });
}

During the startup of the system_server process, the Watchdog is initialized, which mainly involves:

  • Create a watchdog object, which itself inherits from Thread;
  • Register reboot broadcast;
  • Call start() to start working.

2.2 getInstance

[-> Watchdog.java]

public static Watchdog getInstance() {
    if (sWatchdog == null) {
        //In singleton mode, create an instance object [see Section 2.3]
        sWatchdog = new Watchdog();
    }
    return sWatchdog;
}

2.3 Creating Watchdog

[-> Watchdog.java]

public class Watchdog extends Thread {
    //List of all HandlerChecker objects, HandlerChecker object type [see section 2.3.1]
    final ArrayList<HandlerChecker> mHandlerCheckers = new ArrayList<>();
    ...
 
    private Watchdog() {
        super("watchdog");
        //Queue foreground threads
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        //Queue the main thread
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        //Queue ui threads
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        //Queue i/o threads
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        //Add the display thread to the queue
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));
        //[see subsection 2.3.2]
        addMonitor(new BinderThreadMonitor());
    }
}

Watchdog inherits from Thread, and the created thread is named "watchdog". The mHandlerCheckers list holds HandlerChecker objects for the foreground (FG), main, UI, I/O, and display threads.

2.3.1 HandlerChecker

[-> Watchdog.java]

public final class HandlerChecker implements Runnable {
    private final Handler mHandler; //Handler object
    private final String mName; //Thread description name
    private final long mWaitMax; //Maximum waiting time
    //Recording monitored services
    private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
    private boolean mCompleted; //Set to false when starting the check
    private Monitor mCurrentMonitor; 
    private long mStartTime; //Point in time to start preparing for inspection
 
    HandlerChecker(Handler handler, String name, long waitMaxMillis) {
        mHandler = handler;
        mName = name;
        mWaitMax = waitMaxMillis; 
        mCompleted = true;
    }
}

2.3.2 addMonitor

public class Watchdog extends Thread {
    public void addMonitor(Monitor monitor) {
        synchronized (this) {
            ...
            //Here, the data type of mMonitorChecker is HandlerChecker
            mMonitorChecker.addMonitor(monitor);
        }
    }
 
    public final class HandlerChecker implements Runnable {
        private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
 
        public void addMonitor(Monitor monitor) {
            //Add the BinderThreadMonitor above to the mMonitors queue
            mMonitors.add(monitor);
        }
        ...
    }
}

addMonitor() adds the monitor to mMonitors, the member list of HandlerChecker. Here, the object being added is a BinderThreadMonitor, which monitors the Binder threads:

private static final class BinderThreadMonitor implements Watchdog.Monitor {
    public void monitor() {
        Binder.blockUntilThreadAvailable();
    }
}

blockUntilThreadAvailable eventually calls into IPCThreadState and waits for an idle binder thread:

void IPCThreadState::blockUntilThreadAvailable()
{
    pthread_mutex_lock(&mProcess->mThreadCountLock);
    while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
        //Wait until the number of executing binder threads drops below the process's maximum number of binder threads (16)
        pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
    }
    pthread_mutex_unlock(&mProcess->mThreadCountLock);
}

It can be seen that addMonitor(new BinderThreadMonitor()) adds the Binder thread monitor to the handler (mMonitorChecker) of the android.fg thread, in order to check whether the Binder threads are working normally.
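To make the waiting logic concrete, here is a minimal Java sketch of the same condition-wait pattern. The class ThreadPoolGate and all of its methods are hypothetical names chosen for illustration; they are not part of Binder or IPCThreadState. Only the loop condition (block while the number of executing threads is at or above the maximum) mirrors the native code above.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the native thread-count bookkeeping; not the real IPCThreadState. */
final class ThreadPoolGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition threadCountDecrement = lock.newCondition();
    private final int maxThreads;
    private int executingThreads;

    ThreadPoolGate(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    /** Records that one more pool thread has started executing work. */
    void threadStarted() {
        lock.lock();
        try {
            executingThreads++;
        } finally {
            lock.unlock();
        }
    }

    /** Records that a pool thread became idle and wakes up any waiters. */
    void threadFinished() {
        lock.lock();
        try {
            executingThreads--;
            threadCountDecrement.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Blocks while every thread is busy; returns once at least one is idle. */
    void blockUntilThreadAvailable() {
        lock.lock();
        try {
            while (executingThreads >= maxThreads) {
                threadCountDecrement.awaitUninterruptibly();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Non-blocking view of the same condition, used only by this sketch. */
    boolean hasFreeThread() {
        lock.lock();
        try {
            return executingThreads < maxThreads;
        } finally {
            lock.unlock();
        }
    }
}
```

If all threads stay busy, blockUntilThreadAvailable() never returns, which is exactly the situation in which a monitor built on it would keep the foreground checker from completing.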

2.4 init

[-> Watchdog.java]

public void init(Context context, ActivityManagerService activity) {
    mResolver = context.getContentResolver();
    mActivity = activity;
    //Register the reboot broadcast receiver [see RebootRequestReceiver below]
    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
}

2.4.1 RebootRequestReceiver

[-> Watchdog.java]

final class RebootRequestReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context c, Intent intent) {
        if (intent.getIntExtra("nowait", 0) != 0) {
            //[see rebootSystem below]
            rebootSystem("Received ACTION_REBOOT broadcast");
            return;
        }
        Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
    }
}

2.4.2 rebootSystem

[-> Watchdog.java]

void rebootSystem(String reason) {
    Slog.i(TAG, "Rebooting system because: " + reason);
    IPowerManager pms = (IPowerManager)ServiceManager.getService(Context.POWER_SERVICE);
    try {
        //reboot through PowerManager
        pms.reboot(false, reason, false);
    } catch (RemoteException ex) {
    }
}

Finally, the restart operation is completed through PowerManagerService. The specific restart process will be described separately later.

3, Watchdog detection mechanism

When Watchdog.getInstance().start() is called, the "watchdog" thread enters its run() method, which is divided into two parts:

  • The first half [Section 3.1] monitors whether a timeout has been triggered;
  • The second half [Section 4.1] outputs various information once the timeout is triggered.

3.1 run

[-> Watchdog.java]

public void run() {
    boolean waitedHalf = false;
    while (true) {
        final ArrayList<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        int debuggerWasConnected = 0;
        synchronized (this) {
            long timeout = CHECK_INTERVAL; //CHECK_INTERVAL=30s
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                //Execute the monitoring methods of all checkers, and each Checker records the current mStartTime [see Section 3.2]
                hc.scheduleCheckLocked();
            }
 
            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }
 
            long start = SystemClock.uptimeMillis();
            //The loop guarantees that a full 30s elapses before execution continues
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout); //Trigger the interrupt, catch the exception directly and continue to wait
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }
            
            //Evaluate Checker status [see subsection 3.3]
            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                waitedHalf = false;
                continue;
            } else if (waitState == WAITING) {
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    //Enter the state where the waiting time is more than half for the first time
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    //Output traces of system_server and three native processes [see Section 4.2]
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                            NATIVE_STACKS_OF_INTEREST);
                    waitedHalf = true;
                }
                continue;
            }
            ... //Entering here means that the Watchdog has timed out [see Section 4.1]
        }
        ...
    }
}
 
public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
    "/system/bin/mediaserver",
    "/system/bin/sdcard",
    "/system/bin/surfaceflinger"
};

The main functions of this method are as follows:

  1. Execute the monitoring method scheduleCheckLocked() of every Checker
    • When mMonitors is empty (it is empty for every thread except android.fg) and the Looper is polling, set mCompleted = true;
    • When the previous check has not yet completed, return directly
  2. After waiting for 30s, call evaluateCheckerCompletionLocked to evaluate the status of the Checkers;
  3. Perform different operations according to the waitState:
    • When COMPLETED or WAITING, nothing needs to be done;
    • When WAITED_HALF (more than 30s) is reached for the first time, output the traces of system_server and three native processes;
    • When OVERDUE, output more information.

Thus, each time the Watchdog is triggered, AMS.dumpStackTraces is called twice. That is, the traces of system_server and three native processes are output twice, more than 30s apart.
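The inner wait loop is worth isolating, because wait(timeout) may return early and the loop is what guarantees the full interval. Below is a minimal sketch of that pattern in plain Java. FullIntervalWait and waitFullInterval are hypothetical names, and System.nanoTime() stands in for SystemClock.uptimeMillis(), which is only available on Android.

```java
final class FullIntervalWait {
    /**
     * Waits on the given lock until intervalMillis have elapsed, waiting again
     * after every early wakeup. Returns the elapsed time in milliseconds.
     */
    static long waitFullInterval(Object lock, long intervalMillis) {
        long start = System.nanoTime();
        long timeout = intervalMillis;
        synchronized (lock) {
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    // As in the excerpt above: swallow the interrupt and keep waiting.
                }
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
                // Recompute how much of the interval is still left.
                timeout = intervalMillis - elapsedMillis;
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Recomputing the remaining time from a fixed start point, rather than restarting a fresh timeout after each wakeup, keeps the total wait close to the interval no matter how often the thread is woken.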

3.2 scheduleCheckLocked

public final class HandlerChecker implements Runnable {
    ...
    public void scheduleCheckLocked() {
        if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
            mCompleted = true; //Returns when the target looper is polling.
            return;
        }
 
        if (!mCompleted) {
            return; //If a check is being processed, there is no need to send it repeatedly
        }
        mCompleted = false;
        
        mCurrentMonitor = null;
        // Record the current time
        mStartTime = SystemClock.uptimeMillis();
        //Send a message and insert it into the beginning of the message queue. See the run() method below
        mHandler.postAtFrontOfQueue(this);
    }
    
    public void run() {
        final int size = mMonitors.size();
        for (int i = 0 ; i < size ; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            //Callback the monitor method of the specific service
            mCurrentMonitor.monitor();
        }
 
        synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
        }
    }
}

The main function of this method is to post the HandlerChecker to the front of the Looper message queue of the thread that Watchdog monitors. Its run() method calls monitor() and sets mCompleted = true once execution finishes. If the message currently being processed in the handler's message queue keeps monitor() from getting a chance to run, the watchdog is triggered.

postAtFrontOfQueue(this) takes a Runnable object as its argument. Through the message mechanism, the run() method of HandlerChecker is eventually called back. That method loops through all Monitor interfaces; each specific service implements the monitor() method of this interface.

Possible problems: if other messages are repeatedly posted with postAtFrontOfQueue(), the Watchdog check may never get a chance to execute; or each monitor takes some time and the total adds up to more than 1 minute, which triggers a Watchdog. These are unconventional Watchdog cases.
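The scheduling behavior described above can be sketched in plain Java without Android classes. In this sketch a Deque of Runnables stands in for the thread's message queue, and addFirst() plays the role of postAtFrontOfQueue(); CheckerSketch and its members are hypothetical names, and the polling shortcut and timing fields of the real HandlerChecker are left out.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class CheckerSketch implements Runnable {
    /** Same shape as the Monitor interface discussed above. */
    interface Monitor {
        void monitor();
    }

    private final Deque<Runnable> queue; // stand-in for the monitored thread's message queue
    private final List<Monitor> monitors = new ArrayList<>();
    private boolean completed = true;

    CheckerSketch(Deque<Runnable> queue) {
        this.queue = queue;
    }

    void addMonitor(Monitor monitor) {
        monitors.add(monitor);
    }

    /** Queues a check at the front, unless the previous check is still pending. */
    void scheduleCheck() {
        if (!completed) {
            return; // a check is already queued or running
        }
        completed = false;
        queue.addFirst(this); // plays the role of postAtFrontOfQueue(this)
    }

    /** Runs when the queue's owner reaches this entry: call every monitor, then mark done. */
    @Override
    public void run() {
        for (Monitor monitor : monitors) {
            monitor.monitor();
        }
        completed = true;
    }

    boolean isCompleted() {
        return completed;
    }
}
```

The check only completes when the queue's owner actually gets around to running it, so a thread stuck on an earlier message leaves isCompleted() false for as long as it is stuck.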

3.3 evaluateCheckerCompletionLocked

private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        //[see subsection 3.4]
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}

This returns the largest wait-state value among all HandlerCheckers in the mHandlerCheckers list.

3.4 getCompletionStateLocked

public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime;
        // mWaitMax defaults to 60s
        if (latency < mWaitMax/2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}

The four states are:

  • COMPLETED = 0: the check has completed;
  • WAITING = 1: the waiting time is less than half of DEFAULT_TIMEOUT, i.e. 30s;
  • WAITED_HALF = 2: the waiting time is between 30s and 60s;
  • OVERDUE = 3: the waiting time is greater than or equal to 60s.
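As a worked example of Sections 3.3 and 3.4 together, the following sketch classifies a single checker by its latency and then takes the worst state across several checkers. CompletionStateSketch, stateOf and worstOf are hypothetical names; the constants and thresholds follow the values listed above.

```java
final class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** State of one checker, given whether it finished and how long it has been pending. */
    static int stateOf(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** Overall state: the largest (worst) state among all checkers. */
    static int worstOf(int... states) {
        int worst = COMPLETED;
        for (int state : states) {
            worst = Math.max(worst, state);
        }
        return worst;
    }
}
```

With a 60s limit, a checker pending for 29,999 ms is WAITING, one pending for 30,000 ms is WAITED_HALF, and one pending for 60,000 ms is OVERDUE. Because the states are ordered by severity, a single overdue checker is enough to make the overall result OVERDUE.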

4, Watchdog processing flow

4.1 run

[-> Watchdog.java]

public void run() {
    while (true) {
        synchronized (this) {
            ...
            //Get blocked checkers [see section 4.1.1]
            blockedCheckers = getBlockedCheckersLocked();
            // Obtain descriptive information [see subsection 4.1.2]
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }
 
        EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
 
        ArrayList<Integer> pids = new ArrayList<Integer>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        //Second dump: append the stack information of system_server and three native processes [see Section 4.2]
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);
                
        //The system has already been blocked for 1 minute, so waiting another 2s to make sure the stack traces are written does no harm
        SystemClock.sleep(2000);
 
        if (RECORD_KERNEL_THREADS) {
            //Output kernel stack information [see Section 4.3]
            dumpKernelStackTraces();
        }
 
        //Trigger the kernel to dump all blocked threads [see section 4.4]
        doSysRq('l');
        
        //Output dropbox information [see subsection 4.5]
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
            public void run() {
                mActivity.addErrorToDropBox(
                        "watchdog", null, "system_server", null, null,
                        subject, null, stack, null);
            }
        };
        dropboxThread.start();
        
        try {
            dropboxThread.join(2000); //Wait for the dropbox thread to work for 2s
        } catch (InterruptedException ignored) {
        }
 
        IActivityController controller;
        synchronized (this) {
            controller = mController;
        }
        if (controller != null) {
            //Report the blocking status to the activity controller,
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                //A return value of 1 means keep waiting, and -1 means kill the system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    waitedHalf = false; 
                    continue; //In some cases, setting the ActivityController allows you to continue waiting when a Watchdog occurs
                }
            } catch (RemoteException e) {
            }
        }
 
        //The process is only killed when no debugger is attached
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            //Traverse the stack information of the output blocking thread
            for (int i=0; i<blockedCheckers.size(); i++) {
                Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                StackTraceElement[] stackTrace
                        = blockedCheckers.get(i).getThread().getStackTrace();
                for (StackTraceElement element: stackTrace) {
                    Slog.w(TAG, "    at " + element);
                }
            }
            Slog.w(TAG, "*** GOODBYE!");
            //Kill process system_server [see subsection 4.6]
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
        waitedHalf = false;
    }
}

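When Watchdog detects an exception, it collects the following information:

  • AMS.dumpStackTraces: outputs the stack information of Java and native processes;
  • WD.dumpKernelStackTraces: outputs the kernel stack information;
  • doSysRq: triggers the kernel to dump all blocked threads [see Section 4.4];
  • dropBox: writes the traces and blocked information to dropbox [see Section 4.5].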

After collecting this information, the system_server process is killed. The default value of allowRestart is true. When the am hang command is executed, restart is not allowed (allowRestart = false), so the system_server process is not killed.

4.1.1 getBlockedCheckersLocked

private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
    ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
    //Traverse all checkers
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        //Add all checkers that have not completed and have timed out to the list
        if (hc.isOverdueLocked()) {
            checkers.add(hc);
        }
    }
    return checkers;
}

4.1.2 describeCheckersLocked

private String describeCheckersLocked(ArrayList<HandlerChecker> checkers) {
     StringBuilder builder = new StringBuilder(128);
     for (int i=0; i<checkers.size(); i++) {
         if (builder.length() > 0) {
             builder.append(", ");
         }
         // Output all checker information
         builder.append(checkers.get(i).describeBlockedStateLocked());
     }
     return builder.toString();
 }
 
 
 public String describeBlockedStateLocked() {
     //Non-foreground threads take this branch
     if (mCurrentMonitor == null) {
         return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
     //The foreground thread takes this branch
     } else {
         return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                 + " on " + mName + " (" + getThread().getName() + ")";
     }
 }

This records every handler thread or monitor that has taken more than 1 minute to execute:

  • When the output message is Blocked in handler, it means that the corresponding thread processes the current message for more than 1 minute;
  • When the output message is Blocked in monitor, it means that the corresponding thread processes the current message for more than 1 minute, or the monitor fails to get the lock;
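To show what the resulting subject string looks like, here is a small sketch that reproduces the two message formats and the comma-joining done by describeCheckersLocked. BlockedDescriptionSketch and its methods are hypothetical names, and the sample class and thread names passed in the usage note are illustrative inputs rather than captured log output.

```java
import java.util.List;

final class BlockedDescriptionSketch {
    /** Formats one checker; monitorClass is null when no monitor was running. */
    static String describe(String monitorClass, String name, String threadName) {
        if (monitorClass == null) {
            return "Blocked in handler on " + name + " (" + threadName + ")";
        }
        return "Blocked in monitor " + monitorClass
                + " on " + name + " (" + threadName + ")";
    }

    /** Joins the per-checker descriptions with ", ", like the StringBuilder loop above. */
    static String describeAll(List<String> descriptions) {
        return String.join(", ", descriptions);
    }
}
```

For example, describe(null, "main thread", "main") yields "Blocked in handler on main thread (main)", while passing a monitor class name produces the "Blocked in monitor ..." form.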

4.2 AMS.dumpStackTraces

public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids,
        ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
    //The default is /data/anr/traces.txt
    String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
    if (tracesPath == null || tracesPath.length() == 0) {
        return null;
    }
 
    File tracesFile = new File(tracesPath);
    try {
        //When clearTraces, delete the existing traces file
        if (clearTraces && tracesFile.exists()) tracesFile.delete();
        //Create traces file
        tracesFile.createNewFile();
        // -rw-rw-rw-
        FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1);
    } catch (IOException e) {
        return null;
    }
    //Output trace content
    dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs);
    return tracesFile;
}

This outputs the trace information of system_server and three native processes: mediaserver, sdcard, and surfaceflinger.

4.3 WD.dumpKernelStackTraces

private File dumpKernelStackTraces() {
    // The path is /data/anr/traces.txt
    String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
    if (tracesPath == null || tracesPath.length() == 0) {
        return null;
    }
    // [see subsection 4.3.1]
    native_dumpKernelStacks(tracesPath);
    return new File(tracesPath);
}

native_dumpKernelStacks calls, via JNI, the dumpKernelStacks() method in the android_server_Watchdog.cpp file.

4.3.1 dumpKernelStacks

[-> android_server_Watchdog.cpp]

static void dumpKernelStacks(JNIEnv* env, jobject clazz, jstring pathStr) {
    char buf[128];
    DIR* taskdir;
    
    const char *path = env->GetStringUTFChars(pathStr, NULL);
    // Open the traces file
    int outFd = open(path, O_WRONLY | O_APPEND | O_CREAT,
        S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH);
    ...
 
    snprintf(buf, sizeof(buf), "\n----- begin pid %d kernel stacks -----\n", getpid());
    write(outFd, buf, strlen(buf));
 
    //Read all threads within the process
    snprintf(buf, sizeof(buf), "/proc/%d/task", getpid());
    taskdir = opendir(buf);
    if (taskdir != NULL) {
        struct dirent * ent;
        while ((ent = readdir(taskdir)) != NULL) {
            int tid = atoi(ent->d_name);
            if (tid > 0 && tid <= 65535) {
                //Output traces of each thread [4.3.2]
                dumpOneStack(tid, outFd);
            }
        }
        closedir(taskdir);
    }
 
    snprintf(buf, sizeof(buf), "----- end pid %d kernel stacks -----\n", getpid());
    write(outFd, buf, strlen(buf));
 
    close(outFd);
done:
    env->ReleaseStringUTFChars(pathStr, path);
}

All thread information of the current process is obtained by reading the node /proc/%d/task.
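The directory scan can be approximated in Java as follows. This is only a sketch: TaskDirSketch and listTids are hypothetical names, the directory is passed in as a parameter so the code can be tried on any folder, and Integer.parseInt is stricter than the native atoi (it rejects names with trailing non-digits instead of parsing their numeric prefix).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TaskDirSketch {
    /** Returns the numeric entry names under taskDir that fall in the range 1..65535, sorted. */
    static List<Integer> listTids(File taskDir) {
        List<Integer> tids = new ArrayList<>();
        String[] names = taskDir.list();
        if (names == null) {
            return tids; // not a directory, or not readable
        }
        for (String name : names) {
            try {
                int tid = Integer.parseInt(name);
                if (tid > 0 && tid <= 65535) {
                    tids.add(tid);
                }
            } catch (NumberFormatException ignored) {
                // Non-numeric entry: skip it.
            }
        }
        Collections.sort(tids);
        return tids;
    }
}
```

On a Linux system, passing new File("/proc/self/task") would list the thread ids of the calling process, subject to the same 65535 upper bound used in the native code.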

4.3.2 dumpOneStack

[-> android_server_Watchdog.cpp]

static void dumpOneStack(int tid, int outFd) {
    char buf[64];
    //Read the node /proc/%d/stack
    snprintf(buf, sizeof(buf), "/proc/%d/stack", tid);
    int stackFd = open(buf, O_RDONLY);
    if (stackFd >= 0) {
        //head
        strncat(buf, ":\n", sizeof(buf) - strlen(buf) - 1);
        write(outFd, buf, strlen(buf));
 
        //Copy stack information
        int nBytes;
        while ((nBytes = read(stackFd, buf, sizeof(buf))) > 0) {
            write(outFd, buf, nBytes);
        }
 
        //tail
        write(outFd, "\n", 1);
        close(stackFd);
    } else {
        ALOGE("Unable to open stack of tid %d : %d (%s)", tid, errno, strerror(errno));
    }
}

4.4 WD.doSysRq

private void doSysRq(char c) {
    try {
        FileWriter sysrq_trigger = new FileWriter("/proc/sysrq-trigger");
        sysrq_trigger.write(c);
        sysrq_trigger.close();
    } catch (IOException e) {
        Slog.w(TAG, "Failed to write to /proc/sysrq-trigger", e);
    }
}

Writing a character to the node /proc/sysrq-trigger triggers the kernel to dump all blocked threads and output the backtraces of all CPUs to the kernel log.
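The write itself is ordinary file I/O. The sketch below takes the target path as a parameter so that it can be exercised against a scratch file; SysRqSketch and writeTrigger are hypothetical names. Writing to the real /proc/sysrq-trigger normally requires sufficient privileges and affects the whole system, so it is not something to try casually.

```java
import java.io.FileWriter;
import java.io.IOException;

final class SysRqSketch {
    /** Writes a single command character to the given path; returns false on any I/O error. */
    static boolean writeTrigger(String path, char command) {
        try (FileWriter writer = new FileWriter(path)) {
            writer.write(command);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Compared with the excerpt above, try-with-resources also closes the writer when write() fails, which the original code does not do.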

4.5 dropBox

Dropbox was explained in detail in the article on the dropBox source code. The file is output to /data/system/dropbox. When the watchdog is triggered, the tag of the generated dropbox file is system_server_watchdog, and the file contains the traces and the corresponding blocked information.

4.6 killProcess

Process.killProcess was explained in detail in the article "Understand the implementation principle of killing process": the process is killed by sending signal 9 to the target process.

Killing the system_server process causes the zygote process to kill itself, which in turn triggers init to restart zygote. As a result, the phone's framework restarts.

5, Summary

Watchdog is a thread named "watchdog" running in the system_server process:

  • While Watchdog is running, if a block lasts longer than 1 minute, the Watchdog is triggered once, which kills system_server and triggers a restart of the upper layer;
  • mHandlerCheckers records the list of all HandlerChecker objects, including the handlers of the foreground, main, UI, I/O, and display threads;
  • mMonitorChecker.mMonitors records all monitors that Watchdog currently watches; all of these monitors run on the foreground thread.
  • There are two ways to join Watchdog monitoring:
    • addThread(): used to monitor a Handler object. The default timeout is 60s. This kind of timeout is usually caused by slow message processing on the corresponding handler thread;
    • addMonitor(): used to monitor services that implement the Watchdog.Monitor interface. This kind of timeout may be caused by slow message processing on the "android.fg" thread, or by the monitor waiting too long to acquire a lock.

In the following cases, the system_server process is not killed even if Watchdog is triggered:

  • Monkey: an IActivityController is set to intercept systemNotResponding events, as monkey does;
  • Hang: the am hang command has been executed, so restart is not allowed;
  • Debugger: a debugger is connected, so the process is not restarted.
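These three exceptions correspond to the order of checks at the end of run() in Section 4.1. The sketch below condenses that order into one function. KillDecisionSketch, Outcome and decide are hypothetical names, and a null controllerResult is this sketch's way of saying that no IActivityController is set or that the call to it failed.

```java
final class KillDecisionSketch {
    enum Outcome { KEEP_WAITING, SKIP_DEBUGGER, SKIP_RESTART_NOT_ALLOWED, KILL }

    /**
     * Mirrors the order of checks in run(): controller first, then debugger,
     * then allowRestart, and only then the kill.
     */
    static Outcome decide(Integer controllerResult, int debuggerWasConnected,
                          boolean allowRestart) {
        if (controllerResult != null && controllerResult >= 0) {
            return Outcome.KEEP_WAITING; // the controller asked to continue waiting
        }
        if (debuggerWasConnected > 0) {
            return Outcome.SKIP_DEBUGGER; // a debugger is, or recently was, connected
        }
        if (!allowRestart) {
            return Outcome.SKIP_RESTART_NOT_ALLOWED; // e.g. after "am hang"
        }
        return Outcome.KILL;
    }
}
```

Note that the controller check comes first: when it asks to keep waiting, the loop continues immediately and the debugger and allowRestart checks are never reached.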

5.1 output information

If Watchdog finds a block lasting 1 minute during a check, it outputs:

  1. AMS.dumpStackTraces: outputs the traces of system_server and three native processes
    • This method outputs twice: the first time at the 30s timeout, the second time at the 1min timeout;
  2. WD.dumpKernelStackTraces: outputs the kernel stacks of all threads in the system_server process;
    • The node /proc/%d/task provides the list of all threads in the process
    • The node /proc/%d/stack provides the kernel stack
  3. doSysRq: triggers the kernel to dump all blocked threads and output the backtraces of all CPUs to the kernel log;
    • The node is /proc/sysrq-trigger
  4. dropBox: outputs the file to /data/system/dropbox; the content is the traces plus the blocked information
  5. Kill system_server, which causes the zygote process to kill itself and thereby restarts the upper framework.

5.2 Handler mode

The threads monitored by Watchdog are listed below. DEFAULT_TIMEOUT is 60s, or 10s when debugging, which helps uncover potential ANR problems.

Thread name            Corresponding handler                  Description
system_server          new Handler(Looper.getMainLooper())    Current main thread
android.fg             FgThread.getHandler                    Foreground thread
android.ui             UiThread.getHandler                    UI thread
android.io             IoThread.getHandler                    I/O thread
android.display        DisplayThread.getHandler               Display thread
ActivityManager        AMS.MainHandler                        Used in the AMS constructor
PowerManagerService    PMS.PowerManagerHandler                Used in PMS.onStart()

At present, Watchdog monitors the seven threads above in the system_server process. The Looper message processing time of each of these threads must not exceed 1 minute.

5.3 Monitor mode

Every system service that Watchdog can monitor implements the Watchdog.Monitor interface and its monitor() method. The monitors run on the android.fg thread. The classes in the system that implement this interface mainly include:

  • ActivityManagerService
  • WindowManagerService
  • InputManagerService
  • PowerManagerService
  • NetworkManagementService
  • MountService
  • NativeDaemonConnector
  • BinderThreadMonitor
  • MediaProjectionManagerService
  • MediaRouterService
  • MediaSessionService
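As a rough illustration of why such a monitor detects deadlocks, the sketch below assumes a service whose monitor() simply acquires and releases the service's own lock. MonitorSketch, LockedService and completesWithin are hypothetical names, and the Monitor interface here only imitates the shape of Watchdog.Monitor. The helper expects a timeout greater than zero, because Thread.join(0) waits forever.

```java
final class MonitorSketch {
    /** Same shape as the Watchdog.Monitor interface described above. */
    interface Monitor {
        void monitor();
    }

    /** Hypothetical service whose state is guarded by a single lock. */
    static final class LockedService implements Monitor {
        final Object lock = new Object();

        @Override
        public void monitor() {
            // Acquire and immediately release the lock; blocks while another thread holds it.
            synchronized (lock) {
            }
        }
    }

    /** Runs monitor() on a helper thread and reports whether it finished within timeoutMillis. */
    static boolean completesWithin(Monitor monitor, long timeoutMillis) {
        Thread probe = new Thread(monitor::monitor, "monitor-probe");
        probe.setDaemon(true);
        probe.start();
        try {
            probe.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !probe.isAlive();
    }
}
```

While some other thread holds the service lock, monitor() cannot return. In Watchdog terms, the foreground checker would then stay incomplete and eventually be reported as "Blocked in monitor".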

Original address: http://gityuan.com/2016/06/21/watchdog/
