If there is another exception in the exception information, I can't find the log. I'm stupid

Posted by MarkR on Tue, 04 Jan 2022 18:41:14 +0100

This series is the third issue of my TM silly series [covering your face], a wonderful review of previous periods:

Recently, the team made some development with the SDK given by a third party. Recently, the online suddenly began to report errors and found a particularly strange problem. The team member told me that the code didn't go until half of the code was run, and skipped a section (did the code just participate in the long jump in the East Olympic Games???).

The code is as follows, and the logic is very simple:

try {
    log.info("initiate client with conf: {}", conf);
    SDKClient client = new SDKClient(conf);
    client.init();
    log.info("client initiated");
} catch (Exception e) {
    log.error("initiate client failed", e);
}
log.info("start to manipulate...");

We found that the client actually failed to initialize successfully, and the subsequent business processing has been reporting errors. Check the log and find:

initiate client with conf: xxxxx
start to manipulate...

This is what the team members said that the code jumped. Because neither client initiated nor initiate client failed is printed... Start to manage directly Yes.

Old readers know that our online is k8s + Docker, and Arthas is built in each image, the Java version is Java 16, and JFR is enabled. There is link information in the log, which is pulled to the unified log server through ELK Agent.

The remote addresses to be accessed in the SDK have IP white lists. For security, we cannot directly use the SDK to access each other's online environment. The local test is connected to the other party's test environment. There is no problem. So here, we still have to locate through Arthas.

First of all, we have to see whether the source code running online is consistent with what we see locally? This can be done through the jad command:

jad Fully qualified name of the class to see

After checking, it is found that the decompiled code is consistent with our source code.

Then let's look at the actual execution of the code:

trace Class fully qualified name method to look at

Then re execute this method and check trace. It is found that an exception was thrown during initialization:

# Omit what we don't care about here

    +---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #130
    +---[min=599.388978ms,max=630.23967ms,total=1229.628648ms,count=2] com.dasha13.sdk.SDKClient:<init>() #131
    +---[min=203.617545ms,max=221.785512ms,total=425.403057ms,count=2] com.dasha13.sdk.SDKClient:init() #132 [throws Exception,2]
    +---[min=0.034798ms,max=0.084505ms,total=0.119303ms,count=2] org.apache.logging.log4j.Logger:error() #136
    +---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #138

However, why didn't the exception log be printed out? Let's continue to check the exception, use the watch method, and specify the viewing depth as 2. In this way, we expect to print the stack and Message:

watch com.dasha13.sdk.SDKClient init {throwExp} -x 2

However, only a Message that looks like a Message is printed here:

method=com.dasha13.sdk.SDKClient init location=AtExceptionExit
ts=2021-08-10 02:58:15; [cost=131.20209ms] result=ERROR DATA!!! object class: class java.util.ArrayList, exception class: class com.google.common.util.concurrent.UncheckedExecutionException, exception message: java.lang.IllegalArgumentException

This is strange. Normally, the specified depth is 2. If an exception is thrown, the output information will contain the Message of the exception and the stack information. What's going on? Let's get the stack and information respectively:

Get the stack first:

watch com.dasha13.sdk.SDKClient init {throwExp.getStackTrace()} -x 2

Re execute the method with the problem. The stack outputs normally. There is no problem. However, looking at the stack, it should be related to the exception of loading a Bean by Google's dependency flip Bean management framework (similar to Spring) Guice:

ts=2021-08-10 03:03:37; [cost=146.644563ms] result=@ArrayList[
    @StackTraceElement[][
        @StackTraceElement[com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1025)],
        @StackTraceElement[com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)],
        @StackTraceElement[com.dasha13.sdk.SDKClient.init(SDKClient.java:482)],
        # After omission

Let's look at the exception information:

watch com.dasha13.sdk.SDKClient init {throwExp.getMessage()} -x 2

Re execute the problem method. At this time, it is found that the watch fails:

watch failed, condition is: null, express is: {throwExp.getMessage()}, com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException, visit /app/arthas/arthas.log for more details.

According to the prompt, we check the arthas log and find the exception stack:

2021-08-10 03:07:11 [XNIO-2 task-3] ERROR c.t.a.c.command.express.OgnlExpress -Error during evaluating the expression:
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException                                
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)                                
        at com.google.common.cache.LocalCache.get(LocalCache.java:3937)                                       
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)                           
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)                                 
        at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)    
        at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66)                     
        at com.google.inject.internal.Errors.formatSource(Errors.java:806)                                          
        at com.google.inject.internal.Errors.formatSource(Errors.java:785)                                               
        at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839)                                                 
        at com.google.inject.internal.Errors.formatSource(Errors.java:800)                           
        at com.google.inject.internal.Errors.formatSource(Errors.java:785)                                    
        at com.google.inject.internal.Errors.format(Errors.java:584)                                            
        at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60)       
cause by: MethodNotFoundException: Method not found: class com.google.common.xxxxxxxxx 

We found that the exception occurred in getMessage() of ProvisionException, that is, the exception occurred in getMessage() of exception Looking at the Cause of the exception, we also found that it was caused by the incompatibility between the Guava version and the guice version. The root Cause was that the three-party interface timed out, resulting in an initialization exception. An exception throw was encapsulated as a ProvisionException. The getMessage of the ProvisionException exception relied on the Guava Cache to cache some exception information, However, the Guava version in our project is incompatible with the guice version, resulting in the absence of some methods, so the getMessage with ProvisionException exception will also have an exception. There was no problem in the previous operation because the interface timed out and threw an exception when the three parties had not been initialized...

We use log4j2 asynchronous log configuration and pass the exception as the last parameter into the log method. Normally, the Message and exception stack of the exception will be output However, from the above analysis, we know that an exception is thrown when obtaining a Message. Log4j is designed to use the production and consumption architecture of log events. Here is the exception Message and exception stack obtained by the consumer. When obtaining the Message, an exception is found. For log4j2 asynchronous logs, when an exception is found, the original log events will be discarded directly and the exception will be output to the StatusLogger (the bottom layer is actually the standard exception output). Here is the source code of log4j:

AppenderControl

private void tryCallAppender(final LogEvent event) {
    try {
        //Call appender to output log
        appender.append(event);
    } catch (final RuntimeException error) {
        //Handling RuntimeException
        handleAppenderError(event, error);
    } catch (final Exception error) {
        //Handle other exceptions
        handleAppenderError(event, new AppenderLoggingException(error));
    }
}

private void handleAppenderError(final LogEvent event, final RuntimeException ex) {
    appender.getHandler().error(createErrorMsg("An exception occurred processing Appender "), event, ex);
    if (!appender.ignoreExceptions()) {
        throw ex;
    }
}

ErrorHandler is generally implemented by default, i.e. DefaultErrorHandler; DefaultErrorHandler is output to a StatusLogger:

DefaultErrorHandler

private static final Logger LOGGER = StatusLogger.getLogger();
public void error(final String msg, final LogEvent event, final Throwable t) {
    final long current = System.nanoTime();
    if (current - lastException > EXCEPTION_INTERVAL || exceptionCount++ < MAX_EXCEPTIONS) {
        LOGGER.error(msg, t);
    }
    lastException = current;
    if (!appender.ignoreExceptions() && t != null && !(t instanceof AppenderLoggingException)) {
        throw new AppenderLoggingException(msg, t);
    }
}

The StatusLogger is actually the standard exception output system err:

StatusLogger

this.logger = new SimpleLogger("StatusLogger", Level.ERROR, false, true, showDateTime, false,
                dateFormat, messageFactory, PROPS, 
                //Standard exception output
                System.err);

In our deployment architecture, we put the standard exception output in a very remote location, and almost no one saw it, so we didn't notice... Check the standard exception output and you will find that there are exceptions:

2021-08-10 03:30:29,810 Log4j2-TF-10-AsyncLoggerConfig-3 ERROR An exception occurred processing Appender file com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
	at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
	at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
	at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66)
	at com.google.inject.internal.Errors.formatSource(Errors.java:806)
	at com.google.inject.internal.Errors.formatSource(Errors.java:785)
	at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839)
	at com.google.inject.internal.Errors.formatSource(Errors.java:800)
	at com.google.inject.internal.Errors.formatSource(Errors.java:785)
	at com.google.inject.internal.Errors.format(Errors.java:584)
	at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60)
	at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:105)
	at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:93)
	at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629)
	at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63)
	at org.springframework.boot.logging.log4j2.ExtendedWhitespaceThrowablePatternConverter.format(ExtendedWhitespaceThrowablePatternConverter.java:50)
	at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38)
	at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:345)
	at org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:244)
	at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:229)
	at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
	at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
	at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)
	at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
	at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:543)
	at org.apache.logging.log4j.core.async.AsyncLoggerConfig.callAppenders(AsyncLoggerConfig.java:127)
	at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:502)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:485)
	at org.apache.logging.log4j.core.async.AsyncLoggerConfig.log(AsyncLoggerConfig.java:121)
	at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logToAsyncLoggerConfigsOnCurrentThread(AsyncLoggerConfig.java:169)
	at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:111)
	at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:97)
	at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168)
	at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalArgumentException
	at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
	at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
	at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
	at com.google.inject.internal.util.LineNumbers.<init>(LineNumbers.java:65)
	at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:46)
	at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:43)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
	... 41 more

And, after that, It will be configured according to the ignoreExceptions of the Appender (all true by default) determines whether exceptions will be thrown at the place where the log method is called, but this is for the synchronous log. The asynchronous log will throw exceptions to the exception handler of the debugger. The exception handling of the log4j2 debugger also outputs exceptions to System.err, that is, the standard exception output. It is not thrown by default. After all, no one wants the synchronous log Because there are exceptions in the log, the business cannot proceed normally. Because the previous processing of asynchronous log has been output to the standard exception output, there is no need to do more.