This series is the third issue of my TM silly series [covering your face], a wonderful review of previous periods:
- Upgrade to spring 5.3.0 After X, the number of GC increased sharply, and I was stupid
- How can the SQL of the index field query of this large table be fully scanned? I'm stupid
data:image/s3,"s3://crabby-images/0e9e9/0e9e97bd503e7280f322648fbbffaf0c1feeb3fa" alt=""
Recently, the team made some development with the SDK given by a third party. Recently, the online suddenly began to report errors and found a particularly strange problem. The team member told me that the code didn't go until half of the code was run, and skipped a section (did the code just participate in the long jump in the East Olympic Games???).
The code is as follows, and the logic is very simple:
try { log.info("initiate client with conf: {}", conf); SDKClient client = new SDKClient(conf); client.init(); log.info("client initiated"); } catch (Exception e) { log.error("initiate client failed", e); } log.info("start to manipulate...");
We found that the client actually failed to initialize successfully, and the subsequent business processing has been reporting errors. Check the log and find:
initiate client with conf: xxxxx start to manipulate...
This is what the team members said that the code jumped. Because neither client initiated nor initiate client failed is printed... Start to manage directly Yes.
data:image/s3,"s3://crabby-images/df45e/df45ed0f4288dd49af3548d1d05c256ccaf24a5d" alt=""
Old readers know that our online is k8s + Docker, and Arthas is built in each image, the Java version is Java 16, and JFR is enabled. There is link information in the log, which is pulled to the unified log server through ELK Agent.
The remote addresses to be accessed in the SDK have IP white lists. For security, we cannot directly use the SDK to access each other's online environment. The local test is connected to the other party's test environment. There is no problem. So here, we still have to locate through Arthas.
First of all, we have to see whether the source code running online is consistent with what we see locally? This can be done through the jad command:
jad Fully qualified name of the class to see
After checking, it is found that the decompiled code is consistent with our source code.
Then let's look at the actual execution of the code:
trace Class fully qualified name method to look at
Then re execute this method and check trace. It is found that an exception was thrown during initialization:
# Omit what we don't care about here +---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #130 +---[min=599.388978ms,max=630.23967ms,total=1229.628648ms,count=2] com.dasha13.sdk.SDKClient:<init>() #131 +---[min=203.617545ms,max=221.785512ms,total=425.403057ms,count=2] com.dasha13.sdk.SDKClient:init() #132 [throws Exception,2] +---[min=0.034798ms,max=0.084505ms,total=0.119303ms,count=2] org.apache.logging.log4j.Logger:error() #136 +---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #138
However, why didn't the exception log be printed out? Let's continue to check the exception, use the watch method, and specify the viewing depth as 2. In this way, we expect to print the stack and Message:
watch com.dasha13.sdk.SDKClient init {throwExp} -x 2
However, only a Message that looks like a Message is printed here:
method=com.dasha13.sdk.SDKClient init location=AtExceptionExit ts=2021-08-10 02:58:15; [cost=131.20209ms] result=ERROR DATA!!! object class: class java.util.ArrayList, exception class: class com.google.common.util.concurrent.UncheckedExecutionException, exception message: java.lang.IllegalArgumentException
This is strange. Normally, the specified depth is 2. If an exception is thrown, the output information will contain the Message of the exception and the stack information. What's going on? Let's get the stack and information respectively:
Get the stack first:
watch com.dasha13.sdk.SDKClient init {throwExp.getStackTrace()} -x 2
Re execute the method with the problem. The stack outputs normally. There is no problem. However, looking at the stack, it should be related to the exception of loading a Bean by Google's dependency flip Bean management framework (similar to Spring) Guice:
ts=2021-08-10 03:03:37; [cost=146.644563ms] result=@ArrayList[ @StackTraceElement[][ @StackTraceElement[com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1025)], @StackTraceElement[com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)], @StackTraceElement[com.dasha13.sdk.SDKClient.init(SDKClient.java:482)], # After omission
Let's look at the exception information:
watch com.dasha13.sdk.SDKClient init {throwExp.getMessage()} -x 2
Re execute the problem method. At this time, it is found that the watch fails:
watch failed, condition is: null, express is: {throwExp.getMessage()}, com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException, visit /app/arthas/arthas.log for more details.
According to the prompt, we check the arthas log and find the exception stack:
2021-08-10 03:07:11 [XNIO-2 task-3] ERROR c.t.a.c.command.express.OgnlExpress -Error during evaluating the expression: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) at com.google.common.cache.LocalCache.get(LocalCache.java:3937) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830) at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66) at com.google.inject.internal.Errors.formatSource(Errors.java:806) at com.google.inject.internal.Errors.formatSource(Errors.java:785) at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839) at com.google.inject.internal.Errors.formatSource(Errors.java:800) at com.google.inject.internal.Errors.formatSource(Errors.java:785) at com.google.inject.internal.Errors.format(Errors.java:584) at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60) cause by: MethodNotFoundException: Method not found: class com.google.common.xxxxxxxxx
We found that the exception occurred in getMessage() of ProvisionException, that is, the exception occurred in getMessage() of exception Looking at the Cause of the exception, we also found that it was caused by the incompatibility between the Guava version and the guice version. The root Cause was that the three-party interface timed out, resulting in an initialization exception. An exception throw was encapsulated as a ProvisionException. The getMessage of the ProvisionException exception relied on the Guava Cache to cache some exception information, However, the Guava version in our project is incompatible with the guice version, resulting in the absence of some methods, so the getMessage with ProvisionException exception will also have an exception. There was no problem in the previous operation because the interface timed out and threw an exception when the three parties had not been initialized...
data:image/s3,"s3://crabby-images/165fc/165fc34ac79d1eb5f12a74349b5f535795bc0dc3" alt=""
We use log4j2 asynchronous log configuration and pass the exception as the last parameter into the log method. Normally, the Message and exception stack of the exception will be output However, from the above analysis, we know that an exception is thrown when obtaining a Message. Log4j is designed to use the production and consumption architecture of log events. Here is the exception Message and exception stack obtained by the consumer. When obtaining the Message, an exception is found. For log4j2 asynchronous logs, when an exception is found, the original log events will be discarded directly and the exception will be output to the StatusLogger (the bottom layer is actually the standard exception output). Here is the source code of log4j:
private void tryCallAppender(final LogEvent event) { try { //Call appender to output log appender.append(event); } catch (final RuntimeException error) { //Handling RuntimeException handleAppenderError(event, error); } catch (final Exception error) { //Handle other exceptions handleAppenderError(event, new AppenderLoggingException(error)); } } private void handleAppenderError(final LogEvent event, final RuntimeException ex) { appender.getHandler().error(createErrorMsg("An exception occurred processing Appender "), event, ex); if (!appender.ignoreExceptions()) { throw ex; } }
ErrorHandler is generally implemented by default, i.e. DefaultErrorHandler; DefaultErrorHandler is output to a StatusLogger:
private static final Logger LOGGER = StatusLogger.getLogger(); public void error(final String msg, final LogEvent event, final Throwable t) { final long current = System.nanoTime(); if (current - lastException > EXCEPTION_INTERVAL || exceptionCount++ < MAX_EXCEPTIONS) { LOGGER.error(msg, t); } lastException = current; if (!appender.ignoreExceptions() && t != null && !(t instanceof AppenderLoggingException)) { throw new AppenderLoggingException(msg, t); } }
The StatusLogger is actually the standard exception output system err:
this.logger = new SimpleLogger("StatusLogger", Level.ERROR, false, true, showDateTime, false, dateFormat, messageFactory, PROPS, //Standard exception output System.err);
In our deployment architecture, we put the standard exception output in a very remote location, and almost no one saw it, so we didn't notice... Check the standard exception output and you will find that there are exceptions:
2021-08-10 03:30:29,810 Log4j2-TF-10-AsyncLoggerConfig-3 ERROR An exception occurred processing Appender file com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) at com.google.common.cache.LocalCache.get(LocalCache.java:3937) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830) at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66) at com.google.inject.internal.Errors.formatSource(Errors.java:806) at com.google.inject.internal.Errors.formatSource(Errors.java:785) at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839) at com.google.inject.internal.Errors.formatSource(Errors.java:800) at com.google.inject.internal.Errors.formatSource(Errors.java:785) at com.google.inject.internal.Errors.format(Errors.java:584) at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60) at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:105) at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:93) at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629) at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63) at org.springframework.boot.logging.log4j2.ExtendedWhitespaceThrowablePatternConverter.format(ExtendedWhitespaceThrowablePatternConverter.java:50) at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38) at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:345) at org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:244) at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:229) at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181) at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312) at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156) at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129) at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120) at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84) at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:543) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.callAppenders(AsyncLoggerConfig.java:127) at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:502) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:485) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.log(AsyncLoggerConfig.java:121) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logToAsyncLoggerConfigsOnCurrentThread(AsyncLoggerConfig.java:169) at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:111) at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:97) at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.IllegalArgumentException at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source) at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source) at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source) at com.google.inject.internal.util.LineNumbers.<init>(LineNumbers.java:65) at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:46) at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:43) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) ... 41 more
And, after that, It will be configured according to the ignoreExceptions of the Appender (all true by default) determines whether exceptions will be thrown at the place where the log method is called, but this is for the synchronous log. The asynchronous log will throw exceptions to the exception handler of the debugger. The exception handling of the log4j2 debugger also outputs exceptions to System.err, that is, the standard exception output. It is not thrown by default. After all, no one wants the synchronous log Because there are exceptions in the log, the business cannot proceed normally. Because the previous processing of asynchronous log has been output to the standard exception output, there is no need to do more.