Ali artifact, a move to locate the online Bug

Posted by numan82 on Wed, 15 Dec 2021 16:06:17 +0100

background

The company has a channel system, which is specially connected with three-party channels. There is no business logic. It mainly works on message conversion and parameter verification, which plays a connecting role.

Recently, after optimizing the response time of the interface and the code, the time still can not meet the requirements; There is a strange time-consuming problem of about 100ms. After printing the request processing time in the interface, there is still a difference of about 100ms from the response time of the caller. For example, 150ms is recorded in the program, but the waiting time of the caller is about 250ms.

The following is a record of the detailed positioning & solution process at that time (in fact, the solution is very simple, and the key lies in how to locate and find a solution to the problem)

Positioning process

Analysis code

Channel system is a common spring boot web project, which uses the integrated tomcat. After analyzing the code, it is found that there is no special place, no special filter or interceptor, so it is preliminarily ruled out that it is a business code problem

Analyze call process

After this problem occurs, first confirm the calling process of the next interface. Because it is an internal test, there are fewer calling processes.

Nginx - reverse agent - > channel system

The company is a cloud server, and the network is also a cloud intranet. Because the cause of the problem is not clear, use the exclusion method to first confirm whether there is a problem in the server network.

First confirm whether there is a problem between the sender and the Nginx Host:

[jboss@VM_0_139_centos ~]$ ping 10.0.0.139
PING 10.0.0.139 (10.0.0.139) 56(84) bytes of data.
64 bytes from 10.0.0.139: icmp_seq=1 ttl=64 time=0.029 ms
64 bytes from 10.0.0.139: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 10.0.0.139: icmp_seq=3 ttl=64 time=0.040 ms
64 bytes from 10.0.0.139: icmp_seq=4 ttl=64 time=0.040 ms

From the ping results, the delay from the sender to the Nginx host is no problem. Next, check the network from Nginx to the channel system.

# Since there is no problem with logs, the above logs are copied directly here
[jboss@VM_0_139_centos ~]$ ping 10.0.0.139
PING 10.0.0.139 (10.0.0.139) 56(84) bytes of data.
64 bytes from 10.0.0.139: icmp_seq=1 ttl=64 time=0.029 ms
64 bytes from 10.0.0.139: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 10.0.0.139: icmp_seq=3 ttl=64 time=0.040 ms
64 bytes from 10.0.0.139: icmp_seq=4 ttl=64 time=0.040 ms

From the ping results, the network delay from Nginx to the channel system server is no problem

Since the network seems to be OK, you can continue the troubleshooting method, cut off Nginx, and directly connect the client to the server of the channel system through the loopback address (localhost), so as to avoid passing through the network card / dns, narrow the scope of the problem and see whether it can be reproduced (this application and address are simulated in my later stage, and an empty interface is tested):

[jboss@VM_10_91_centos tmp]$ curl -w "@curl-time.txt" http://127.0.0.1:7744/send
success
              http: 200
               dns: 0.001s
          redirect: 0.000s
      time_connect: 0.001s
   time_appconnect: 0.000s
  time_pretransfer: 0.001s
time_starttransfer: 0.073s
     size_download: 7bytes
    speed_download: 95.000B/s
                  ----------
        time_total: 0.073s Total request time

From the curl log, it takes 73 MS to call an empty interface through the loopback address. This is strange. Skip all the calling nodes in the middle (including filters & interceptors) and directly request the application of an empty interface. It takes 73 MS to request again:

[jboss@VM_10_91_centos tmp]$ curl -w "@curl-time.txt" http://127.0.0.1:7744/send
success
              http: 200
               dns: 0.001s
          redirect: 0.000s
      time_connect: 0.001s
   time_appconnect: 0.000s
  time_pretransfer: 0.001s
time_starttransfer: 0.003s
     size_download: 7bytes
    speed_download: 2611.000B/s
                  ----------
        time_total: 0.003s

What's more strange is that the second request takes a normal time, which becomes 3ms. After consulting the data, linux curl turns on HTTP keep alive by default. Even if keep alive is not turned on, it will not take 70ms each time handshake is restarted.

After continuous analysis and testing, it is found that the time for continuous requests will be very short, and each request only takes a few milliseconds, but if you request again after a period of time, it will take more than 70ms.

From this phenomenon, it is speculated that it may be caused by some caching mechanisms. Continuous requests have a cache, so they are fast and take a long time. After cache failure, it takes a long time.

So where is the problem? tomcat layer or spring webmvc?

Just guessing can't locate the problem. We still have to actually test whether the code of the channel system can be reproduced by putting it into the local ide

However, after importing the local Ide, the problem cannot be reproduced after starting in the Ide, and there is no delay of 70+ms. This is a headache. You can't reproduce locally and Debug. Because the problem is not in the business code, you can't Debug by adding logs

At this time, you can sacrifice the artifact Arthas

Arthas analysis problem

Arthas is an open source Java diagnostic tool of Alibaba, which is deeply loved by developers.

When you encounter the following similar problems and are helpless, Arthas can help you solve them:

  • Which jar package is this class loaded from? Why are all kinds of related exceptions reported?
  • Why didn't the code I changed execute? Am I not commit ted? Wrong branch?
  • You can't debug online when you encounter a problem. Can you only republish it by adding a log?
  • There is a problem with the data processing of a user online, but it is also impossible to debug online and reproduce offline!
  • Is there a global perspective to view the health of the system?
  • Is there any way to monitor the real-time running status of the JVM?

The above is the official profile of Arthas. This time, I only need to use his small function trace. Dynamically calculate the method call path and time so that I can locate where the time is consumed.

  • The trace method calls the path internally and outputs the time-consuming data on each node on the method path
  • The trace command can actively search {class pattern / method pattern
  • According to the corresponding method call path, render and count all the performance overhead on the whole call link and track the call link.

With artifact, what method should we track? Since I am not very familiar with the source code of Tomcat, I can only start with spring mvc. Let's trace the entry of spring mvc first:

[arthas@24851]$ trace org.springframework.web.servlet.DispatcherServlet *
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:44) cost in 508 ms.
`---ts=2019-09-14 21:07:44;thread_name=http-nio-7744-exec-2;id=11;is_daemon=true;priority=5;TCCL=org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader@7c136917
    `---[2.952142ms] org.springframework.web.servlet.DispatcherServlet:buildLocaleContext()

`---ts=2019-09-14 21:07:44;thread_name=http-nio-7744-exec-2;id=11;is_daemon=true;priority=5;TCCL=org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader@7c136917
    `---[18.08903ms] org.springframework.web.servlet.DispatcherServlet:doService()
        +---[0.041346ms] org.apache.commons.logging.Log:isDebugEnabled() #889
        +---[0.022398ms] org.springframework.web.util.WebUtils:isIncludeRequest() #898
        +---[0.014904ms] org.springframework.web.servlet.DispatcherServlet:getWebApplicationContext() #910
        +---[1.071879ms] javax.servlet.http.HttpServletRequest:setAttribute() #910
        +---[0.020977ms] javax.servlet.http.HttpServletRequest:setAttribute() #911
        +---[0.017073ms] javax.servlet.http.HttpServletRequest:setAttribute() #912
        +---[0.218277ms] org.springframework.web.servlet.DispatcherServlet:getThemeSource() #913
        |   `---[0.137568ms] org.springframework.web.servlet.DispatcherServlet:getThemeSource()
        |       `---[min=0.00783ms,max=0.014251ms,total=0.022081ms,count=2] org.springframework.web.servlet.DispatcherServlet:getWebApplicationContext() #782
        +---[0.019363ms] javax.servlet.http.HttpServletRequest:setAttribute() #913
        +---[0.070694ms] org.springframework.web.servlet.FlashMapManager:retrieveAndUpdate() #916
        +---[0.01839ms] org.springframework.web.servlet.FlashMap:<init>() #920
        +---[0.016943ms] javax.servlet.http.HttpServletRequest:setAttribute() #920
        +---[0.015268ms] javax.servlet.http.HttpServletRequest:setAttribute() #921
        +---[15.050124ms] org.springframework.web.servlet.DispatcherServlet:doDispatch() #925
        |   `---[14.943477ms] org.springframework.web.servlet.DispatcherServlet:doDispatch()
        |       +---[0.019135ms] org.springframework.web.context.request.async.WebAsyncUtils:getAsyncManager() #953
        |       +---[2.108373ms] org.springframework.web.servlet.DispatcherServlet:checkMultipart() #960
        |       |   `---[2.004436ms] org.springframework.web.servlet.DispatcherServlet:checkMultipart()
        |       |       `---[1.890845ms] org.springframework.web.multipart.MultipartResolver:isMultipart() #1117
        |       +---[2.054361ms] org.springframework.web.servlet.DispatcherServlet:getHandler() #964
        |       |   `---[1.961963ms] org.springframework.web.servlet.DispatcherServlet:getHandler()
        |       |       +---[0.02051ms] java.util.List:iterator() #1183
        |       |       +---[min=0.003805ms,max=0.009641ms,total=0.013446ms,count=2] java.util.Iterator:hasNext() #1183
        |       |       +---[min=0.003181ms,max=0.009751ms,total=0.012932ms,count=2] java.util.Iterator:next() #1183
        |       |       +---[min=0.005841ms,max=0.015308ms,total=0.021149ms,count=2] org.apache.commons.logging.Log:isTraceEnabled() #1184
        |       |       `---[min=0.474739ms,max=1.19145ms,total=1.666189ms,count=2] org.springframework.web.servlet.HandlerMapping:getHandler() #1188
        |       +---[0.013071ms] org.springframework.web.servlet.HandlerExecutionChain:getHandler() #971
        |       +---[0.372236ms] org.springframework.web.servlet.DispatcherServlet:getHandlerAdapter() #971
        |       |   `---[0.280073ms] org.springframework.web.servlet.DispatcherServlet:getHandlerAdapter()
        |       |       +---[0.004804ms] java.util.List:iterator() #1224
        |       |       +---[0.003668ms] java.util.Iterator:hasNext() #1224
        |       |       +---[0.003038ms] java.util.Iterator:next() #1224
        |       |       +---[0.006451ms] org.apache.commons.logging.Log:isTraceEnabled() #1225
        |       |       `---[0.012683ms] org.springframework.web.servlet.HandlerAdapter:supports() #1228
        |       +---[0.012848ms] javax.servlet.http.HttpServletRequest:getMethod() #974
        |       +---[0.013132ms] java.lang.String:equals() #975
        |       +---[0.003025ms] org.springframework.web.servlet.HandlerExecutionChain:getHandler() #977
        |       +---[0.008095ms] org.springframework.web.servlet.HandlerAdapter:getLastModified() #977
        |       +---[0.006596ms] org.apache.commons.logging.Log:isDebugEnabled() #978
        |       +---[0.018024ms] org.springframework.web.context.request.ServletWebRequest:<init>() #981
        |       +---[0.017869ms] org.springframework.web.context.request.ServletWebRequest:checkNotModified() #981
        |       +---[0.038542ms] org.springframework.web.servlet.HandlerExecutionChain:applyPreHandle() #986
        |       +---[0.00431ms] org.springframework.web.servlet.HandlerExecutionChain:getHandler() #991
        |       +---[4.248493ms] org.springframework.web.servlet.HandlerAdapter:handle() #991
        |       +---[0.014805ms] org.springframework.web.context.request.async.WebAsyncManager:isConcurrentHandlingStarted() #993
        |       +---[1.444994ms] org.springframework.web.servlet.DispatcherServlet:applyDefaultViewName() #997
        |       |   `---[0.067631ms] org.springframework.web.servlet.DispatcherServlet:applyDefaultViewName()
        |       +---[0.012027ms] org.springframework.web.servlet.HandlerExecutionChain:applyPostHandle() #998
        |       +---[0.373997ms] org.springframework.web.servlet.DispatcherServlet:processDispatchResult() #1008
        |       |   `---[0.197004ms] org.springframework.web.servlet.DispatcherServlet:processDispatchResult()
        |       |       +---[0.007074ms] org.apache.commons.logging.Log:isDebugEnabled() #1075
        |       |       +---[0.005467ms] org.springframework.web.context.request.async.WebAsyncUtils:getAsyncManager() #1081
        |       |       +---[0.004054ms] org.springframework.web.context.request.async.WebAsyncManager:isConcurrentHandlingStarted() #1081
        |       |       `---[0.011988ms] org.springframework.web.servlet.HandlerExecutionChain:triggerAfterCompletion() #1087
        |       `---[0.004015ms] org.springframework.web.context.request.async.WebAsyncManager:isConcurrentHandlingStarted() #1018
        +---[0.005055ms] org.springframework.web.context.request.async.WebAsyncUtils:getAsyncManager() #928
        `---[0.003422ms] org.springframework.web.context.request.async.WebAsyncManager:isConcurrentHandlingStarted() #928
[jboss@VM_10_91_centos tmp]$ curl -w "@curl-time.txt" http://127.0.0.1:7744/send
success
              http: 200
               dns: 0.001s
          redirect: 0.000s
      time_connect: 0.001s
   time_appconnect: 0.000s
  time_pretransfer: 0.001s
time_starttransfer: 0.115s
     size_download: 7bytes
    speed_download: 60.000B/s
                  ----------
        time_total: 0.115s

In this call, the calling end takes 115ms, but from the perspective of arthas trace, spring mvc only consumes 18ms. Where is the remaining 97ms?

The problem of spring mvc can be eliminated after local testing. Finally, the only possible problem is tomcat

But I'm not familiar with the source code in tomcat. I don't even know the request entry. It's hard to find the classes that need trace in tomcat...

However, it doesn't matter. There is an artifact Arthas. You can reverse search the call path through the stack command to
org.springframework.web.servlet.DispatcherServlet} as a parameter:

stack outputs the call path where the current method is called

Many times we know that a method is executed, but there are many execution paths for this method, or you don't know where the method is executed. At this time, you need the stack command.

[arthas@24851]$ stack org.springframework.web.servlet.DispatcherServlet *
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:44) cost in 495 ms.
ts=2019-09-14 21:15:19;thread_name=http-nio-7744-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader@7c136917
    @org.springframework.web.servlet.FrameworkServlet.processRequest()
        at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:866)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
        at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:851)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:109)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:81)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:496)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1468)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

ts=2019-09-14 21:15:19;thread_name=http-nio-7744-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader@7c136917
    @org.springframework.web.servlet.DispatcherServlet.doService()
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:974)
        at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:866)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
        at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:851)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:109)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:81)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:496)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1468)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

From the stack log, we can intuitively see the call stack of DispatchServlet. For such a long path, which kind of trace should be used (the trace process of the filter in spring mvc is skipped here, and the trace is also used in the actual troubleshooting, but the strange time consumption is not caused by the filter here)?

An experienced old driver can probably guess from his name where to start, that is
org.apache.coyote.http11.Http11Processor.service, from the name, http1 1 processor, which may be a good entry point. Let's trace:

[arthas@24851]$ trace org.apache.coyote.http11.Http11Processor service
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 269 ms.
`---ts=2019-09-14 21:22:51;thread_name=http-nio-7744-exec-8;id=17;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418
    `---[131.650285ms] org.apache.coyote.http11.Http11Processor:service()
        +---[0.036851ms] org.apache.coyote.Request:getRequestProcessor() #667
        +---[0.009986ms] org.apache.coyote.RequestInfo:setStage() #668
        +---[0.008928ms] org.apache.coyote.http11.Http11Processor:setSocketWrapper() #671
        +---[0.013236ms] org.apache.coyote.http11.Http11InputBuffer:init() #672
        +---[0.00981ms] org.apache.coyote.http11.Http11OutputBuffer:init() #673
        +---[min=0.00213ms,max=0.007317ms,total=0.009447ms,count=2] org.apache.coyote.http11.Http11Processor:getErrorState() #683
        +---[min=0.002098ms,max=0.008888ms,total=0.010986ms,count=2] org.apache.coyote.ErrorState:isError() #683
        +---[min=0.002448ms,max=0.007149ms,total=0.009597ms,count=2] org.apache.coyote.http11.Http11Processor:isAsync() #683
        +---[min=0.002399ms,max=0.00852ms,total=0.010919ms,count=2] org.apache.tomcat.util.net.AbstractEndpoint:isPaused() #683
        +---[min=0.033587ms,max=0.11832ms,total=0.151907ms,count=2] org.apache.coyote.http11.Http11InputBuffer:parseRequestLine() #687
        +---[0.005384ms] org.apache.tomcat.util.net.AbstractEndpoint:isPaused() #695
        +---[0.007924ms] org.apache.coyote.Request:getMimeHeaders() #702
        +---[0.006744ms] org.apache.tomcat.util.net.AbstractEndpoint:getMaxHeaderCount() #702
        +---[0.012574ms] org.apache.tomcat.util.http.MimeHeaders:setLimit() #702
        +---[0.14319ms] org.apache.coyote.http11.Http11InputBuffer:parseHeaders() #703
        +---[0.003997ms] org.apache.coyote.Request:getMimeHeaders() #743
        +---[0.026561ms] org.apache.tomcat.util.http.MimeHeaders:values() #743
        +---[min=0.002869ms,max=0.01203ms,total=0.014899ms,count=2] java.util.Enumeration:hasMoreElements() #745
        +---[0.070114ms] java.util.Enumeration:nextElement() #746
        +---[0.010921ms] java.lang.String:toLowerCase() #746
        +---[0.008453ms] java.lang.String:contains() #746
        +---[0.002698ms] org.apache.coyote.http11.Http11Processor:getErrorState() #775
        +---[0.00307ms] org.apache.coyote.ErrorState:isError() #775
        +---[0.002708ms] org.apache.coyote.RequestInfo:setStage() #777
        +---[0.171139ms] org.apache.coyote.http11.Http11Processor:prepareRequest() #779
        +---[0.009349ms] org.apache.tomcat.util.net.SocketWrapperBase:decrementKeepAlive() #794
        +---[0.002574ms] org.apache.coyote.http11.Http11Processor:getErrorState() #800
        +---[0.002696ms] org.apache.coyote.ErrorState:isError() #800
        +---[0.002499ms] org.apache.coyote.RequestInfo:setStage() #802
        +---[0.005641ms] org.apache.coyote.http11.Http11Processor:getAdapter() #803
        +---[129.868916ms] org.apache.coyote.Adapter:service() #803
        +---[0.003859ms] org.apache.coyote.http11.Http11Processor:getErrorState() #809
        +---[0.002365ms] org.apache.coyote.ErrorState:isError() #809
        +---[0.003844ms] org.apache.coyote.http11.Http11Processor:isAsync() #809
        +---[0.002382ms] org.apache.coyote.Response:getStatus() #809
        +---[0.002476ms] org.apache.coyote.http11.Http11Processor:statusDropsConnection() #809
        +---[0.002284ms] org.apache.coyote.RequestInfo:setStage() #838
        +---[0.00222ms] org.apache.coyote.http11.Http11Processor:isAsync() #839
        +---[0.037873ms] org.apache.coyote.http11.Http11Processor:endRequest() #843
        +---[0.002188ms] org.apache.coyote.RequestInfo:setStage() #845
        +---[0.002112ms] org.apache.coyote.http11.Http11Processor:getErrorState() #849
        +---[0.002063ms] org.apache.coyote.ErrorState:isError() #849
        +---[0.002504ms] org.apache.coyote.http11.Http11Processor:isAsync() #853
        +---[0.009808ms] org.apache.coyote.Request:updateCounters() #854
        +---[0.002008ms] org.apache.coyote.http11.Http11Processor:getErrorState() #855
        +---[0.002192ms] org.apache.coyote.ErrorState:isIoAllowed() #855
        +---[0.01968ms] org.apache.coyote.http11.Http11InputBuffer:nextRequest() #856
        +---[0.010065ms] org.apache.coyote.http11.Http11OutputBuffer:nextRequest() #857
        +---[0.002576ms] org.apache.coyote.RequestInfo:setStage() #870
        +---[0.016599ms] org.apache.coyote.http11.Http11Processor:processSendfile() #872
        +---[0.008182ms] org.apache.coyote.http11.Http11InputBuffer:getParsingRequestLinePhase() #688
        +---[0.0075ms] org.apache.coyote.http11.Http11Processor:handleIncompleteRequestLineRead() #690
        +---[0.001979ms] org.apache.coyote.RequestInfo:setStage() #875
        +---[0.001981ms] org.apache.coyote.http11.Http11Processor:getErrorState() #877
        +---[0.001934ms] org.apache.coyote.ErrorState:isError() #877
        +---[0.001995ms] org.apache.tomcat.util.net.AbstractEndpoint:isPaused() #877
        +---[0.002403ms] org.apache.coyote.http11.Http11Processor:isAsync() #879
        `---[0.006176ms] org.apache.coyote.http11.Http11Processor:isUpgrade() #881

There is a time-consuming point of 129ms in the log (the time is longer than when arthas is not turned on because of the performance consumption caused by arthas itself, so the production environment should use it carefully). This is the problem point to be found.

If the problem point is found, how to locate the problem, what causes the problem, and how to solve it?

Continue to trace and refine to specific code blocks or content. Due to performance considerations, trace will not display all call paths. If the call path is too deep, only manually go deep into trace. The principle is the method that takes a long time to trace:

[arthas@24851]$ trace org.apache.coyote.Adapter service
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 608 ms.
`---ts=2019-09-14 21:34:33;thread_name=http-nio-7744-exec-1;id=10;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418
    `---[81.70999ms] org.apache.catalina.connector.CoyoteAdapter:service()
        +---[0.032546ms] org.apache.coyote.Request:getNote() #302
        +---[0.007148ms] org.apache.coyote.Response:getNote() #303
        +---[0.007475ms] org.apache.catalina.connector.Connector:getXpoweredBy() #324
        +---[0.00447ms] org.apache.coyote.Request:getRequestProcessor() #331
        +---[0.007902ms] java.lang.ThreadLocal:get() #331
        +---[0.006522ms] org.apache.coyote.RequestInfo:setWorkerThreadName() #331
        +---[73.793798ms] org.apache.catalina.connector.CoyoteAdapter:postParseRequest() #336
        +---[0.001536ms] org.apache.catalina.connector.Connector:getService() #339
        +---[0.004469ms] org.apache.catalina.Service:getContainer() #339
        +---[0.007074ms] org.apache.catalina.Engine:getPipeline() #339
        +---[0.004334ms] org.apache.catalina.Pipeline:isAsyncSupported() #339
        +---[0.002466ms] org.apache.catalina.connector.Request:setAsyncSupported() #339
        +---[6.01E-4ms] org.apache.catalina.connector.Connector:getService() #342
        +---[0.001859ms] org.apache.catalina.Service:getContainer() #342
        +---[9.65E-4ms] org.apache.catalina.Engine:getPipeline() #342
        +---[0.005231ms] org.apache.catalina.Pipeline:getFirst() #342
        +---[7.239154ms] org.apache.catalina.Valve:invoke() #342
        +---[0.006904ms] org.apache.catalina.connector.Request:isAsync() #345
        +---[0.00509ms] org.apache.catalina.connector.Request:finishRequest() #372
        +---[0.051461ms] org.apache.catalina.connector.Response:finishResponse() #373
        +---[0.007244ms] java.util.concurrent.atomic.AtomicBoolean:<init>() #379
        +---[0.007314ms] org.apache.coyote.Response:action() #380
        +---[0.004518ms] org.apache.catalina.connector.Request:isAsyncCompleting() #382
        +---[0.001072ms] org.apache.catalina.connector.Request:getContext() #394
        +---[0.007166ms] java.lang.System:currentTimeMillis() #401
        +---[0.004367ms] org.apache.coyote.Request:getStartTime() #401
        +---[0.011483ms] org.apache.catalina.Context:logAccess() #401
        +---[0.0014ms] org.apache.coyote.Request:getRequestProcessor() #406
        +---[min=8.0E-4ms,max=9.22E-4ms,total=0.001722ms,count=2] java.lang.Integer:<init>() #406
        +---[0.001082ms] java.lang.reflect.Method:invoke() #406
        +---[0.001851ms] org.apache.coyote.RequestInfo:setWorkerThreadName() #406
        +---[0.035805ms] org.apache.catalina.connector.Request:recycle() #410
        `---[0.007849ms] org.apache.catalina.connector.Response:recycle() #411

After a boring manual deep trace

[arthas@24851]$ trace org.apache.catalina.webresources.AbstractArchiveResourceSet getArchiveEntries
Press Q or Ctrl+C to abort.
Affect(class-cnt:4 , method-cnt:2) cost in 150 ms.
`---ts=2019-09-14 21:36:26;thread_name=http-nio-7744-exec-3;id=12;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418
    `---[75.743681ms] org.apache.catalina.webresources.JarWarResourceSet:getArchiveEntries()
        +---[0.025731ms] java.util.HashMap:<init>() #106
        +---[0.097729ms] org.apache.catalina.webresources.JarWarResourceSet:openJarFile() #109
        +---[0.091037ms] java.util.jar.JarFile:getJarEntry() #110
        +---[0.096325ms] java.util.jar.JarFile:getInputStream() #111
        +---[0.451916ms] org.apache.catalina.webresources.TomcatJarInputStream:<init>() #113
        +---[min=0.001175ms,max=0.001176ms,total=0.002351ms,count=2] java.lang.Integer:<init>() #114
        +---[0.00104ms] java.lang.reflect.Method:invoke() #114
        +---[0.045105ms] org.apache.catalina.webresources.TomcatJarInputStream:getNextJarEntry() #114
        +---[min=5.02E-4ms,max=0.008531ms,total=0.028864ms,count=31] java.util.jar.JarEntry:getName() #116
        +---[min=5.39E-4ms,max=0.022805ms,total=0.054647ms,count=31] java.util.HashMap:put() #116
        +---[min=0.004452ms,max=34.479307ms,total=74.206249ms,count=31] org.apache.catalina.webresources.TomcatJarInputStream:getNextJarEntry() #117
        +---[0.018358ms] org.apache.catalina.webresources.TomcatJarInputStream:getManifest() #119
        +---[0.006429ms] org.apache.catalina.webresources.JarWarResourceSet:setManifest() #120
        +---[0.010904ms] org.apache.tomcat.util.compat.JreCompat:isJre9Available() #121
        +---[0.003307ms] org.apache.catalina.webresources.TomcatJarInputStream:getMetaInfEntry() #133
        +---[5.5E-4ms] java.util.jar.JarEntry:getName() #135
        +---[6.42E-4ms] java.util.HashMap:put() #135
        +---[0.001981ms] org.apache.catalina.webresources.TomcatJarInputStream:getManifestEntry() #137
        +---[0.064484ms] org.apache.catalina.webresources.TomcatJarInputStream:close() #141
        +---[0.007961ms] org.apache.catalina.webresources.JarWarResourceSet:closeJarFile() #151
        `---[0.004643ms] java.io.InputStream:close() #155

Found a point worthy of pause:

+---[min=0.004452ms,max=34.479307ms,total=74.206249ms,count=31] org.apache.catalina.webresources.TomcatJarInputStream:getNextJarEntry() #117

This line of code was loaded 31 times, taking a total of 74ms; In terms of name, it should be the time taken for tomcat to load jar packages. Is it the time taken to load 31 jar packages, or the time taken to load some resources in the jar package 31 times?

The comments on the source code of TomcatJarInputStream class write:

The purpose of this sub-class is to obtain references to the JarEntry objects
for META-INF/ and META-INF/MANIFEST.MF that are otherwise swallowed by the
JarInputStream implementation.

The general meaning is to obtain the resources of {META-INF /, META-INF/MANIFEST} in the jar package. This is a subclass, and more functions are in the parent class JarInputStream.

In fact, you can probably guess the problem here. tomcat takes time to load the resources of META-INF / and META-INF/MANIFEST in the jar package. As for why continuous requests do not take time, it should be tomcat's caching mechanism (the source code analysis is introduced below)

Don't worry about locating the problem, try to finally locate the problem details through Arthas, and continue to go deep into trace manually

[arthas@24851]$ trace org.apache.catalina.webresources.TomcatJarInputStream *
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:4) cost in 44 ms.
`---ts=2019-09-14 21:37:47;thread_name=http-nio-7744-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418
    `---[0.234952ms] org.apache.catalina.webresources.TomcatJarInputStream:createZipEntry()
        +---[0.039455ms] java.util.jar.JarInputStream:createZipEntry() #43
        `---[0.007827ms] java.lang.String:equals() #44

`---ts=2019-09-14 21:37:47;thread_name=http-nio-7744-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418
    `---[0.050222ms] org.apache.catalina.webresources.TomcatJarInputStream:createZipEntry()
        +---[0.001889ms] java.util.jar.JarInputStream:createZipEntry() #43
        `---[0.001643ms] java.lang.String:equals() #46
#There are 31 trace logs here, and the rest are deleted

In terms of method name, it still means loading resources. Now that we have reached the jdk source code, let's take a look at the source code of TomcatJarInputStream:

/**
 * Creates a new <code>JarEntry</code> (<code>ZipEntry</code>) for the
 * specified JAR file entry name. The manifest attributes of
 * the specified JAR file entry name will be copied to the new
 * <CODE>JarEntry</CODE>.
 *
 * @param name the name of the JAR/ZIP file entry
 * @return the <code>JarEntry</code> object just created
 */
protected ZipEntry createZipEntry(String name) {
    JarEntry e = new JarEntry(name);
    if (man != null) {
        e.attr = man.getAttributes(name);
    }
    return e;
}

The createZipEntry has a name parameter. From the annotation, it is the jar/zip file name. If you can get the key information such as file name, you can directly locate the problem; Or through Arthas, use the watch command to dynamically monitor the method call data

The watch method performs data observation

So that you can easily observe the call of the specified method. The observed range is: return value, throw exception and enter parameter. View the corresponding variables by writing OGNL expression.

watch the input parameters of this method

[arthas@24851]$ watch  org.apache.catalina.webresources.TomcatJarInputStream createZipEntry "{params[0]}"
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 27 ms.
ts=2019-09-14 21:51:14; [cost=0.14547ms] result=@ArrayList[
    @String[META-INF/],
]
ts=2019-09-14 21:51:14; [cost=0.048028ms] result=@ArrayList[
    @String[META-INF/MANIFEST.MF],
]
ts=2019-09-14 21:51:14; [cost=0.046071ms] result=@ArrayList[
    @String[META-INF/resources/],
]
ts=2019-09-14 21:51:14; [cost=0.033855ms] result=@ArrayList[
    @String[META-INF/resources/swagger-ui.html],
]
ts=2019-09-14 21:51:14; [cost=0.039138ms] result=@ArrayList[
    @String[META-INF/resources/webjars/],
]
ts=2019-09-14 21:51:14; [cost=0.033701ms] result=@ArrayList[
    @String[META-INF/resources/webjars/springfox-swagger-ui/],
]
ts=2019-09-14 21:51:14; [cost=0.033644ms] result=@ArrayList[
    @String[META-INF/resources/webjars/springfox-swagger-ui/favicon-16x16.png],
]
ts=2019-09-14 21:51:14; [cost=0.033976ms] result=@ArrayList[
    @String[META-INF/resources/webjars/springfox-swagger-ui/springfox.css],
]
ts=2019-09-14 21:51:14; [cost=0.032818ms] result=@ArrayList[
    @String[META-INF/resources/webjars/springfox-swagger-ui/swagger-ui-standalone-preset.js.map],
]
ts=2019-09-14 21:51:14; [cost=0.04651ms] result=@ArrayList[
    @String[META-INF/resources/webjars/springfox-swagger-ui/swagger-ui.css],
]
ts=2019-09-14 21:51:14; [cost=0.034793ms] result=@ArrayList[
    @String[META-INF/resources/webjars/springfox-swagger-ui/swagger-ui.js.map],

Now you can directly see the specific loaded resource name, such a familiar name: swagger UI, a foreign rest interface document tool, and a set of spring mvc integration tool made by domestic developers based on swagger UI. Through annotation, you can automatically generate the interface definition json file required by swagger UI. It is also convenient to use, but it is highly invasive.

After deleting swagger's jar package, the weird 70+ms disappears

<!--pom Delete these two references in the. These two packages are encapsulated by domestic developers, swagger-ui Not provided java spring-mvc Your support package, swagger Just a browser side ui+editor -->
<dependency>
    <groupId>io.springfox</groupId>
    <artifactId>springfox-swagger2</artifactId>
    <version>2.9.2</version>
</dependency>
<dependency>
    <groupId>io.springfox</groupId>
    <artifactId>springfox-swagger-ui</artifactId>
    <version>2.9.2</version>
</dependency>

So why does swagger cause requests to take time, and why does every request for occasional reads load static resources inside swagger?

In fact, this is a bug in Tomcat embedded. Let's introduce the bug in detail

Tomcat embedded bug analysis & solution

The source code analysis process is too long, and it is not the focus of this article, so I won't introduce it. Let's introduce the analysis results directly

By the way, post a core class diagram of tomcat processing requests

Why does each request load static resources in the Jar package

The key is
org.apache.catalina.mapper.Mapper#internalMapWrapper # there is a problem with the way requests are processed in this version, resulting in the verification of static resources every time.

Why is there no problem with continuous requests

Because Tomcat has a cache for the resolution of this static resource, it gives priority to find it from the cache and re resolve it after the cache expires. Specific reference
org.apache.catalina.webresources.Cache. The default expiration time ttl is 5000ms.

Why doesn't it happen locally

In fact, it can't be reproduced after the plug-in is packaged through spring boot. Due to different startup methods, tomcat uses different classes to handle static resources, so it's no problem

How to solve

Upgrade the Tomcat embedded version

The current version of the Bug is:

spring-boot:2.0.2.RELEASE. The built-in Tomcat embedded version is 8.5 thirty-one

Upgrade Tomcat embedded version to} 8.5.0 40 + can solve this problem, and the new version has been fixed

By replacing springboot pom properties

If the project is maven, it inherits the springboot, that is, the parent is configured as springboot, or the import spring boot package in dependency management

<parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.2.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

You can directly override the properties in pom:

<properties>
    <tomcat.version>8.5.40</tomcat.version>
</properties>

Upgrade spring boot version

springboot 2.1. The version of Tomcat embedded in 0.release , has been greater than 8.5 31. Therefore, this problem can be solved by directly upgrading springboot to this version or above.

Data acquisition method
https://docs.qq.com/doc/DQVZmeHpBdWJBeXpi

Topics: Java Nginx Programmer server architecture