1: Background
1. Tell a story
At the beginning of this month, a friend reached out to me on WeChat and said that after his program had been running for a while, requests would come in but no responses ever went out. The screenshot is as follows:
From my friend's description, the program appears to be blocked on something. This kind of problem is relatively easy to troubleshoot, so next I'll analyze the dump with WinDbg.
2: WinDbg analysis
1. What is the request doing?
My friend said the API receives requests but never responds, so how can I verify that? We all know that .NET represents a request with an HttpContext, which means we can check how long each HttpContext has been running. The !whttp command in the NetExt extension can help us here.
```
0:000> !whttp
HttpContext        Thread  Time Out  Running   Status  Verb  Url
000000563bf803b0   42      00:01:50  00:01:24  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x-HN
000000563bf84660   --      00:01:50  Finished  200     GET   http://xxx.com:30003/
000000563c4a0470   51      00:01:50  00:00:12  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2C
00000056bbf63590   30      00:01:50  00:02:41  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
00000056bc82a038   --      00:01:50  Finished  200     GET   http://localhost:30003/
00000056bc84a3e8   44      00:01:50  00:00:51  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
00000056bc8671c8   46      00:01:50  00:00:45  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
000000573bf44698   35      00:01:50  00:02:39  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
000000573bf483c0   33      00:01:50  00:02:41  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x-HN
000000573bf97e80   40      00:01:50  00:02:32  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=ZJB2C
000000573c583b08   --      00:01:50  Finished  200     GET   http://localhost:30003/
000000573c589ec8   --      00:01:50  Finished  200     GET   http://xxx.com:30003/Wms/xxx/xxx/xxx
000000573c760e28   --      00:01:50  Finished  200     POST  http://xxx.com:30003/Wms/xxx/xxx/xxx
000000573c95f990   48      00:01:50  00:00:31  200     POST  http://xxx.com:30003/Wms/Common/xxx?xxx=xxx&xxx=x-HN
00000057bbf4f8e8   31      00:01:50  00:02:12  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
00000057bc080340   50      00:01:50  00:00:19  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
000000583c4aee80   43      00:01:50  00:01:11  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
000000583c4d0c50   53      00:01:50  00:00:01  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
00000058bbf8f1a0   34      00:01:50  00:02:22  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
000000593bfe1758   41      00:01:50  00:01:22  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2C
000000593c892160   --      00:01:50  Finished  200     GET   http://xxx.com:30003/Wms/xxx/xxx/xxxJob
000000593ca813b0   45      00:01:50  00:00:30  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-HN
000000593caa45d8   --      00:01:50  Finished  200     GET   http://xxx.com:30003/
00000059bc1ad808   32      00:01:50  00:01:45  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
00000059bc1c3d70   36      00:01:50  00:01:29  200     POST  http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x

25 HttpContext object(s) found matching criteria
```
The Running column shows that most requests have been running for more than a minute, which confirms the hang my friend described. Based on experience, we pick the threads hosting the HttpContexts with the largest Running values, i.e. threads 30 and 33 above, and see what they are doing.
2. Explore the longest-running threads
Next, switch to threads 30 and 33 and look at their thread stacks.
```
0:000> ~30s
ntdll!NtWaitForSingleObject+0xa:
00007ffd`b81f024a c3              ret
0:030> !clrstack
OS Thread Id: 0x29d0 (30)
        Child SP               IP Call Site
0000005acc3ac590 00007ffdb81f024a [PrestubMethodFrame: 0000005acc3ac590] xxx.xxx.RedisConnectionHelp.get_Instance()
0000005acc3ac850 00007ffd4dd78911 xxx.xxx.RedisCache..ctor(Int32, System.String)
0000005acc3ac8c0 00007ffd4dd78038 xxx.xxx.CacheByRedis.HashGet[[System.__Canon, mscorlib]](System.String, System.String, Int32)
0000005acc3ac968 00007ffdabef1f7c [StubHelperFrame: 0000005acc3ac968]
0000005acc3ac9c0 00007ffd4dd77f18 xxx.xxx.Cache.xxx.GetCacheNotAreaDataEntity[[System.__Canon, mscorlib]](System.String, System.String, System.String)
...

0:030> ~33s
ntdll!NtWaitForMultipleObjects+0xa:
00007ffd`b81f07ba c3              ret
0:033> !clrstack
OS Thread Id: 0x3ad4 (33)
        Child SP               IP Call Site
0000005accabae90 00007ffdb81f07ba [GCFrame: 0000005accabae90]
0000005accabafb8 00007ffdb81f07ba [HelperMethodFrame_1OBJ: 0000005accabafb8] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
0000005accabb0d0 00007ffdaac60d64 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
0000005accabb160 00007ffdaac5b4bb System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
0000005accabb1d0 00007ffdab5a01d1 System.Threading.Tasks.Task.InternalWait(Int32, System.Threading.CancellationToken)
0000005accabb2a0 00007ffdab59cfa7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].GetResultxxx(Boolean)
0000005accabb2e0 00007ffd4d8d338f xxx.Config.xxx.Config`1[[System.__Canon, mscorlib]].GetConfig(xxx.Config.Model.ConfigListener, System.Func`2<xxx.Config.Request.GetConfigRequest,System.Threading.Tasks.Task`1<System.String>>)
0000005accabb340 00007ffd4d8d2f40 xxx.Config.xxx.Config`1[[System.__Canon, mscorlib]].get_Item(System.String, System.String)
0000005accabb3c0 00007ffd4dd78f7f xxx.Util.BaseConfig.get_GetRedisConn()
0000005accabb440 00007ffd4dd78e9c xxx.xxx.RedisConnectionHelp.GetConnectionString()
0000005accabb4a0 00007ffd4dd789cb xxx.xxx.RedisConnectionHelp..cctor()
0000005accabb940 00007ffdabef6953 [GCFrame: 0000005accabb940]
0000005accabc5b0 00007ffdabef6953 [PrestubMethodFrame: 0000005accabc5b0] xxx.xxx.RedisConnectionHelp.get_Instance()
0000005accabc870 00007ffd4dd78911 xxx.xxx.RedisCache..ctor(Int32, System.String)
0000005accabc8e0 00007ffd4dd78038 xxx.xxx.CacheByRedis.HashGet[[System.__Canon, mscorlib]](System.String, System.String, Int32)
0000005accabc988 00007ffdabef1f7c [StubHelperFrame: 0000005accabc988]
0000005accabc9e0 00007ffd4dd77f18 xxx.Core.Cache.xxx.GetCacheNotAreaDataEntity[[System.__Canon, mscorlib]](System.String, System.String, System.String)
...
```
It is not hard to see that thread 30 is stuck at RedisConnectionHelp.get_Instance(), while thread 33 has already entered get_Instance() and is ultimately blocked waiting for a Task's Result inside GetConfig(). From experience, thread 30 appears to be waiting for a lock, and thread 33 is waiting for an asynchronous Result. The next breakthrough is to look at the code behind RedisConnectionHelp.Instance.
3. Find the problem code
Next, use the decompiler ILSpy to dig out the problem code.
```csharp
public static class RedisConnectionHelp
{
    public static ConnectionMultiplexer Instance
    {
        get
        {
            if (_instance == null)
            {
                lock (Locker)
                {
                    if (_instance == null || !_instance.IsConnected)
                    {
                        _instance = GetManager();
                    }
                }
            }
            return _instance;
        }
    }
}
```
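As an aside, this hand-rolled double-checked lock is often written with Lazy<T> instead; below is a minimal, self-contained sketch of that pattern (class name, placeholder connection string, and the GetConnectionString stub are my own, not from the decompiled project). Note that Lazy<T> alone would not have prevented this hang, because the blocking call lives inside the config lookup seen on thread 33's stack.

```csharp
using System;
using StackExchange.Redis;

// Hypothetical alternative shape, for illustration only.
public static class RedisConnectionHelpLazy
{
    // Lazy<T> gives thread-safe, run-once initialization without
    // hand-rolled double-checked locking.
    private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
        new Lazy<ConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect(GetConnectionString()));

    public static ConnectionMultiplexer Instance => LazyConnection.Value;

    // Stand-in for the real config lookup; in the actual project this is
    // where the .Result blocking happens, so the deadlock risk remains here.
    private static string GetConnectionString() => "localhost:6379";
}
```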
Thread 30 is stuck waiting on the Locker. Next, dig into the GetManager() path that thread 33 is executing. The simplified code is as follows:
```csharp
public T this[string dataId, string key = ""]
{
    get
    {
        try
        {
            string config = GetConfig(configListener, new NacosConfigClient(Base.NacosConfiguration).DoGetConfigAsync);
            return JsonConvert.DeserializeObject<T>(config);
        }
        catch (Exception ex)
        {
            return default(T);
        }
    }
}

private string GetConfig(ConfigListener listener, Func<GetConfigRequest, Task<string>> action)
{
    var text2 = action(new GetConfigRequest
    {
        DataId = listener.DataId,
        Group = listener.Group,
        Tenant = text
    }).Result;
    return text2;
}

internal async Task<string> DoGetConfigAsync(GetConfigRequest request)
{
    IRestResponse restResponse = await HttpUtil.Request(currentServerAddr, Method.GET, request.ParamValues(), xxx);
    return restResponse.Content;
}
```
You can see that the code blocks on .Result and waits indefinitely. This immediately reminded me of the synchronization context: his program is an ASP.NET MVC app on .NET Framework 4.8, so the context in play is AspNetSynchronizationContext. For the detailed deadlock mechanism, see my earlier article "A Task's .Result deadlocks: how should this code be written?". There are roughly four solutions (a sketch of two of them follows the list):
- Use .ConfigureAwait(false) so the awaits inside the async method do not capture the request's synchronization context.
- Change the call chain to fully asynchronous (async/await all the way up).
- Wrap the call in another Task (e.g. Task.Run) so the inner awaits run without the request context.
- Change the call chain to fully synchronous.
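For concreteness, here is a minimal, self-contained sketch of how the first and third options look in general. HttpClient stands in for the Nacos HTTP call, and none of these names come from the author's project; treat it as an illustration of the technique, not the actual patch.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class ConfigFetchSketch
{
    private static readonly HttpClient Http = new HttpClient();

    // Option 1: ConfigureAwait(false) on every await inside the async method,
    // so its continuations never need the blocked ASP.NET request context.
    public static async Task<string> GetConfigAsync(string url)
    {
        HttpResponseMessage response = await Http.GetAsync(url).ConfigureAwait(false);
        return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
    }

    // Option 3: wrap the async call in Task.Run so it starts on a thread-pool
    // thread with no synchronization context; blocking on .Result can then complete.
    public static string GetConfigBlocking(string url)
    {
        return Task.Run(() => GetConfigAsync(url)).Result;
    }
}
```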
3: Summary
In the end, this incident boils down to blocking on asynchronous code with .Result from synchronous code; there are plenty of articles criticizing the deadlocks this causes. ASP.NET Core has removed the synchronization context, so this particular pit no longer exists there. My suggestion to my friend was to change the call chain to fully synchronous, and with that the deadlock disappeared.
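For illustration only, here is what a fully synchronous config fetch could look like on .NET Framework. This is a minimal sketch under my own assumptions (the Nacos v1 HTTP endpoint shape and WebClient as the HTTP client); it is not the author's actual fix.

```csharp
using System;
using System.Net;

public static class ConfigFetchSyncSketch
{
    // Fully synchronous fetch: no Task, no await, nothing to marshal back onto
    // the request's synchronization context, hence no deadlock.
    // The URL shape is assumed (Nacos v1 exposes GET /nacos/v1/cs/configs?dataId=..&group=..&tenant=..).
    public static string GetConfig(string serverAddr, string dataId, string group, string tenant)
    {
        string url = serverAddr + "/nacos/v1/cs/configs"
                   + "?dataId=" + Uri.EscapeDataString(dataId)
                   + "&group=" + Uri.EscapeDataString(group)
                   + "&tenant=" + Uri.EscapeDataString(tenant);

        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }
}
```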
Haha, I'm so happy for my friend!