WinDbg runaway command output explained

Question

I have a production CPU issue: after days of regular activity, the CPU suddenly starts to peak. I saved a dump file and ran the !runaway command to get the list of threads that consumed the most CPU time. The output is below:

User Mode Time
  Thread       Time
  21:110       0 days 10:51:39.781
  19:f84       0 days 10:41:59.671
   5:cc4       0 days 0:53:25.343
  48:74        0 days 0:34:20.140
  47:1670      0 days 0:34:09.812
  13:460       0 days 0:32:57.640
   8:14d4      0 days 0:19:30.546
   7:d90       0 days 0:03:15.000
  23:1520      0 days 0:02:21.984
  22:ca0       0 days 0:02:08.375
  24:72c       0 days 0:02:01.640
  29:10ac      0 days 0:01:58.671
  27:1088      0 days 0:01:44.390

As you can see, the output shows two threads, 21 and 19, that together have consumed more than 20 hours of CPU time. I was able to get the call stack of one of those threads like so:
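
A quick way to double-check the "more than 20 hours combined" figure is to parse the !runaway lines and add them up. This is a minimal Python sketch; the regular expression assumes exactly the line shape shown above (debugger thread number, colon, OS thread ID in hex, then "N days H:MM:SS.mmm" with three millisecond digits) and is not an official description of the !runaway output format.

```python
import re
from datetime import timedelta

# Excerpt of the "User Mode Time" table from the question.
RUNAWAY_OUTPUT = """\
User Mode Time
  Thread       Time
  21:110       0 days 10:51:39.781
  19:f84       0 days 10:41:59.671
   5:cc4       0 days 0:53:25.343
"""

# Assumed line shape: "<thread #>:<OS thread id, hex>  <D> days <H>:<MM>:<SS>.<mmm>"
# (three fractional digits = milliseconds, as in the output above).
LINE_RE = re.compile(
    r"^\s*(\d+):([0-9a-fA-F]+)\s+(\d+) days (\d+):(\d+):(\d+)\.(\d{3})\s*$"
)


def parse_runaway(text):
    """Map debugger thread number -> accumulated user-mode CPU time."""
    times = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # header lines and blank lines don't match
        num, _os_tid, days, hours, minutes, seconds, millis = match.groups()
        times[int(num)] = timedelta(
            days=int(days),
            hours=int(hours),
            minutes=int(minutes),
            seconds=int(seconds),
            milliseconds=int(millis),
        )
    return times


times = parse_runaway(RUNAWAY_OUTPUT)
combined = times[21] + times[19]
print(combined)                        # 21:33:39.452000
print(combined > timedelta(hours=20))  # True
```

So threads 21 and 19 together account for roughly 21.5 hours of user-mode CPU time.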

~21s
!CLRStack

The output doesn't matter at the moment; let's call it the "X callstack".

What I would like is an explanation of the !runaway command output. From what I understand, a dump file is a snapshot of the current state of the application. So my questions are:

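  1. How can the runaway command show a value of 10:51 hours for thread 21 when the dumping process only took a few seconds?
  2. Does it mean that the specific "instance" of the X callstack I found with the !CLRStack command has been hanging for more than 10 hours? Or is it the total time thread 21 spent executing all of its X callstack executions? If so, it seems strange that thread 21 is responsible for so many executions of the X callstack. As far as I know, the source is a web request (the runtime should assign a random thread to each call).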

I have a speculation that may answer both questions:

Maybe WinDbg calculates the time by taking the actual time of the thread's call stack and dividing it by the scope of the dumping process. For example, if a specific execution of the X callstack took 1 second and the whole dumping process took 3 seconds (33%), while the process had been running for a total of 24 hours, the output would show:

8 hours (33% of 24 hours)

Am I right, or did I get it completely wrong?

Answer

This answer is meant to be easy for the OP to understand. Not every bit and byte of it is exactly correct.

[...] and dividing it by the scope of the dumping process [...]

This understanding is probably the root of all evil: dumping a process only gives you the state of the process at a single point in time. From the process's point of view, dumping takes 0.0 seconds, because all threads are suspended during the operation. (So, in the process's relative time, nothing changes and time stands still; wall-clock time, of course, moves on.)

You are thinking of dumping a process as monitoring it over a longer period of time, which is not the case. Dumping a process takes time only because it involves disk activity and the like.

So no, there is no "scope", and therefore you cannot (or only with great difficulty) measure performance issues with crash dumps.

How can the runaway command show a value of 10:51 hours for thread 21, [...]

How can your C# program know how long it has been running if you only have a timer event that fires every second? The answer is: it uses a variable and increments its value on every tick.

That's roughly how Windows does it. Windows is responsible for thread scheduling, and as it schedules threads, it keeps updating a per-thread variable that holds the CPU time the thread has used so far.
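
To make the counter idea concrete, here is a toy model in Python. It is not how the Windows kernel is actually implemented; it only assumes, for illustration, that on every clock tick the tick's duration is charged to whichever thread happens to be running. The 15.625 ms tick (64 ticks per second) is an assumed value, a common default timer resolution on Windows, and the thread numbers and schedule are hypothetical.

```python
from collections import defaultdict

TICK_MS = 15.625  # assumed clock-tick length (64 ticks per second)


def account_cpu_time(schedule):
    """Toy scheduler accounting (not the real Windows implementation).

    schedule: one entry per clock tick, naming the thread that was running
    during that tick (None = the CPU was idle).
    Returns a dict: thread id -> accumulated CPU time in milliseconds.
    """
    cpu_ms = defaultdict(float)
    for running_thread in schedule:
        if running_thread is not None:
            # Charge the whole tick to the thread that was on the CPU.
            cpu_ms[running_thread] += TICK_MS
    return dict(cpu_ms)


# Hypothetical tick-by-tick history of which thread was on the CPU.
history = [21, 21, 19, None, 21, 5, 19, 21]
print(account_cpu_time(history))  # {21: 62.5, 19: 31.25, 5: 15.625}
```

The counters only ever grow, so at any moment they hold each thread's total CPU time since it started. Reading them is instantaneous and says nothing about how long the reading took.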

When the crash dump is written, the information that the OS collected long before is simply included in it.

[...] when the dumping process only took a few seconds?

Since the crash dump is taken by a WinDbg thread, the CPU time for that work is accounted to that thread. You would need to debug WinDbg itself and run !runaway on the WinDbg thread to see how much CPU time it took. That is potentially a nice exercise, and the .dbgdbg (debug the debugger) command may be new to you; other than that, it doesn't really help in this particular case.

Does it mean that the specific "instance" of the X callstack I found with the !CLRStack command has been hanging for more than 10 hours?

No. It means that at the point in time when you created the crash dump, that specific method was executing. Nothing more, nothing less.

This information is unrelated to !runaway, because the thread may have been doing something totally different for a long time and finished that work just a moment ago.
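
The difference between the cumulative counter and the stack you see in the dump can be illustrated with another toy model. The class, the frame names (Worker.LongLoop, Handler.X) and the durations are all hypothetical; the point is only that the counter keeps growing across unrelated pieces of work, while a snapshot records just the stack the thread is on at that moment.

```python
from dataclasses import dataclass, field


@dataclass
class ToyThread:
    """Hypothetical stand-in for an OS thread: a CPU counter plus a stack."""
    cpu_seconds: int = 0
    stack: list = field(default_factory=list)

    def run(self, frames, seconds):
        # The stack is replaced by whatever the thread works on now,
        # but the CPU counter only ever grows.
        self.stack = list(frames)
        self.cpu_seconds += seconds


def take_snapshot(thread):
    """Like a dump: copy the current counter and stack, with no history."""
    return {"cpu_seconds": thread.cpu_seconds, "stack": list(thread.stack)}


t = ToyThread()
t.run(["Worker.LongLoop"], 10 * 3600)  # ten hours of some other work
t.run(["Handler.X"], 1)                # the "X callstack", started just now
snap = take_snapshot(t)
print(snap)  # {'cpu_seconds': 36001, 'stack': ['Handler.X']}
```

The snapshot shows more than ten hours of CPU time next to a stack that has only been running for one second, which is exactly why the two pieces of information must not be combined.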

Or is it the total time thread 21 spent executing all of its X callstack executions?

No. A crash dump does not contain such detailed performance data. You need a performance profiler like JetBrains dotTrace to get that information. A profiler looks at call stacks very often, then aggregates identical call stacks and derives the CPU time per call stack.
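
The aggregation step of a sampling profiler can be sketched in a few lines of Python. This is not dotTrace's implementation; the sample list and the 1 ms sampling interval are assumptions made up for the illustration. Identical stacks are counted, and each count is multiplied by the sampling interval to estimate the CPU time spent in that stack.

```python
from collections import Counter

SAMPLE_INTERVAL_MS = 1.0  # assumed: one stack sample per millisecond


def cpu_time_per_stack(samples, interval_ms=SAMPLE_INTERVAL_MS):
    """Group identical call stacks and estimate CPU time for each one."""
    counts = Counter(tuple(stack) for stack in samples)
    return {stack: n * interval_ms for stack, n in counts.most_common()}


# Hypothetical samples taken from one thread (outermost frame first).
samples = [
    ["Main", "Handler.X", "Parse"],
    ["Main", "Handler.X", "Parse"],
    ["Main", "Handler.X", "Parse"],
    ["Main", "Handler.X", "Save"],
    ["Main", "Logger.Write"],
]

for stack, ms in cpu_time_per_stack(samples).items():
    print(f"{ms:6.1f} ms  {' > '.join(stack)}")
```

Unlike a single dump, this needs many stack samples over time, which is why a profiler rather than a crash dump can answer "how much CPU time did the X callstack use?".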
