Understanding BLOCKED_TIME in PerfView


Question

We suspect that we're experiencing thread pool starvation on a server that runs a couple of ASP.NET Core APIs and a couple of .NET Core console applications.

I ran PerfView on one of the servers where we suspect thread pool starvation. However, I'm having a bit of trouble analyzing the results.

I ran PerfView /threadTime collect for about 60 seconds. This is the result I got (I chose to look at one of our ASP.NET Core APIs):

Looking at "By Name", we can see that a lot of time is spent in BLOCKED_TIME. If I double-click it, I'm taken to the following view, where I can expand one of the nodes to get this (the redacted part is the name of our API process):

What does that tell me? Shouldn't I be able to see exactly what is blocking? And does it look like the problem is that a lot of threads are each blocking for a small amount of time?

Are there any other conclusions we can draw from this?

Recommended answer

BLOCKED_TIME generally means a period when the thread wasn't doing anything at all. This could be time spent on I/O, where network or other types of latency are involved, or time spent waiting on locks, such as semaphores. On its own, this doesn't necessarily tell you anything, as there are perfectly standard and reasonable reasons for a thread to be idle. However, a large amount of time spent blocked can be an indication of an underlying problem. Perhaps you have too much network latency. Perhaps you're trying to do too much file system work on a slow drive. In short, it may or may not indicate a problem, and even if it does, it doesn't really tell you what the problem is.
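To make the distinction concrete, here is a minimal sketch contrasting a call that holds its thread while waiting with one that releases it. SimulatedIoAsync, HandleBlocking and HandleAsync are hypothetical names invented for illustration, and Task.Delay merely stands in for a real network or disk call. The expectation that the blocking variant shows up as blocked time under its own call stack in a thread-time trace is an assumption based on the description above, not something verified against this particular trace.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class BlockedTimeDemo
{
    // Hypothetical stand-in for slow I/O such as a database or HTTP call.
    private static Task SimulatedIoAsync() => Task.Delay(500);

    // Blocks the calling thread until the I/O finishes. The thread does no
    // work while it waits, which is the kind of wait described above.
    public static void HandleBlocking() =>
        SimulatedIoAsync().GetAwaiter().GetResult();

    // Releases the calling thread while the I/O is in flight; the
    // continuation runs later on whichever thread is available.
    public static async Task HandleAsync() => await SimulatedIoAsync();

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        HandleBlocking();
        Console.WriteLine($"Blocking call held its thread for about {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        await HandleAsync();
        Console.WriteLine($"Async call took about {sw.ElapsedMilliseconds} ms without holding a thread");
    }
}
```

Both calls take roughly the same wall-clock time; the difference is whether a thread is tied up during the wait.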

In general, if you're experiencing thread starvation, the first thing you should look at is thread pool utilization. Are you using async everywhere you can? Are you doing things that are big no-nos in web apps, such as using Task.Run, Task.Factory.StartNew or, worse, Thread.Start? Work queued with Task.Run or Task.Factory.StartNew runs by default on the same thread pool that services requests, and thus proportionally reduces your server throughput. Thread.Start creates a dedicated thread outside the pool, but that thread still competes with request processing for CPU and memory.
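One way to look at thread pool utilization from inside the process is to sample the counters the runtime exposes. The sketch below assumes .NET Core 3.0 or later, where ThreadPool.ThreadCount and ThreadPool.PendingWorkItemCount are available; ThreadPoolProbe and Snapshot are hypothetical names, and the one-second sampling interval is arbitrary.

```csharp
using System;
using System.Threading;

public static class ThreadPoolProbe
{
    // Returns a one-line summary of the current thread pool state.
    public static string Snapshot()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        ThreadPool.GetAvailableThreads(out int freeWorker, out _);

        return $"threads={ThreadPool.ThreadCount} " +
               $"queued={ThreadPool.PendingWorkItemCount} " +
               $"busyWorkers={maxWorker - freeWorker} " +
               $"(min {minWorker}, max {maxWorker})";
    }

    public static void Main()
    {
        // Sample a few times; in a real service this would more likely be
        // logged periodically or exported as a metric.
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine(Snapshot());
            Thread.Sleep(1000);
        }
    }
}
```

A queued count that keeps growing while the thread count climbs slowly is commonly read as a sign of starvation, though the numbers only show that work is waiting, not why.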

There's an all-too-common pattern of attempting to schedule long-running jobs by shuffling them off to new threads. That's death to a web application. The threads in the pool are there to service requests, not long-running jobs, so requests should be handled quickly and efficiently, letting each thread return to the pool in short order to field other requests. If you need to background work, you need to truly background it, by offloading it to another process or even a different machine entirely.

Short of all that, maybe you're just getting more load than the server can handle in general. That's always a possibility. Perhaps you need to vertically scale your system resources (and the thread pool with them). Perhaps you need to horizontally scale by replicating this server with a load balancer in front. Given that you're running multiple different things on the same server, an easy way to horizontally scale is to simply divvy these things out to their own machines. That alone would probably help tremendously. However, scaling, either vertically or horizontally, should be your last resort. Make sure you're using resources efficiently first, before throwing more resources at inefficient code.
