Understanding BLOCKED_TIME in PerfView

Question

We suspect that we are experiencing thread pool starvation on a server that runs a couple of ASP.NET Core APIs and a couple of .NET Core console applications.

I ran PerfView on one of the servers where we suspect thread pool starvation. However, I'm having a bit of trouble analyzing the results.

I ran PerfView /threadTime collect for about 60 seconds. This is the result I got (I chose one of our ASP.NET Core APIs to look at):

Looking at "By Name", we can see that a lot of time is spent in BLOCKED_TIME. If I double-click it, I'm taken to the following view, where I can expand one of the nodes to get the following view (the redacted part is the name of our API process):

What does that tell me? Shouldn't I be able to see exactly what is blocking? And does it look like the problem is that a lot of threads are each blocking for a small amount of time?

Are there any other conclusions we can draw from this?

Recommended answer

BLOCKED_TIME generally means a period when the thread wasn't doing anything at all. This could be time spent in I/O, where network or other kinds of latency are involved, or time spent waiting on locks, as with semaphores. On its own this doesn't necessarily tell you anything, because there are perfectly standard and reasonable reasons for a thread to be idle. However, a large amount of time spent blocked can be an indication of an underlying problem. Perhaps you have too much network latency. Perhaps you're trying to do too much file system work on a slow drive. In short, it may or may not indicate a problem, and even if it does, it doesn't really tell you what the problem is.
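To make the difference between a blocked thread and an idle-but-free thread concrete, here is a minimal C# sketch. The class and method names (WaitStyles, BlockingWait, NonBlockingWaitAsync) are hypothetical and not taken from the question. The assumption is that the synchronous wait is the kind of interval a /threadTime trace would attribute to BLOCKED_TIME on the waiting thread, whereas the awaited wait holds no thread while it is pending.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitStyles
{
    // Synchronous wait: the calling thread is parked until the semaphore
    // is released. Assumption: this is the kind of interval that shows up
    // as BLOCKED_TIME for that thread in a thread-time trace.
    public static void BlockingWait(SemaphoreSlim gate)
    {
        gate.Wait();
        gate.Release();
    }

    // Asynchronous wait: the method returns a pending Task and no thread
    // is held while the semaphore is unavailable.
    public static async Task NonBlockingWaitAsync(SemaphoreSlim gate)
    {
        await gate.WaitAsync();
        gate.Release();
    }

    public static async Task Main()
    {
        var gate = new SemaphoreSlim(initialCount: 0, maxCount: 1);

        // A pool thread sits inside gate.Wait() until we release below.
        Task blocking = Task.Run(() => BlockingWait(gate));
        await Task.Delay(200);
        gate.Release();
        await blocking;

        // Take the semaphore back so the async waiter also has to wait.
        await gate.WaitAsync();
        Task nonBlocking = NonBlockingWaitAsync(gate);
        await Task.Delay(200);
        gate.Release();
        await nonBlocking;

        Console.WriteLine("both waits completed");
    }
}
```

Both variants wait for the same length of time; the difference is only whether a thread is tied up for the duration, which is what matters when the pool is short of threads.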

In general, if you're experiencing thread starvation, the first thing you should look at is thread pool utilization. Are you using async everywhere you can? Are you doing things that are big no-nos in web apps, such as using Task.Run, Task.Factory.StartNew or, worse, Thread.Start? Work queued with Task.Run or Task.Factory.StartNew runs by default on the same thread pool that services requests, and a thread created with Thread.Start, although dedicated rather than pooled, still competes for the same CPU and memory, so all of them proportionally reduce your server throughput.
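As a rough way to look at thread pool utilization from inside the process, the following C# sketch prints a snapshot of the pool. The ThreadPoolSnapshot class and its Describe method are hypothetical names introduced for illustration. It assumes .NET Core 3.0 or later, where ThreadPool.ThreadCount and ThreadPool.PendingWorkItemCount are available.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSnapshot
{
    // Builds a one-line summary of the current thread pool state.
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);

        // Workers in use = configured maximum minus those still available.
        int busyWorkers = maxWorker - freeWorker;

        return $"threads={ThreadPool.ThreadCount} " +
               $"queued={ThreadPool.PendingWorkItemCount} " +
               $"busyWorkers={busyWorkers} " +
               $"(min {minWorker}, max {maxWorker})";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Logging this periodically under load is one possible way to see whether queued work items keep piling up while the thread count climbs slowly. That pattern is commonly associated with starvation, but it should be treated as a hint rather than proof.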

There's an all-too-common pattern of attempting to schedule long-running jobs by shuffling them to new threads. That's death to a web application. All threads in the pool are there to service requests, not long-running jobs, so requests should be handled quickly and efficiently, and the thread returned to the pool in short order to field other requests. If you need to do background work, you need to truly background it by offloading it to another process or even to a different machine entirely.

Short of all that, maybe you're just getting more load than the server can handle in general. That's always a possibility. Perhaps you need to vertically scale your system resources (and the thread pool with them). Perhaps you need to horizontally scale by replicating this server with a load balancer in front. Given that you're running multiple different things on the same server, an easy way to scale horizontally is simply to move each of them to its own machine. That alone would probably help tremendously. However, scaling, whether vertical or horizontal, should be your last resort. Make sure you're using resources efficiently first, before throwing more resources at inefficient code.
