Should I "bind" a "spinning" thread to a certain core?
Question
My application contains several latency-critical threads that "spin", i.e. never block. Such a thread is expected to take 100% of one CPU core. However, it seems modern operating systems often move threads from one core to another. So, for example, with this Windows code:
void Processor::ConnectionThread()
{
    while (work)
    {
        Iterate();
    }
}
I do not see a "100% occupied" core in Task Manager; overall system load is 36-40%.
But if I change it to:
void Processor::ConnectionThread()
{
    SetThreadAffinityMask(GetCurrentThread(), 2);
    while (work)
    {
        Iterate();
    }
}
Then I do see that one of the CPU cores is 100% occupied, and overall system load is reduced to 34-36%.
Does this mean that I should tend to use SetThreadAffinityMask for "spin" threads? Did I improve latency by adding SetThreadAffinityMask in this case? What else should I do for "spin" threads to improve latency?
I'm in the middle of porting my application to Linux, so this question is mostly about Linux, if that matters.
upd: found a slide which shows that binding a busy-waiting thread to a CPU may help:
Accepted answer
Running a thread locked to a single core gives the best latency for that thread in most circumstances, if this is the most important thing in your code.
The reasons (R) are:
- your code is likely to be in your iCache
- the branch predictors are tuned to your code
- your data is likely to be ready in your dCache
- the TLB points to your code and data.
Unless:
- You're running an SMT system (e.g. hyperthreaded), in which case the "evil twin" will "help" you by causing your code to be washed out, your branch predictors to be tuned to its code, and its data will push yours out of the dCache; your TLB is impacted by its use.
- Cost unknown: each cache miss costs ~4 ns, ~15 ns and ~75 ns for data, and this quickly runs up to several 1000 ns.
- This saves you for each reason R mentioned above that still holds.
- If the evil twin also just spins, the costs should be much lower.
- Or you get switched out, which can be mitigated by:
- doing no disk IO at all
- using only asynchronous IO.
So if you need less than 100 ns latency to keep your application from exploding, you need to prevent or lessen the impact of SMT, interrupts and task switching on your core. The perfect solution would be a real-time operating system with static scheduling. This is a nearly perfect match for your target, but it's a new world if you have mostly done server and desktop programming.
The disadvantages of locking a thread to a single core are:
- It will cost some total throughput.
- Locking it to a core and its SMT sibling will lessen this problem, but not eliminate it. Each added core will lessen the problem.
- Setting its priority higher will lessen the problem, but not eliminate it.
- Scheduling with SCHED_FIFO and the highest priority will prevent most context switches; interrupts can still cause temporary switches, as do some system calls.
- If you have a multi-CPU setup, you might be able to take exclusive ownership of one of the CPUs through cpuset. This prevents other applications from using it.
Using pthread_setschedparam with SCHED_FIFO and the highest priority, running as root (su), and locking the thread to the core and its evil twin should secure the best latency of all of these; only a real-time operating system can eliminate all context switches.
There is a discussion about interrupts here.
Your Linux might accept that you call sched_setscheduler using SCHED_FIFO, but this demands that you have your own PID, not just a TID, or that your threads are cooperatively multitasking.
This might not be ideal, as all your threads would only be switched "voluntarily", thereby removing the flexibility for the kernel to schedule them.