Why is std::mutex so slow on OSX?

Question

I have the following benchmark: https://gist.github.com/leifwalsh/10010580

Essentially it spins up k threads and then each thread does about 16 million / k lock/increment/unlock cycles, using a spinlock and a std::mutex. On OSX, the std::mutex is devastatingly slower than the spinlock when contended, whereas on Linux it's competitive or a bit faster.
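
For readers who do not want to open the gist, here is a minimal sketch of a benchmark with the shape described above. It is not the gist's code: the `SpinLock` class, the `Result` struct, and the `run_benchmark` function are hypothetical names introduced for illustration, and the real spinlock in the gist may be implemented differently.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock (an assumption; the gist's may differ).
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Busy-wait until the current holder clears the flag.
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

struct Result {
    std::uint64_t count;                // final value of the shared counter
    std::chrono::milliseconds elapsed;  // wall-clock time for all threads
};

// Runs `threads` workers that together perform about `total`
// lock/increment/unlock cycles on one shared counter.
template <typename Lock>
Result run_benchmark(unsigned threads, std::uint64_t total) {
    Lock lock;
    std::uint64_t val = 0;
    const std::uint64_t per_thread = total / threads;

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                std::lock_guard<Lock> guard(lock);  // lock, then unlock at scope exit
                ++val;
            }
        });
    }
    for (auto& w : workers) w.join();
    const auto stop = std::chrono::steady_clock::now();

    // Every increment happened under the lock, so none can be lost.
    assert(val == per_thread * threads);
    return {val, std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)};
}
```

Under these assumptions, calling `run_benchmark<SpinLock>(k, 16000000)` and `run_benchmark<std::mutex>(k, 16000000)` for k from 1 to 4 would correspond to the rows in the timings below.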

OSX:

spinlock 1:     334ms
spinlock 2:     3537ms
spinlock 3:     4815ms
spinlock 4:     5653ms
std::mutex 1:   813ms
std::mutex 2:   38464ms
std::mutex 3:   44254ms
std::mutex 4:   47418ms

Linux:

spinlock 1:     305ms
spinlock 2:     1590ms
spinlock 3:     1820ms
spinlock 4:     2300ms
std::mutex 1:   377ms
std::mutex 2:   1124ms
std::mutex 3:   1739ms
std::mutex 4:   2668ms

The processors are different, but not that different (OSX is Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz, Linux is Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz), so this looks like a library or kernel problem. Does anyone know the source of the slowness?

To clarify my question, I understand that "there are different mutex implementations that optimize for different things and this isn't a problem, it's expected". This question is: what are the actual differences in implementation that cause this? Or, if it's a hardware issue (maybe the cache is just a lot slower on the macbook), that's acceptable too.

Recommended answer

You're just measuring the library's choice of trading off throughput for fairness. The benchmark is heavily artificial and penalizes any attempt to provide any fairness at all.

The implementation can do two things. It can let the same thread get the mutex twice in a row, or it can change which thread gets the mutex. This benchmark heavily penalizes a change in threads because the context switch takes time and because ping-ponging the mutex and val from cache to cache takes time.
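
One way to see which of the two behaviors a given platform prefers is to count how often the mutex changes owner. The following is only a sketch: `HandoffStats` and `count_handoffs` are hypothetical names, and the first acquisition by each thread is counted as a handoff by construction.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

struct HandoffStats {
    std::uint64_t acquisitions;  // total number of times the mutex was locked
    std::uint64_t handoffs;      // acquisitions by a thread other than the previous owner
};

// Each of `threads` workers locks one std::mutex `per_thread` times and records
// whether the previous owner was a different thread.
HandoffStats count_handoffs(unsigned threads, std::uint64_t per_thread) {
    std::mutex m;
    std::uint64_t acquisitions = 0;
    std::uint64_t handoffs = 0;
    unsigned last_owner = threads;  // sentinel: no thread has held the mutex yet

    std::vector<std::thread> workers;
    for (unsigned id = 0; id < threads; ++id) {
        workers.emplace_back([&, id] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++acquisitions;
                if (last_owner != id) {  // ownership moved to a different thread
                    ++handoffs;
                    last_owner = id;
                }
            }
        });
    }
    for (auto& w : workers) w.join();

    assert(handoffs <= acquisitions);
    return {acquisitions, handoffs};
}
```

A low ratio of `handoffs` to `acquisitions` suggests the implementation tends to let the same thread reacquire the mutex. A ratio near 1 suggests it tends to alternate between threads, which is the case this benchmark penalizes.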

Most likely, this is just showing the different trade-offs that implementations have to make. It heavily rewards implementations that prefer to give the mutex back to the thread that last held it. The benchmark even rewards implementations that waste CPU to do that! It even rewards implementations that waste CPU to avoid context switches, even when there's other useful work the CPU could do! It also doesn't penalize the implementation for inter-core traffic which can slow down other unrelated threads.

Also, people who implement mutexes generally presume that performance in the uncontended case is more important than performance in the contended case. There are numerous tradeoffs you can make between these cases, such as presuming that there might be a thread waiting or specifically checking if there is. The benchmark tests only (or at least, almost only) the case that is typically traded off in favor of the case presumed more common.

Bluntly, this is a senseless benchmark that is incapable of identifying a problem.

The specific explanation is almost certainly that the Linux implementation is a spinlock/futex hybrid while the OSX implementation is conventional, equivalent to locking a kernel object. The spinlock portion of the Linux implementation favors allowing the same thread that just released the mutex to lock it again, which your benchmark heavily rewards.
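
To make the "spin first, then block" idea concrete, here is a rough sketch of a hybrid lock built on top of `std::mutex`. It is not the actual Linux or OSX implementation: `SpinThenBlockMutex`, its `spin_limit` parameter, and the `hammer` helper are hypothetical, and a real hybrid would spin on its own lock word and sleep on a futex rather than wrap another mutex.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: retry try_lock() a bounded number of times (the spin
// phase), then fall back to a blocking lock() (the sleep phase).
class SpinThenBlockMutex {
public:
    explicit SpinThenBlockMutex(int spin_limit = 100) : spin_limit_(spin_limit) {
        assert(spin_limit >= 0);
    }

    void lock() {
        for (int i = 0; i < spin_limit_; ++i) {
            if (m_.try_lock()) return;  // acquired while spinning, no blocking call
        }
        m_.lock();  // still busy after spinning: block until it is released
    }

    void unlock() { m_.unlock(); }

private:
    std::mutex m_;
    int spin_limit_;
};

// Helper for trying the wrapper out: `threads` workers each increment a shared
// counter `per_thread` times under the lock; the final count is returned.
long hammer(unsigned threads, long per_thread) {
    SpinThenBlockMutex m;
    long val = 0;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i) {
                std::lock_guard<SpinThenBlockMutex> guard(m);
                ++val;
            }
        });
    }
    for (auto& w : workers) w.join();
    return val;
}
```

Because a thread that is already running and spinning can grab the lock the moment it is released, a design like this tends to favor the thread that just unlocked over a sleeping waiter, which is exactly the behavior the benchmark rewards.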
