Do lock-free algorithms really perform better than their lock-full counterparts?


Problem description


Raymond Chen has been doing a huge series on lock-free algorithms. Beyond the simple cases of the InterlockedXxx functions, the prevailing pattern with all of these seems to be that they implement their own locks. Sure, there are no processor locks, but the concept of looping over and over on each CPU to ensure consistency is very much like a spinlock. And being a spinlock, they are going to be less efficient than the general locks that come with the operating system, because they do not yield control of their quanta while waiting for other threads. Therefore, whenever someone comes to me and says "but my algorithm is lock-free", my general response is "so?"

I'm curious -- are there benchmarks available which show lock-free algorithms to have an edge over their lock-full counterparts?

Solution

Beyond the simple cases of the InterlockedXxx functions, it seems like the prevailing pattern with all of these is that they implement their own locks.

None of the answers here really seem to get to the heart of the difference between a "lock-free" CAS loop and a mutex or spin-lock.

The important difference is that lock-free algorithms are guaranteed to make progress on their own - without the assistance of other threads. With a lock or spin lock, any poor thread that can't acquire a lock is entirely at the mercy of the thread that owns the lock. The poor thread that can't acquire the lock can do nothing except wait (either via a busy wait or an OS-assisted sleep).

With lock-free algorithms that loop on a CAS, each thread is guaranteed to make progress regardless of what other contending threads are doing. Each thread is, essentially, in control of its own fate. Yes, it still may have to loop many times, but the number of times it loops is limited by the number of contending threads. For the most part, it cannot loop infinitely. (In practice, livelock can occur due to, e.g., an LL/SC loop that keeps failing because of false sharing) - but again, measures can be taken by the thread itself to deal with this - it is not at the mercy of another thread holding a lock.

As for performance, it depends. I've seen flagrant examples of lock-free algorithms being totally out-performed by their locking counterparts, even under high thread contention. On an x86-64 machine running Debian 7, I compared the performance between the C++ Boost.Lockfree queue (based on the Michael/Scott algorithm) and a plain old std::queue surrounded by a std::mutex. Under high thread contention, the lock-free version was almost twice as slow.

So why is that? Well, the performance of lock-free algorithms ultimately comes down to the implementation details. How does the algorithm avoid ABA? How does it accomplish safe memory reclamation? There are so many variants... tagged pointers, epoch-based reclamation, RCU/quiescent state, hazard pointers, general process-wide garbage collection, etc. All these strategies have performance implications, and some also place restrictions on how your application can be designed in general. In general, reference-counting approaches (or tagged pointer approaches) tend to perform poorly, in my experience. But the alternatives can be much more complex to implement, and require a lot more memory-reclamation infrastructure based around thread-local storage or generalized garbage collection.

