Lock-free multi-threading is for real threading experts


Question


I was reading through an answer that Jon Skeet gave to a question and in it he mentioned this:


As far as I'm concerned, lock-free multi-threading is for real threading experts, of which I'm not one.


It's not the first time that I have heard this, but I find very few people talking about how you actually do it if you are interested in learning how to write lock-free multi-threading code.


So my question is: besides learning all you can about threading etc., where do you start trying to learn to specifically write lock-free multi-threading code, and what are some good resources?

Cheers

Answer


Current "lock-free" implementations follow the same pattern most of the time:

  • read some state and make a copy of it*
  • modify copy*
  • do an interlocked operation
  • retry if it fails

(*: optional, depends on the data structure/algorithm)

The last bit is eerily similar to a spinlock. In fact, it is a basic spinlock (http://stackoverflow.com/questions/2368164/how-is-thread-synchronization-implemented-at-the-assembly-language-level/2368183#2368183). :)
I agree with @nobugz on this: the cost of the interlocked operations used in lock-free multi-threading is dominated by the cache and memory-coherency tasks they must carry out (http://stackoverflow.com/questions/2494057/can-someone-provide-an-easy-explanation-of-how-full-fences-are-implemented-in/2495075#2495075).


What you gain however with a data structure that is "lock-free" is that your "locks" are very fine grained. This decreases the chance that two concurrent threads access the same "lock" (memory location).


The trick most of the time is that you do not have dedicated locks - instead you treat e.g. all elements in an array or all nodes in a linked list as a "spin-lock". You read, modify and try to update if there was no update since your last read. If there was, you retry.
This makes your "locking" (oh, sorry, non-locking :) very fine grained, without introducing additional memory or resource requirements.
Making it more fine-grained decreases the probability of waits. Making it as fine-grained as possible without introducing additional resource requirements sounds great, doesn't it?

Most of the fun, however, can come from ensuring correct load/store ordering (http://stackoverflow.com/questions/2494057/can-someone-provide-an-easy-explanation-of-how-full-fences-are-implemented-in/2495075#2495075).
Contrary to one's intuition, CPUs are free to reorder memory reads/writes - they are very smart, by the way: you will have a hard time observing this from a single thread. You will, however, run into issues when you start doing multi-threading on multiple cores. Your intuitions will break down: just because an instruction comes earlier in your code does not mean it will actually happen earlier. CPUs can process instructions out of order, and they especially like to do this to instructions with memory accesses, to hide main-memory latency and make better use of their cache.

Now, it certainly goes against intuition that a sequence of code does not flow "top-down" but instead runs as if there were no sequence at all - it might be called "the devil's playground". I believe it is infeasible to give an exact answer as to which load/store re-orderings will take place. Instead, one always speaks in terms of mays and mights and cans, and prepares for the worst: "Oh, the CPU might reorder this read to come before that write, so it is best to put a memory barrier right here, on this spot."


Matters are complicated by the fact that even these mays and mights can differ across CPU architectures. It might be the case, for example, that something that is guaranteed to not happen in one architecture might happen on another.

To get "lock-free" multi-threading right, you have to understand memory models.
Getting the memory model and its guarantees right is not trivial, however, as demonstrated by this story, whereby Intel and AMD made corrections to the documentation of MFENCE, causing a stir among JVM developers. As it turned out, the documentation that developers had relied on from the beginning was not so precise in the first place.


Locks in .NET result in an implicit memory barrier, so you are safe using them (most of the time, that is... see for example this Joe Duffy - Brad Abrams - Vance Morrison greatness on lazy initialization, locks, volatiles and memory barriers. :) (Be sure to follow the links on that page.)


As an added bonus, you will get introduced to the .NET memory model on a side quest. :)


There is also an "oldie but goldie" from Vance Morrison: What Every Dev Must Know About Multithreaded Apps.

...and of course, as @Eric mentioned (http://stackoverflow.com/questions/2528969/lock-free-multi-threading-is-for-real-threading-experts/2529773#2529773), Joe Duffy is a definitive read on the subject.

A good STM can get as close to fine-grained locking as it gets, and will probably provide performance that is close to or on par with a hand-made implementation. One of them is STM.NET from Microsoft's DevLabs projects.

If you are not a .NET-only zealot, Doug Lea did some great work in JSR-166.
Cliff Click has an interesting take on hash tables that does not rely on lock striping - as the Java and .NET concurrent hash tables do - and seems to scale well to 750 CPUs.


If you are not afraid to venture into Linux territory, the following article provides more insight into the internals of current memory architectures and how cache-line sharing can destroy performance: What every programmer should know about memory.

@Ben made many comments about MPI: I sincerely agree that MPI may shine in some areas. An MPI-based solution can be easier to reason about, easier to implement and less error-prone than a half-baked locking implementation that tries to be smart. (Subjectively, however, the same is true of an STM-based solution.) I would also bet that it is light-years easier to correctly write a decent distributed application in e.g. Erlang, as many successful examples suggest.

MPI, however, has its own costs and its own troubles when it is run on a single, multi-core system. E.g. in Erlang, there are issues to be solved around the synchronization of process scheduling and message queues.
Also, at their core, MPI systems usually implement a kind of cooperative N:M scheduling for "lightweight processes". This for example means that there are inevitable context switches between lightweight processes. It is true that these are not "classic context switches" but mostly user-space operations, and they can be made fast - however, I sincerely doubt that they can be brought under the 20-200 cycles an interlocked operation takes. User-mode context switching is certainly slower (http://www.intel.com/technology/itj/2007/v11i3/4-environment/figures/paper4_table1_lg.gif), even in the Intel McRT library. N:M scheduling with lightweight processes is not new. LWPs were there in Solaris for a long time. They were abandoned. There were fibers in NT. They are mostly a relic now. There were "activations" in NetBSD. They were abandoned. Linux had its own take on N:M threading. It seems to be somewhat dead by now.
From time to time, there are new contenders: for example McRT from Intel, or most recently User-Mode Scheduling together with ConCRT (http://msdn.microsoft.com/en-us/library/dd998048%28VS.100%29.aspx) from Microsoft.
At the lowest level, they do what an N:M MPI scheduler does. Erlang - or any MPI system - might benefit greatly on SMP systems by exploiting the new UMS.

I guess the OP's question is not about the merits of, and subjective arguments for/against, any particular solution, but if I had to answer it, I guess it depends on the task: for building low-level, high-performance basic data structures that run on a single system with many cores, either low-lock/"lock-free" techniques or an STM will yield the best results in terms of performance, and would probably beat an MPI solution any time performance-wise, even if the above wrinkles were ironed out, e.g. in Erlang.
For building anything moderately more complex that runs on a single system, I would perhaps choose classic coarse-grained locking or, if performance is of great concern, an STM.
For building a distributed system, an MPI system would probably be the natural choice.
Note that there are MPI implementations for .NET as well (though they seem to be less active).

