How to handle a Thread Issue in ZeroMQ + Ruby?


Problem Description


I stumbled upon the ZeroMQ FAQ about thread safety.

My multi-threaded program keeps crashing in weird places inside the ZeroMQ library. What am I doing wrong?

ZeroMQ sockets are not thread-safe. This is covered in some detail in the Guide.

The short version is that sockets should not be shared between threads. We recommend creating a dedicated socket for each thread.

For those situations where a dedicated socket per thread is infeasible, a socket may be shared if and only if each thread executes a full memory barrier before accessing the socket. Most languages support a Mutex or Spinlock which will execute the full memory barrier on your behalf.

My multi-threaded program keeps crashing in weird places inside the ZeroMQ library.
What am I doing wrong?

Following is my code:

Celluloid::ZMQ.init

module Scp
  module DataStore
    class DataSocket
      include Celluloid::ZMQ

      def pull_socket(socket)
        @read_socket = Socket::Pull.new.tap do |read_socket|
          ## IPC socket
          read_socket.connect(socket)
        end
      end

      def push_socket(socket)
        @write_socket = Socket::Push.new.tap do |write_socket|
          ## IPC socket
          write_socket.connect(socket)
        end
      end

      def run
        pull_socket and push_socket and loopify!
      end

      def loopify!
        loop {
          async.evaluate_response(read_socket.read_multipart)
        }
      end

      def evaluate_response(data)
        return_response(message_id, routing, Parser.parser(data))
      end

      def return_response(message_id, routing, object)
        data = object.to_response
        write_socket.send([message_id, routing, data])
      end
    end
  end
end

DataSocket.new.run 

Now, there are a couple of things I'm unclear on:

1) Assuming that async spawns a new Thread ( every time ) and the write_socket is shared between all the threads, and given that ZeroMQ says its sockets are not thread-safe, I certainly see the write_socket running into thread-safety issues.
( Btw, I haven't faced this issue in any end-to-end testing thus far. )

Question 1 : Is my understanding correct on this?

To solve this, ZeroMQ asks us to achieve this using a Mutex or Semaphore.
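
For concreteness, a minimal sketch ( reusing the names from the code above ) of what such a Mutex-guarded shared write_socket might look like; whether this is a good idea at all is discussed in the answer below:

WRITE_LOCK = Mutex.new

def return_response(message_id, routing, object)
  data = object.to_response
  # the synchronize block issues the full memory barrier the FAQ asks for
  WRITE_LOCK.synchronize do
    write_socket.send([message_id, routing, data])
  end
end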

Which results in Question 2

2) Context Switching.

Given that a threaded application can context-switch at any time: looking at the ffi-rzmq code, Celluloid::ZMQ's .send() internally calls send_strings(), which internally calls send_multiple().

Question 2: Context switching can happen ( anywhere ) inside it ( even in a critical section ) [here](https://github.com/chuckremes/ffi-rzmq/blob/master/lib/ffi-rzmq/socket.rb#L510).

This can also lead to a data ordering issue.

Is my observation above correct?

Note:

Operating system ( MacOS, Linux and CentOS )  
Ruby - MRI 2.2.2/2.3.0

Solution

No one ought to risk the application's robustness by putting it on thin ice

Forgive this story for being a rather long read, but the author's life-long experience shows that the reasons why are far more important than any few SLOCs of ( potentially doubtful, mystically-looking or root-cause-ignorant ) attempts to experimentally find out how.

Initial note

While ZeroMQ has for many years been promoted as a Zero-Sharing ( Zero-Blocking, ( almost )-Zero-Latency, and a few more design-maxims; the best place to read about the pros & cons are Pieter HINTJENS' books, not just the fabulous "Code Connected, Volume 1", but also the advanced design & engineering ones in the real social domain ) philosophy, the very recent API documentation has introduced and advertises some features that IMHO bear only a relaxed relation to these corner-stone principles of distributed computing and that no longer whistle so sharply and loudly about Zero-Sharing. This said, I still remain a Zero-Sharing guy, so kindly view the rest of this post in that light.

Answer 1:
No, sir. -- or better -- Yes and No, sir.

ZeroMQ does not ask one to use Mutex/Semaphore barriers. That would contradict the ZeroMQ design maxims.

Yes, recent API changes started to mention that ( under some additional conditions ) one may start using shared sockets ... with ( many ) additional measures ... so the implication was reversed. If one "wants" it, one also takes all the additional steps and measures ( and pays all the initially hidden design & implementation costs for "allowing" shared toys to ( hopefully ) survive the principal ( un-necessary ) battle with the rest of the uncontrollable distributed-system environment -- thus suddenly also bearing a risk of failure ( which, for many wise reasons, was not the case in the initial ZeroMQ Zero-sharing evangelisation ) -- so the user decides which path to go. That is fair. ).

Sound & robust designs IMHO are still better developed as per the initial ZeroMQ API & evangelism, where Zero-sharing was a principle.
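
As a minimal Ruby sketch of that Zero-sharing principle ( plain ffi-rzmq, which Celluloid::ZMQ wraps; the endpoint and the thread count are illustrative, not taken from the question ):

require 'ffi-rzmq'

context = ZMQ::Context.new(1)          # one Context ( "brain" ) for the whole process

collector = context.socket(ZMQ::PULL)  # owned and used by the main thread only
collector.bind("inproc://results")

workers = 4.times.map do |i|
  Thread.new do
    push = context.socket(ZMQ::PUSH)   # each thread creates and owns its own socket
    push.connect("inproc://results")
    push.send_string("hello from thread #{i}")
    push.close
  end
end

4.times { collector.recv_string(msg = ''); puts msg }
workers.each(&:join)
collector.close
context.terminate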

Answer 2:
There is, by design, always a principal uncertainty about ZeroMQ data-flow ordering; one of the ZeroMQ design-maxims keeps designers from relying on unsupported assumptions about message ordering, among many others ( exceptions apply ). The only certainty is that any message dispatched into the ZeroMQ infrastructure is either delivered as a complete message, or not delivered at all. So one can be sure just about the fact that no fragmented wrecks ever appear on delivery. For further details, read below.


ThreadId does not prove anything ( unless the inproc transport-class is used )

Given the internal design of the ZeroMQ data-pumping engines, the instantiation of a
zmq.Context( number_of_IO_threads ) decides how many threads get spawned for handling the future data-flows. This can be anywhere in { 0, 1: default, 2, .. }, up to almost depleting the kernel-fixed max-number-of-threads. The value of 0 is a reasonable choice to not waste resources in the case where the inproc:// transport-class is actually a direct-memory-region-mapped handling of data-flows ( which actually never flow and get nailed down directly onto the landing-pad of the receiving socket-abstraction :o) ) and no thread is ever needed for such a job.
Next to this, <aSocket>.setsockopt( zmq.AFFINITY, <anIoThreadEnumID#> ) permits one to fine-tune the data-related IO-"hydraulics", so as to prioritise, load-balance and performance-tweak the thread-loads across the enumerated pool of the zmq.Context()-instance's IO-threads, and to gain from better and best settings in the design & data-flow operations aspects listed above.
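
Expressed in the Ruby binding used above ( ffi-rzmq ), those two knobs look roughly like this; the assumption that the binding exposes ZMQ::AFFINITY, and the endpoint itself, are illustrative:

require 'ffi-rzmq'

context = ZMQ::Context.new(2)            # ask for 2 I/O threads for this Context's data pumps

socket = context.socket(ZMQ::PULL)
socket.setsockopt(ZMQ::AFFINITY, 1)      # affinity is a bitmask: bit 0 pins this socket's traffic to the first I/O thread
socket.bind("tcp://127.0.0.1:5555")      # hypothetical endpoint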


The cornerstone element is the Context()'s instance,
not a Socket()'s one

Once a Context()'s instance has been instantiated and configured ( ref. above for why and how ), it is ( almost ) free to be shared ( if the design cannot resist sharing, or needs to avoid setting up a fully fledged distributed-computing infrastructure ).

In other words, the brain is always inside the zmq.Context()'s instance - all the socket-related dFSA-engines are set up / configured / operated there ( yes, even though the syntax is <aSocket>.setsockopt(...), the effect of such a call is implemented inside The Brain -- in the respective zmq.Context - not in some wire-from-A-to-B ).

Better never share <aSocket> ( even if API-4.2.2+ promises you could )

So far, one might have seen a lot of code snippets where a ZeroMQ Context and its sockets get instantiated and disposed of in a snap, serving just a few SLOCs in a row, but -- this does not mean that such a practice is wise or justified by any need other than that of a very academic example ( one that had to get printed in as few SLOCs as possible because of the book publisher's policies ).

Even in such cases a fair warning about the indeed immense costs of zmq.Context infrastructure setup / tear-down ought to be present, so as to avoid any generalisation, let alone any copy/paste replicas, of such code, which was used short-handedly just for illustrative purposes.

Just imagine the realistic setup needed for any single Context instance -- getting a pool of the respective dFSA-engines ready, maintaining all their respective configuration setups, plus all the socket-endpoint pools' transport-class-specific hardware + external O/S-service handlers, round-robin event-scanners, buffer-memory-pool allocations + their dynamic allocators, etc., etc. This all takes both time and O/S resources, so handle these ( natural ) costs wisely and with care for the adjusted overheads, if performance is not to suffer.

If still in doubt about why this is worth mentioning, just imagine somebody insisting on tearing down all the LAN cables right after a packet was sent, and having to wait until new cabling gets installed right before the next packet needs to be sent. Hopefully this "reasonable instantiation" view can now be better perceived, along with the argument for sharing ( if at all ) zmq.Context()-instance(s), without any further fights over trying to share ZeroMQ socket-instances ( even if they have newly become ( almost ) thread-safe per se ).

The ZeroMQ philosophy is robust if taken as an advanced design evangelism for high-performance distributed-computing infrastructures. Tweaking just one ( minor ) aspect typically does not adjust all the efforts and costs; in the global view of how to design safe and performant systems, the result would not move a single bit better if just this one detail were changed ( and even absolutely-shareable, risk-free ( if that were ever possible ) socket-instances would not change this, whereas all the benefits of sound design, clean code and reasonably achievable testability & debugging would get lost ). So rather pull another wire from the existing brain to such a new thread, or equip the new thread with its own brain, which will locally handle its resources and allow it to connect its own wires back to all the other brains -- as necessary to communicate with -- in the distributed-system.
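
A sketch of that "pull another wire from the existing brain" option ( one shared Context, a private inproc:// PAIR wire per new thread; names and endpoints are illustrative ):

require 'ffi-rzmq'

context = ZMQ::Context.new(1)            # the single shared "brain"

main_end = context.socket(ZMQ::PAIR)     # the main thread's end of the wire
main_end.bind("inproc://worker-1")

worker = Thread.new do
  thread_end = context.socket(ZMQ::PAIR) # the new thread gets its own socket ...
  thread_end.connect("inproc://worker-1")# ... wired back into the same brain
  thread_end.recv_string(task = '')
  thread_end.send_string("done: #{task}")
  thread_end.close
end

main_end.send_string("task-42")
main_end.recv_string(reply = '')
puts reply
worker.join
main_end.close
context.terminate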

If still in doubt, try to imagine what would happen to your national olympic hockey team if it shared just one single hockey stick during the tournament. Or how you would like it if all the neighbours in your home town shared the same phone number to answer all the many incoming calls ( yes, with all the phones and mobiles sharing the same number ringing at the same time ). How well would that work?


Language bindings need not reflect all the API-features available

Here, one can point out, and in some cases correctly, that not all ZeroMQ language-bindings or popular framework-wrappers keep all API-details exposed to the user for application-level programming ( the author of this post has struggled for a long time with such legacy conflicts, which remained unresolvable for exactly this reason, and had to scratch his head a lot to find any feasible way to get around this fact - so it is ( almost ) always doable ).


Epilogue:

It is fair to note that recent versions of the ZeroMQ API, 4.2.2+, have started to creep away from the initially evangelised principles.

Nevertheless, it is worth remembering the ancient memento mori

( emphases added, capitalisation not )

Thread safety

ØMQ has both thread safe socket type and not thread safe socket types. Applications MUST NOT use a not thread safe socket from multiple threads except after migrating a socket from one thread to another with a "full fence" memory barrier.

Following are the thread safe sockets: * ZMQ_CLIENT * ZMQ_SERVER * ZMQ_DISH * ZMQ_RADIO * ZMQ_SCATTER * ZMQ_GATHER

While this text might sound to some ears as promising, calling barriers into service is the worst thing one can do when designing advanced distributed-computing systems where performance is a must.

The last thing one would like to see is one's own code blocking, as such an agent gets into a principally uncontrollable blocking-state from which no one can heal it ( neither the agent per se internally, nor anyone from outside ), in case a remote agent never delivers the just-expected event ( which in distributed-systems can happen for so many reasons, or under so many circumstances, that are outside of one's control ).

Building a system that is prone to hanging itself ( with a broad smile from a supported ( but naively employed ) syntax-possibility ) is indeed nothing happy to do, let alone a serious design job.
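
One common way to keep one's own code in control in Ruby/ffi-rzmq terms is to never issue a bare blocking receive, but to poll with a finite timeout ( a sketch; the endpoint, the 1000 ms timeout and the give-up policy are illustrative ):

require 'ffi-rzmq'

context = ZMQ::Context.new(1)
socket  = context.socket(ZMQ::PULL)
socket.connect("ipc:///tmp/example.sock")  # hypothetical endpoint

poller = ZMQ::Poller.new
poller.register_readable(socket)

loop do
  rc = poller.poll(1_000)                  # wait at most ~1000 ms, never block forever
  if rc > 0
    poller.readables.each do |s|
      s.recv_string(msg = '')
      # ... handle msg ...
    end
  else
    # nothing arrived in time: this code, not the socket, decides what happens next
    break
  end
end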

One should also not be surprised here that many additional ( initially not visible ) restrictions apply down the line of the new move towards using the shared-{ hockey-stick | telephone } API:

ZMQ_CLIENT sockets are threadsafe. They do not accept the ZMQ_SNDMORE option on sends not ZMQ_RCVMORE on receives. This limits them to single part data. The intention is to extend the API to allow scatter/gather of multi-part data.

c/a

Celluloid::ZMQ does not report any of these new-API ( the sin of sharing almost forgiven ) socket types in its section on supported socket types, so no good news is to be expected a priori, and Celluloid::ZMQ master activity seems to have faded out somewhere in 2015, so expectations from this corner ought to be somewhat realistic.

This said, one interesting point might be found behind a notice:

before you go building your own distributed Celluloid systems with Celluloid::ZMQ, be sure to give DCell a look and decide if it fits your purposes.


Last but not least, combining an event-loop system inside another event-loop is a painful job. Trying to integrate an embedded hard-real-time system into another hard-real-time system can even be mathematically proven to be impossible.

Similarly, building a multi-agent system using another agent-based component brings additional kinds of collisions and race-conditions when they meet over the same resources, which get harnessed ( be it knowingly or "just" by some functional side-effect ) from both ( or multiple ) agent-based frameworks.

Un-salvageable mutual deadlocks are just one kind of these collisions, which introduce initially unseen troubles down the line of unaware design attempts. The very first step outside of a single-agent system design makes one lose many guarantees that went unnoticed before going multi-agent ( distributed ), so an open mind, readiness to learn many "new" concepts, and concentration on many new concerns to be carefully watched for and avoided are quite important prerequisites, so as not to ( unknowingly ) introduce patterns that are now actually anti-patterns in the distributed-systems ( multi-agent ) domain.

At least
You have been warned
:o)
