使用不可变/持久类型和数据结构的并发性如何工作? [英] How does concurrency using immutable/persistent types and data structures work?

查看:94
本文介绍了使用不可变/持久类型和数据结构的并发性如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我最近阅读了很多关于函数式语言的文章。由于那些只使用不可变结构,他们声称并发问题大大改善/解决。我在理解如何在实际环境中实际实现这一点时遇到了一些严重的困难。假设我们有一个Web服务器,其中一个线程在端口上监听(好吧,IO是另一件事情,我很困难,但现在我们只是忽略它)。在任何连接尝试中,都会创建一个套接字并将其传递给一个新创建的线程,该线程会对其执行一些工作,并根据收到的通信将更改应用于服务器应用程序全局的大型列表/数据结构。那么,这个列表如何访问工作,以便所有线程拥有一致的列表视图(或者至少为了在线程以正确的方式死亡时将一个线程所做的所有更改应用到列表中) ?

我的理解问题是:


  • 显然任何线程都可以得到不可更改要处理的列表的快照。但是,在通过创建应用了更改的列表的新版本来更改内容之后,我们仍然保留每个线程都有自己版本的列表。那些如何合并在一起?

  • 另一种方法可能包括使用传统的锁定机制,例如互斥锁/ cond或go-like-channels。但是,如果所有变量都是不可变的,那么你将如何创建这样的事情?

  • 我听说过STM,但是它无法处理副作用(即,如果列表也可以透明地备份将数据转换为文件或数据库)



那么您如何用函数式语言对这样的东西进行建模?

解决方案

不可变值有几个适用的应用程序。并行/并行处理只是其中最近变得更重要的一个。以下是真正的最基本的摘要摘要从经验和许多书籍和谈论这个问题。你可能需要最终深入一些。

你在这里展示的主要例子是关于管理全局状态,所以它不能纯粹地不可改变地完成。但是,即使在这里,使用不可变数据结构也有很好的理由。一些来自我头顶的人:


  • try-catch表现得更好,因为您不会修改可能会被修改的共享对象半途,具有不可变的值,它会自动保持最后的一致状态

  • 减少对非常有限的全局变量集合的多态安全比较和交换操作的变化状态(理想情况下一个),完全消除了死锁

  • 免费传递数据结构,而没有任何防御性复制,这种情况经常会在被遗忘时出现神秘错误(很多时候在调用函数和被调用函数中都创建了防御副本,因为开发人员在几次调试会话后开始倾向于比安全性更好)

  • 更容易进行单元测试,因为许多运行在不可变值上的函数都是无副作用的
  • >
  • 通常更简单的序列化和更透明的比较语义
    更容易调试和获取(记录)一个curren t即使是异步的系统状态快照



尽管如此,请回到您的问题。

在最常见的情况下,这种情况下的全局状态通常使用顶部的一个可变引用来保存到不可变数据结构中。

该引用仅由CAS原子操作更新。

不变数据结构由一个副作用自由函数转换,并且当所有转换完成时,引用以原子方式交换。如果两个线程/内核想要同时交换从同一个旧线程获得的新值,那么首先赢得对方的那个不会成功(CAS语义)和需求重复操作(取决于转换,或者用新值更新当前值,或者从头开始转换新值)。这可能看起来很浪费,但是这里的假设是重做一些工作通常比永久锁定/同步开销便宜。

当然这可以优化,例如通过划分不可变数据结构的独立部分以通过独立更新多个引用来进一步减少潜在的冲突。

访问数据结构是无锁的,速度非常快,始终提供一致的响应。边缘情况下,比如当你发送更新时,另一个客户端在任何系统中都会收到较早的数据,因为网络请求可能会出现乱序...



STM很少使用,通常你最好使用包含你在STM事务中使用的引用的所有值的数据结构的原子交换。

I've read a lot about functional languages recently. Since those use only immutable structures they claim that concurrency issues are greatly improved/solved. I'm having some serious trouble understanding how this can actually be delivered in a real-life context. Let's assume we have a web server with one thread listening on a port (well, IO is another thing i have difficulty wrapping my head around but let's just ignore that for now); On any connection attempt a socket is created and passed to a newly created thread which does some work with it and, depending on the received communications, may apply changes to a big list/data structure which is global to the server application. So, how does this list access work then in order for all threads having a consistent view of the list (or at least in order to have all changes made by one thread applied to the list as soon as the thread dies in a correct manner)?

My problems understanding are:

  • Obviously any thread can get a unchangeable "snapshot" of the list to work on. However, after "changing" the contents by creating a new version of the list with the changes applied, we are still left with every thread having their own version of the list. How are those merged back together?
  • Another method might consist of using traditional locking mechanisms like mutex/cond or go-like-channels. However, how would you even create such a thing when all variables are immutable?
  • I've heard about STM, however that cannot deal with side effects (i.e. if the list would also transparently backup the data to a file or db)

So how would you model such a thing in a functional language?

解决方案

Immutable values have several applications for which they are well suited. The concurrent/parallel processing is just one of them that got more important recently. The following is really the most basic digest from experience and many books and talks about the subject. You may need to dive into some eventually.

The main example you show here is about managing global state, so it cannot be done purely "immutably". However, even here there are very good reasons to use immutable data structures. Some of those from top of my head:

  • try - catch behaves much better, because you do not modify shared object that may be left modified half-way, with immutable values, it automatically keeps the last consistent state
  • reducing changing state to just multicore safe "compare-and-swap" operations on very limited set of global variables (ideally one), completely eliminates deadlocks
  • free passing of data structures without any defensive copying that is quite often case of mysterious bugs when forgotten (many times defensive copies are created in both calling and called functions, because developers start to incline to "better safe than sorry" after a couple of debugging sessions)
  • much easier unit testing, because many functions operating on immutable values are side-effect free
  • usually easier serialization and more transparent comparison semantics much easier debugging and taking (logging) a current snapshot of the system state even asynchronously

Back to your question though.

In the most trivial case the global state in this case is often modelled using one mutable reference at the top holding onto an immutable data structure.

The reference is updated only by CAS atomic operation.

The immutable data structure is transformed by a side-effect free functions and when all transformations are done the reference is swapped atomically.

If two threads/cores want to swap simultaneously new values got from the same old one, the one doing that first wins the other does not succeed (CAS semantics) and needs to repeat the operation (depends on the transformation, either updating the current one with the new value, or transforming the new value from the beginning). This may seem wasteful, but the assumption here is that redoing some work is often cheaper than permanent locking/synchronization overhead.

Of course this can be optimized e.g. by partitioning independent parts of immutable data structures to further reduce potential collisions by having several references being updated independently.

Access to the data structure is lock-free and very fast and always gives a consistent response. Edge cases like when you send an update and another client receives older data afterwards is to be expected in any system, because of network requests can get out of order too...

STM is useful quite rarely and usually you are better of to use atomic swaps of data structure containing all values from references you would use in STM transaction.

这篇关于使用不可变/持久类型和数据结构的并发性如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆