Must access to scala.collection.immutable.List and Vector be synchronized?


Problem description

I'm going through Learning Concurrent Programming in Scala, and encountered the following:

In current versions of Scala, however, certain collections that are deemed immutable, such as List and Vector, cannot be shared without synchronization. Although their external API does not allow you to modify them, they contain non-final fields.

Tip: Even if an object seems immutable, always use proper synchronization to share any object between the threads.

From Learning Concurrent Programming in Scala by Aleksandar Prokopec, end of Chapter 2 (p.58), Packt Publishing, Nov 2014.

Can that be right?

My working assumption has always been that any internal mutability (to implement laziness, caching, whatever) in Scala library data structures described as immutable would be idempotent, such that the worst that might happen in a bad race is that work would be unnecessarily duplicated. This author seems to suggest correctness may be imperiled by concurrent access to immutable structures. Is that true? Do we really need to synchronize access to Lists?

Much of my transition to an immutable-heavy style has been motivated by a desire to avoid synchronization and the potential contention overhead it entails. It would be an unhappy big deal to learn that synchronization cannot be eschewed for Scala's core "immutable" data structures. Is this author simply overconservative?

Scala's documentation of collections includes the following:

A collection in package scala.collection.immutable is guaranteed to be immutable for everyone. Such a collection will never change after it is created. Therefore, you can rely on the fact that accessing the same collection value repeatedly at different points in time will always yield a collection with the same elements.

That doesn't quite say that they are safe for concurrent access by multiple threads. Does anyone know of an authoritative statement that they are (or aren't)?

Solution

It depends on where you share them:

  • it's not safe to share them inside scala-library
  • it's not safe to share them with Java code or through reflection

Simply put, these collections are less protected than objects that have only final fields. At the JVM level the two look much alike (leaving aside optimizations such as ldc): both are fields holding an address, so a putfield bytecode instruction can change them. Still, the compiler protects a var less than it protects Java's final or Scala's final val and val.
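
To make the difference in compiler protection concrete, here is a minimal sketch that inspects the JVM field flags with Java reflection. The classes WithVal and WithVar and the helper isFinalField are hypothetical names invented for this illustration; the sketch assumes a mainstream scalac, which to my knowledge emits a final field for a class-level val and a non-final field for a var.

```scala
import java.lang.reflect.Modifier

// Hypothetical classes, used only for this illustration.
class WithVal(val x: Int)
class WithVar(var x: Int)

object FieldModifiers {
  // True if the JVM field backing `name` carries the `final` flag.
  def isFinalField(cls: Class[_], name: String): Boolean =
    Modifier.isFinal(cls.getDeclaredField(name).getModifiers)
}
```

With these definitions, FieldModifiers.isFinalField(classOf[WithVal], "x") should report a final field, while the same call on classOf[WithVar] should not.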

However, it's still fine to use them in most cases, as their behaviour is logically immutable: all mutating operations are encapsulated (for Scala code). Let's look at Vector. It requires mutable fields to implement its appending algorithm:

private var dirty = false

//from VectorPointer
private[immutable] var depth: Int = _
private[immutable] var display0: Array[AnyRef] = _
private[immutable] var display1: Array[AnyRef] = _
private[immutable] var display2: Array[AnyRef] = _
private[immutable] var display3: Array[AnyRef] = _
private[immutable] var display4: Array[AnyRef] = _
private[immutable] var display5: Array[AnyRef] = _

The append itself is implemented like this:

val s = new Vector(startIndex, endIndex + 1, blockIndex)
s.initFrom(this) //uses displayN and depth
s.gotoPos(startIndex, startIndex ^ focus) //uses displayN
s.gotoPosWritable //uses dirty
...
s.dirty = dirty

And s reaches the user only after the method has returned it. So happens-before guarantees are not even a concern here: all mutating operations are performed in the same thread (the thread in which you call :+, +: or updated), so this is just a kind of initialization. The only problem is that private[somePackage] members are directly accessible from Java code and from scala-library itself, so if you pass the collection to a Java method, that method could modify them.
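
As a rough illustration of that point, the sketch below lets several threads append to the same Vector. SharedVectorDemo and appendInThreads are hypothetical names made up for this example. Each thread only reads the shared Vector and gets its own new instance back from :+, and the base Vector is handed to the threads before they start.

```scala
object SharedVectorDemo {
  // Every thread derives its own Vector from `shared`; `shared` itself is only read.
  def appendInThreads(shared: Vector[Int], n: Int): Seq[Vector[Int]] = {
    val results = new Array[Vector[Int]](n)
    val threads = (0 until n).map { i =>
      new Thread(new Runnable {
        def run(): Unit = { results(i) = shared :+ (100 + i) }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // join() makes each thread's write to `results` visible here
    results.toList
  }
}
```

Calling SharedVectorDemo.appendInThreads(Vector(1, 2, 3), 4) should leave the original Vector unchanged and return four independent Vectors, one per thread.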

I don't think you should worry about the thread-safety of, say, the cons operator. It also has mutable fields:

final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
  override def tail : List[B] = tl
  override def isEmpty: Boolean = false
}

But they are used only inside library methods (within one thread), without any explicit sharing or thread creation, and those methods always return a new collection. Let's consider take as an example:

override def take(n: Int): List[A] = if (isEmpty || n <= 0) Nil else {

    val h = new ::(head, Nil)
    var t = h
    var rest = tail
    var i = 1
    while ({if (rest.isEmpty) return this; i < n}) {
      i += 1
      val nx = new ::(rest.head, Nil)
      t.tl = nx //here is mutation of t's field
      t = nx
      rest = rest.tail
    }
    h
}

So here t.tl = nx does not differ much from t = nx as far as thread-safety is concerned. Both are referred to only from a single stack (take's stack). However, if I added, say, someActor ! t (or any other async operation), someField = t or someFunctionWithExternalSideEffect(t) right inside the while loop, I could break this contract.
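
The same "mutate locally, then hand out" pattern can be sketched on a tiny hand-written list. MiniList, Empty, Cell and takeFrom are hypothetical names and are NOT the real scala.List implementation; the sketch only mirrors the shape of take above, with a private var that is written while the cells are still reachable from one stack frame alone.

```scala
// Hypothetical miniature list, not the real scala.collection.immutable.List.
sealed trait MiniList {
  def toScalaList: List[Int] = this match {
    case Empty   => Nil
    case c: Cell => c.head :: c.tail.toScalaList
  }
}
case object Empty extends MiniList

final class Cell private (val head: Int, private var next: MiniList) extends MiniList {
  def tail: MiniList = next
}

object Cell {
  // Copies the first n elements of xs, mirroring the structure of List.take.
  def takeFrom(xs: List[Int], n: Int): MiniList =
    if (xs.isEmpty || n <= 0) Empty
    else {
      val h = new Cell(xs.head, Empty)
      var t = h
      var rest = xs.tail
      var i = 1
      while (rest.nonEmpty && i < n) {
        i += 1
        val nx = new Cell(rest.head, Empty)
        t.next = nx // mutation of a cell that only this stack frame can reach
        t = nx
        rest = rest.tail
      }
      h // first point at which a caller can see any of the cells
    }
}
```

Because next is private and takeFrom never stores a cell anywhere before returning, callers can only ever observe the fully linked result.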


A small addition here about the relation to JSR-133:

1) new ::(head, Nil) creates a new object on the heap and puts its address (let's say 0x100500) onto the stack (val h =)

2) as long as this address is only on the stack, it is known only to the current thread

3) other threads can become involved only after this address is shared by putting it into some field; in the case of take, it has to flush any caches (to restore the stack and registers) before calling areturn (return h), so the returned object will be consistent.

So all operations on the object at 0x100500 are out of the scope of JSR-133 as long as 0x100500 is part of the stack only (not the heap, not other threads' stacks). However, some fields of that object may point to shared objects (which might be in the scope of JSR-133), but that's not the case here (as these objects are immutable from the outside).


I think (hope) the author meant logical synchronization guarantees for the library's developers: you still need to be careful with these things if you're developing scala-library, because these vars are private[scala] or private[immutable], so it's possible to write code that mutates them from different threads. From a scala-library developer's perspective, it usually means that all mutations of a single instance should be applied in a single thread, and only to a collection that is not (yet) visible to the user. Or, simply put: don't expose mutable fields to outside users in any way.

P.S. Scala has had several unexpected issues with synchronization, which caused some parts of the library to be surprisingly not thread-safe, so I wouldn't be surprised if something were wrong (and that would be a bug). But in, let's say, 99% of cases and for 99% of methods, immutable collections are thread-safe. In the worst case you might have to avoid some broken method or just (it might not be just "just" in some cases) clone the collection for every thread.

Anyway, immutability is still a good way to achieve thread-safety.

P.S.2 An exotic case that might break the thread-safety of immutable collections is using reflection to access their non-final fields.
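
A minimal sketch of that reflection case, using a hypothetical class rather than a real collection (the internal field names of List and Vector vary between Scala versions, so they are deliberately not touched here). Box, ReflectionBreak and overwrite are names invented for this example.

```scala
// Hypothetical class: immutable through its public API, but backed by a non-final field.
final class Box(initial: Int) {
  private var current: Int = initial
  def value: Int = current
}

object ReflectionBreak {
  def overwrite(box: Box, newValue: Int): Unit = {
    // Find the field by suffix so the sketch does not depend on compiler name mangling.
    val field = classOf[Box].getDeclaredFields.find(_.getName.endsWith("current")).get
    field.setAccessible(true)
    field.setInt(box, newValue) // bypasses the class's public API entirely
  }
}
```

After ReflectionBreak.overwrite(box, 42), box.value returns the overwritten number even though Box offers no public way to change it.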


A small addition about another exotic but really terrifying way, as pointed out in the comments by @Steve Waldman and @axel22 (the author). If you share an immutable collection as a member of some object shared between threads, and the collection's constructor becomes physically inlined (by the JIT; it is not logically inlined by default), and your JIT implementation allows the inlined code to be reordered with the surrounding code, then you have to synchronize it (usually @volatile is enough). However, IMHO, I don't believe that the last condition is correct behaviour, but for now I can neither prove nor disprove that.
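
For that last scenario, one possible shape of the @volatile fix is sketched below. Registry, publish and snapshot are hypothetical names, and the sketch only shows how a collection stored as a member of a shared object can be published through a volatile field. It does not attempt to reproduce the reordering itself.

```scala
// Hypothetical holder object that is shared between threads.
final class Registry {
  // The volatile write/read pair gives readers a happens-before edge
  // with everything the writer did before publishing the list.
  @volatile private var items: List[String] = Nil

  def publish(newItems: List[String]): Unit = { items = newItems }
  def snapshot: List[String] = items
}

object RegistryDemo {
  // Publishes from a second thread and reads the result back in the caller.
  def publishFromOtherThread(registry: Registry, data: List[String]): List[String] = {
    val writer = new Thread(new Runnable {
      def run(): Unit = registry.publish(data)
    })
    writer.start()
    writer.join()
    registry.snapshot
  }
}
```

The list itself stays immutable; only the reference that points to it is volatile, which is usually much cheaper than locking around every read.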
