Scala map() on a Map[..] much slower than mapValues()
Question
In a Scala program I wrote, I have a scala.collection.Map that maps a String to some calculated values (in detail, it's a Map[String, (Double, immutable.Map[String, Double], Double)]; I know that's ugly and should, and will, be wrapped). Now, if I do this:
stats.map { case (c, (prior, pwc, denom)) => {
    println(c)
    ...
  }
}
it takes about 30 seconds to print roughly 50 values of c! The println is just a test statement; the actual calculation I need was even slower (I aborted after 1 minute of complete silence). However, if I do it like this:
stats.mapValues { case (prior, pwc, denom) => {
    println(prior)
    ...
  }
}
I don't run into these performance issues ... Can anyone explain why this is happening? Am I not following some important Scala guidelines?
Thanks for your help!
I investigated the behaviour further. My guess is that the bottleneck comes from accessing the Map data structure. If I do the following, I get the same performance issues:
classes.foreach { c => {
    println(c)
    val ps = stats(c)
  }
}
Here classes is a List[String] that stores the keys of the Map externally. Without the access to stats(c), no performance loss occurs.
Recommended answer
mapValues actually returns a view on the original map, which can lead to unexpected performance issues. From this blog post:
...here is a catch: map and mapValues are different in a not-so-subtle way. mapValues, unlike map, returns a view on the original map. This view holds references to both the original map and to the transformation function (here (_ + 1)). Every time the returned map (view) is queried, the original map is first queried and the transformation function is called on the result.
I recommend reading the rest of that post for some more details.
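To make the quoted behaviour concrete, here is a minimal sketch that counts how often the transformation function runs for mapValues versus map. The object name, the Counts case class, and the sample data are hypothetical and not taken from the question. It assumes Scala 2.12 or 2.13; on 2.13 the call to mapValues compiles with a deprecation warning that suggests .view.mapValues instead.

```scala
object MapValuesViewDemo {
  // Hypothetical result holder: how many times the function ran at each point.
  final case class Counts(
      viewAfterBuild: Int,
      viewAfterLookups: Int,
      strictAfterBuild: Int,
      strictAfterLookups: Int
  )

  def countCalls(): Counts = {
    // Hypothetical sample data, not the stats map from the question.
    val original = Map("a" -> 1, "b" -> 2)

    // mapValues: nothing is computed up front, the function runs on every lookup.
    var viewCalls = 0
    val viaMapValues = original.mapValues { v => viewCalls += 1; v + 1 }
    val viewAfterBuild = viewCalls
    (1 to 3).foreach(_ => viaMapValues("a")) // three lookups of the same key
    val viewAfterLookups = viewCalls

    // map: the function runs once per entry while the new map is built.
    var strictCalls = 0
    val viaMap = original.map { case (k, v) => strictCalls += 1; (k, v + 1) }
    val strictAfterBuild = strictCalls
    (1 to 3).foreach(_ => viaMap("a")) // lookups read the stored values
    val strictAfterLookups = strictCalls

    Counts(viewAfterBuild, viewAfterLookups, strictAfterBuild, strictAfterLookups)
  }

  def main(args: Array[String]): Unit =
    println(countCalls())
}
```

With these assumptions, the mapValues counter stays at 0 until the result is queried and then grows by one per lookup, while the map counter reaches the number of entries during construction and does not change afterwards. If the transformation is expensive and the result is read repeatedly, materialising it once (for example with map, or by calling toMap on the view) avoids the repeated work.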