Why aren't my Scala futures more efficient?

Question

I'm running this Scala code on a 32-bit quad-core Core2 system:

def job(i:Int,s:Int):Long = {
  val r=(i to 500000000 by s).map(_.toLong).foldLeft(0L)(_+_)
  println("Job "+i+" done")
  r
}

import scala.actors.Future
import scala.actors.Futures._

val JOBS=4

val jobs=(0 until JOBS).toList.map(i=>future {job(i,JOBS)})
println("Running...")
val results=jobs.map(f=>f())
println(results.foldLeft(0L)(_+_))

(Yes, I do know there are much more efficient ways to sum a series of integers; it's just to give the CPU something to do).
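Because job i sums i, i+JOBS, i+2*JOBS, ... up to the bound, the JOBS strides cover 0..500000000 exactly once between them, so the printed total should be the same for every JOBS setting, namely n(n+1)/2. The sketch below (hypothetical object name SumCheck, assuming Scala 2.13 or later) checks that with the arithmetic-series formula, using BigInt so the check stays exact. That matters for the later update: with its bound of 10000000000L the exact total is about 5 x 10^19, larger than Long.MaxValue, so the Long accumulator there wraps around. This is harmless for timing, but the printed sum is then not the true sum.

```scala
// Sketch: checks the question's partitioned sum arithmetically (not the asker's code).
object SumCheck {
  // Exact sum of i, i+s, i+2s, ... up to and including n, via the arithmetic-series formula.
  def strided(i: Long, s: Long, n: Long): BigInt =
    if (i > n) BigInt(0)
    else {
      val count = BigInt((n - i) / s + 1)      // number of terms in this stride
      val last  = BigInt(i) + (count - 1) * s  // largest term that is <= n
      count * (BigInt(i) + last) / 2
    }

  // Total over all jobs, where job i starts at i and steps by `jobs`.
  def total(jobs: Int, n: Long): BigInt =
    (0 until jobs).map(i => strided(i.toLong, jobs.toLong, n)).sum

  def main(args: Array[String]): Unit =
    for (jobs <- 1 to 4)
      // each line prints total=125000000250000000
      println(s"JOBS=$jobs total=${total(jobs, 500000000L)}")
}
```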

Depending on what I set JOBS to, the code runs in the following times:

JOBS=1 : 31.99user 0.84system 0:28.87elapsed 113%CPU
JOBS=2 : 27.71user 1.12system 0:14.74elapsed 195%CPU
JOBS=3 : 33.19user 0.39system 0:13.02elapsed 257%CPU
JOBS=4 : 49.08user 8.46system 0:22.71elapsed 253%CPU

I'm surprised that this doesn't really scale well beyond 2 futures "in play". I do a lot of multithreaded C++ code and have no doubt I'd get good scaling up to 4 cores and see >390% CPU utilisation if I coded this sort of thing with Intel's TBB or boost::threads (it'd be considerably more verbose of course).

So: what's going on, and how can I get the scaling to 4 cores I'd expect to see? Is this limited by something in Scala or the JVM? It occurs to me I don't actually know "where" Scala's futures run: is a thread spawned per future, or does "Futures" provide a thread pool dedicated to running them?
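For comparison, in Scala 2.10 and later the standard scala.concurrent.Future runs on whatever ExecutionContext is in implicit scope, so the thread pool can be made explicit rather than guessed at. This is not the scala.actors API used above. A minimal sketch, assuming Scala 2.13+; the names PooledJobs, job and run are hypothetical, and job is the allocation-free loop from the update with the bound turned into a parameter:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object PooledJobs {
  // Allocation-free strided sum: i, i+s, i+2s, ... up to and including limit.
  def job(i: Long, s: Long, limit: Long): Long = {
    var t = 0L
    var v = i
    while (v <= limit) { t += v; v += s }
    t
  }

  // Runs `jobs` futures on a dedicated fixed-size pool and sums their results.
  def run(jobs: Int, limit: Long): Long = {
    val pool = Executors.newFixedThreadPool(jobs) // one thread per job, chosen explicitly
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = (0 until jobs).map(i => Future(job(i.toLong, jobs.toLong, limit)))
      Await.result(Future.sequence(futures), Duration.Inf).sum
    } finally pool.shutdown() // let the pool's threads exit once the work is done
  }

  def main(args: Array[String]): Unit = {
    val jobs = Runtime.getRuntime.availableProcessors
    println(s"JOBS=$jobs total=${run(jobs, 500000000L)}")
  }
}
```

Sizing the pool to the number of jobs gives each future its own thread; with ExecutionContext.Implicits.global instead, the pool's parallelism defaults to the number of available processors.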

[I'm using the Scala 2.7.7 packages from Debian/Squeeze on a Lenny system with sun-java6 (6-20-0lenny1).]

Update:

As suggested in Rex's answer, I recoded to avoid object creation.

def job(i:Long,s:Long):Long = {
  var t=0L
  var v=i
  while (v<=10000000000L) {
    t+=v
    v+=s
  }
  println("Job "+i+" done")
  t
}
// Rest as above...

This was so much faster I had to significantly increase the iteration count to run for any amount of time! Results are:

JOBS=1: 28.39user 0.06system 0:29.25elapsed 97%CPU
JOBS=2: 28.46user 0.04system 0:14.95elapsed 190%CPU
JOBS=3: 24.66user 0.06system 0:10.26elapsed 240%CPU
JOBS=4: 28.32user 0.12system 0:07.85elapsed 362%CPU

which is much more like what I'd hope to see (although the 3 jobs case is a little odd, with one task consistently completing a couple of seconds before the other two).
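The two result tables are easier to compare once you compute the wall-clock speedup (JOBS=1 elapsed divided by JOBS=N elapsed) and the %CPU figure that time reports ((user + system) / elapsed). Below is a small sketch with hypothetical names Scaling and Run that uses only the numbers quoted above. Speedups are compared only within each table, since the two tables use different iteration counts. It makes the contrast explicit: the range/map/fold run at JOBS=4 spends about 57.5 CPU-seconds for a 1.27x speedup, while the while-loop run at JOBS=4 spends about 28.4 CPU-seconds and gets about 3.7x.

```scala
object Scaling {
  // One line of `time` output: user CPU seconds, system CPU seconds, wall-clock seconds.
  final case class Run(jobs: Int, user: Double, sys: Double, elapsed: Double) {
    // CPU seconds per wall-clock second, as a percentage (what `time` shows as %CPU).
    def cpuPercent: Double = 100.0 * (user + sys) / elapsed
  }

  // Wall-clock speedup of each run relative to the first (JOBS=1) run.
  def speedups(runs: Seq[Run]): Seq[Double] =
    runs.map(r => runs.head.elapsed / r.elapsed)

  def report(label: String, runs: Seq[Run]): Unit = {
    println(label)
    runs.zip(speedups(runs)).foreach { case (r, s) =>
      println(f"  JOBS=${r.jobs}%d: speedup ${s}%.2fx, CPU ${r.cpuPercent}%.0f%%")
    }
  }

  def main(args: Array[String]): Unit = {
    // Figures copied from the two result tables in the question.
    report("range/map/fold", Seq(
      Run(1, 31.99, 0.84, 28.87), Run(2, 27.71, 1.12, 14.74),
      Run(3, 33.19, 0.39, 13.02), Run(4, 49.08, 8.46, 22.71)))
    report("while loop", Seq(
      Run(1, 28.39, 0.06, 29.25), Run(2, 28.46, 0.04, 14.95),
      Run(3, 24.66, 0.06, 10.26), Run(4, 28.32, 0.12, 7.85)))
  }
}
```

Recomputed percentages may differ by a point from the ones in the tables because the inputs are already rounded.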

Pushing it a bit further, on a quad-core hyperthreaded i7 the latter version with JOBS=8 achieves a 4.4x speedup over JOBS=1, with 571% CPU usage.

Answer

My guess is that the garbage collector is doing more work than the addition itself. So you're limited by what the garbage collector can manage. Try running the test again with something that doesn't create any objects (e.g. use a while loop instead of the range/map/fold). You can also play with the parallel GC options if your real application will hit the GC this heavily.
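One way to test this guess directly is to ask the JVM how much collection each version triggers. The sketch below, assuming Scala 2.13+ (for scala.jdk.CollectionConverters), reads the standard java.lang.management GarbageCollectorMXBean counters before and after a body runs. GcCost, measure, boxed and primitive are hypothetical names: boxed mirrors the original range/map/fold job (Range.map builds a collection of boxed Longs), and primitive mirrors the updated while loop. No output is shown because collection counts depend on heap size, collector and JVM version.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

object GcCost {
  // Total collections and milliseconds spent in GC so far, summed over all collectors
  // (a collector reports -1 when a value is undefined, so clamp to 0).
  def gcTotals(): (Long, Long) = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans.asScala
    (beans.map(_.getCollectionCount.max(0L)).sum, beans.map(_.getCollectionTime.max(0L)).sum)
  }

  // Runs `body`, returning its result plus the GC collections and milliseconds it coincided with.
  def measure[A](body: => A): (A, Long, Long) = {
    val (c0, t0) = gcTotals()
    val result = body
    val (c1, t1) = gcTotals()
    (result, c1 - c0, t1 - t0)
  }

  // Allocation-heavy version: materializes a collection of boxed Longs, like the original job.
  def boxed(i: Int, s: Int, n: Int): Long =
    (i to n by s).map(_.toLong).foldLeft(0L)(_ + _)

  // Allocation-free version, like the updated while-loop job.
  def primitive(i: Long, s: Long, n: Long): Long = {
    var t = 0L
    var v = i
    while (v <= n) { t += v; v += s }
    t
  }

  def main(args: Array[String]): Unit = {
    val n = 5000000
    val (r1, c1, ms1) = measure(boxed(0, 1, n))
    val (r2, c2, ms2) = measure(primitive(0L, 1L, n.toLong))
    println(s"boxed:     sum=$r1 collections=$c1 gcMillis=$ms1")
    println(s"primitive: sum=$r2 collections=$c2 gcMillis=$ms2")
  }
}
```

To try a different collector, rerun with it selected on the JVM command line (for example -XX:+UseParallelGC on HotSpot) and compare the gcMillis figures.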
