Performance of Scala parallel collection processing

Question

I have scenarios where I need to process thousands of records at a time. Sometimes it might be a few hundred, sometimes as many as 30000 records. I was thinking of using Scala's parallel collections, so to understand the difference I wrote a simple program like the one below:

object Test extends App{
  val list = (1 to 100000).toList
  Util.seqMap(list)
  Util.parMap(list)
}

object Util{
  def seqMap(list:List[Int]) = {
    val start = System.currentTimeMillis
    list.map(x => x + 1).toList.sum
    val end = System.currentTimeMillis
    println("time taken =" + (end - start))
    end - start
  }
  def parMap(list:List[Int]) = {
    val start = System.currentTimeMillis
    list.par.map(x => x + 1).toList.sum
    val end = System.currentTimeMillis
    println("time taken=" + (end - start))
    end - start
  }
}

I expected the parallel version to be faster. However, the output I got was:

time taken =32
time taken=127

Machine configuration:

Intel i7 processor with 8 cores
16GB RAM
64bit Windows 8

What am I doing wrong? Is this not a correct scenario for parallel mapping?

Recommended answer

The issue is that the operation you are performing is so fast (just adding two ints) that the overhead of parallelization outweighs the benefit. Parallelization only really makes sense when each operation is slower.
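
As a rough sketch of that trade-off, the snippet below times the same map with a trivial function and with a deliberately expensive one. The names `OverheadSketch`, `timeMs`, and `slowInc`, and the loop count of 10000, are hypothetical choices made for illustration. It assumes Scala 2.12 or earlier, where `.par` is built in; on Scala 2.13+ it would need the separate scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._`. Timings from a single unwarmed run are only indicative.

```scala
object OverheadSketch {
  // Hypothetical helper: evaluates `body` once and returns (result, elapsed milliseconds).
  def timeMs[A](body: => A): (A, Long) = {
    val start = System.nanoTime
    val result = body
    (result, (System.nanoTime - start) / 1000000)
  }

  // Hypothetical expensive operation: does some CPU work, then returns x + 1.
  def slowInc(x: Int): Int = {
    var acc = 0.0
    var i = 0
    while (i < 10000) { acc += math.sqrt(i.toDouble + x); i += 1 }
    // acc is never negative; the check only ensures the loop's result is used.
    if (acc < 0) -1 else x + 1
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 30000).toVector

    // Cheap per-element work: parallel overhead is likely to dominate.
    val (_, seqCheap) = timeMs(data.map(_ + 1).sum)
    val (_, parCheap) = timeMs(data.par.map(_ + 1).sum)

    // Expensive per-element work: parallelism has a chance to pay off.
    val (_, seqSlow) = timeMs(data.map(slowInc).sum)
    val (_, parSlow) = timeMs(data.par.map(slowInc).sum)

    println(s"cheap: seq=${seqCheap}ms par=${parCheap}ms")
    println(s"slow:  seq=${seqSlow}ms par=${parSlow}ms")
  }
}
```

The actual numbers depend on the machine and JVM state, so the sketch prints them rather than asserting which side wins.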

Think of it this way: suppose you had 8 friends, gave each one an integer on a piece of paper, and told them to add one, write the result down, and hand it back, which you would then record before giving them the next integer. You would spend so much time passing messages back and forth that you could have done all the adding yourself faster.

Also: never call .par on a List, because the parallelization procedure has to copy the entire list into a parallel collection and then copy the whole thing back out. If you use a Vector, it doesn't have to do this extra work.
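
A minimal sketch of that advice, under the same version assumption as the question's code (Scala 2.12 or earlier, where `.par` needs no extra import): build the data as a Vector once and call `.par` on that. The names `VectorParSketch`, `incrementAll`, and `records` are hypothetical.

```scala
object VectorParSketch {
  // Hypothetical function: increments every element in parallel.
  // Starting from a Vector avoids the copy that List.par requires,
  // and .seq on the parallel result gives back a Vector.
  def incrementAll(records: Vector[Int]): Vector[Int] =
    records.par.map(_ + 1).seq

  def main(args: Array[String]): Unit = {
    // Build a Vector up front instead of a List.
    val records = (1 to 30000).toVector
    println(incrementAll(records).sum)
  }
}
```

This only removes the conversion cost; for work as cheap as `x + 1`, the sequential version may still be faster.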
