Scala ranges versus lists performance on large collections


Problem Description

I ran a set of performance benchmarks for 10,000,000 elements, and I've discovered that the results vary greatly with each implementation.

Can anybody explain why creating a Range.ByOne results in performance that is better than a simple array of primitives, but converting that same range to a list results in even worse performance than the worst-case scenario?

Create 10,000,000 elements, and print out those that are multiples of 1,000,000. The JVM size is always set to the same min and max: -Xms?m -Xmx?m

import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeUnit._

object LightAndFastRange extends App {

def chrono[A](f: => A, timeUnit: TimeUnit = MILLISECONDS): (A,Long) = {
  val start = System.nanoTime()
  val result: A = f
  val end = System.nanoTime()
  (result, timeUnit.convert(end-start, NANOSECONDS))
}

  def millions(): List[Int] =  (0 to 10000000).filter(_ % 1000000 == 0).toList

  val results = chrono(millions())
  results._1.foreach(x => println ("x: " + x))
  println("Time: " + results._2);
}

It takes 141 milliseconds with a JVM size of 27m

In comparison, converting to List affects performance dramatically:

import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeUnit._

object LargeLinkedList extends App {
  def chrono[A](f: => A, timeUnit: TimeUnit = MILLISECONDS): (A,Long) = {
  val start = System.nanoTime()
  val result: A = f
  val end = System.nanoTime()
  (result, timeUnit.convert(end-start, NANOSECONDS))
}

  val results = chrono((0 to 10000000).toList.filter(_ % 1000000 == 0))
  results._1.foreach(x => println ("x: " + x))
  println("Time: " + results._2)
}

It takes 8514-10896 ms with a JVM size of 460-455 m

In contrast, this Java implementation uses an array of primitives

import static java.util.concurrent.TimeUnit.*;

public class LargePrimitiveArray {

    public static void main(String[] args){
            long start = System.nanoTime();
            int[] elements = new int[10000000];
            for(int i = 0; i < 10000000; i++){
                    elements[i] = i;
            }
            for(int i = 0; i < 10000000; i++){
                    if(elements[i] % 1000000 == 0) {
                            System.out.println("x: " + elements[i]);
                    }
            }
            long end = System.nanoTime();
            System.out.println("Time: " + MILLISECONDS.convert(end-start, NANOSECONDS));
    }
}

It takes 116 ms with a JVM size of 59m

Java List of Integers

import java.util.List;
import java.util.ArrayList;
import static java.util.concurrent.TimeUnit.*;

public class LargeArrayList {

    public static void main(String[] args){
            long start = System.nanoTime();
            List<Integer> elements = new ArrayList<Integer>();
            for(int i = 0; i < 10000000; i++){
                    elements.add(i);
            }
            for(Integer x: elements){
                    if(x % 1000000 == 0) {
                            System.out.println("x: " + x);
                    }
            }
            long end = System.nanoTime();
            System.out.println("Time: " + MILLISECONDS.convert(end-start, NANOSECONDS));
    }

}

It takes 3993 ms with a JVM size of 283m

My question is, why is the first example so performant, while the second is so badly affected? I tried creating views, but wasn't successful at reproducing the performance benefits of the range.
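
For reference, here is a hedged sketch of one way the view variant might look (an assumption for illustration, not the exact code that was tried):

object LightViewRange extends App {
  // The view defers the filter, so no intermediate 10,000,000-element collection is built.
  val hits = (0 to 10000000).view.filter(_ % 1000000 == 0).toList
  hits.foreach(x => println("x: " + x))
}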

All tests were run on Mac OS X Snow Leopard, Java 6u26 64-Bit Server, Scala 2.9.1.final.

EDIT:

For completeness, here's the actual implementation using a LinkedList (which is a fairer comparison in terms of space than ArrayList since, as rightly pointed out, Scala's List is linked):

import java.util.List;
import java.util.LinkedList;
import static java.util.concurrent.TimeUnit.*;

public class LargeLinkedList {

    public static void main(String[] args){
            LargeLinkedList test = new LargeLinkedList();
            long start = System.nanoTime();
            List<Integer> elements = test.createElements();
            test.findElementsToPrint(elements);
            long end = System.nanoTime();
            System.out.println("Time: " + MILLISECONDS.convert(end-start, NANOSECONDS));
    }

    private List<Integer> createElements(){
            List<Integer> elements = new LinkedList<Integer>();
            for(int i = 0; i < 10000000; i++){
                    elements.add(i);
            }
            return elements;
    }

    private void findElementsToPrint(List<Integer> elements){
            for(Integer x: elements){
                    if(x % 1000000 == 0) {
                            System.out.println("x: " + x);
                    }
            }
    }

}

It takes 3621-6749 ms with 480-460 MB. That's much more in line with the performance of the second Scala example.

Finally, a LargeArrayBuffer:

import collection.mutable.ArrayBuffer
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeUnit._

object LargeArrayBuffer extends App {

 def chrono[A](f: => A, timeUnit: TimeUnit = MILLISECONDS): (A,Long) = {
  val start = System.nanoTime()
  val result: A = f
  val end = System.nanoTime()
  (result, timeUnit.convert(end-start, NANOSECONDS))
 }

 def millions(): List[Int] = {
    val size = 10000000
    var items = new ArrayBuffer[Int](size)
    (0 to size).foreach (items += _)
    items.filter(_ % 1000000 == 0).toList
 }

 val results = chrono(millions())
  results._1.foreach(x => println ("x: " + x))
  println("Time: " + results._2);
 }

It takes about 2145 ms and 375 MB.

Thanks a lot for the answers.

Solution

Oh So Many Things going on here!!!

Let's start with Java int[]. Arrays in Java are the only collection that is not type erased. The run time representation of an int[] is different from the run time representation of Object[], in that it actually uses int directly. Because of that, there's no boxing involved in using it.
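
To make the contrast concrete, here is a minimal sketch of the same loop over a Scala Array[Int] (the object name and loop shape are just illustrative assumptions); Array[Int] compiles down to a JVM int[] and is traversed without boxing:

object PrimitiveArraySketch extends App {
  // Array[Int] is backed by a raw JVM int[]; no Integer boxes are created.
  val elements = Array.tabulate(10000000)(i => i)
  println(elements.getClass.getSimpleName)   // prints "int[]"
  var i = 0
  while (i < elements.length) {              // reads raw 4-byte ints
    if (elements(i) % 1000000 == 0) println("x: " + elements(i))
    i += 1
  }
}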

In memory terms, you have 40,000,000 consecutive bytes, which are read and written 4 at a time whenever an element is read or written.

In contrast, an ArrayList<Integer> -- as well as pretty much any other generic collection -- is composed of 40,000,000 or 80,000,000 consecutive bytes (on 32- and 64-bit JVMs respectively), PLUS 80,000,000 bytes spread all around memory in groups of 8 bytes. Every read and write to an element has to go through two memory spaces, and the sheer time spent handling all that memory is significant when the actual task you are doing is so fast.
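
Here is a back-of-envelope sketch of those figures; the per-object sizes are assumptions following the reasoning above, not measurements:

object BoxedFootprintSketch extends App {
  val n              = 10000000L
  val referenceBytes = 4L   // one slot in the backing Object[] (32-bit JVM assumption)
  val boxedIntBytes  = 8L   // header plus int value for each java.lang.Integer (assumption)
  println("backing array: " + n * referenceBytes + " bytes")   // 40,000,000
  println("boxed values:  " + n * boxedIntBytes + " bytes")    // 80,000,000
}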

So, back to Scala, and the second example, where you manipulate a List. Now, Scala's List is much more like Java's LinkedList than the grossly misnamed ArrayList. Each element of a List is represented by an object called Cons, which takes 16 bytes and holds a pointer to the element and a pointer to another list. So a List of 10,000,000 elements is composed of 160,000,000 bytes spread all around memory in groups of 16 bytes, plus 80,000,000 bytes spread all around memory in groups of 8 bytes. So what was true for ArrayList is even more so for List.
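
A minimal sketch of that cons-cell layout, as a simplified model for illustration (not Scala's actual :: class):

sealed trait MyList
case object MyNil extends MyList
final case class MyCons(hd: Any, tl: MyList) extends MyList   // one element reference plus one tail reference per node

object ConsSketch extends App {
  // 0 :: 1 :: 2 :: Nil built by hand: three cells plus three boxed Integers.
  val tiny: MyList = MyCons(0, MyCons(1, MyCons(2, MyNil)))
  println(tiny)
}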

Finally, Range. A Range is a sequence of integers with a lower and an upper boundary, plus a step. A Range of 10,000,000 elements is 40 bytes: three ints (not generic) for the lower bound, upper bound and step, plus a few pre-computed values (last, numRangeElements) and two other ints used for lazy val thread safety. Just to be clear, that's NOT 40 times 10,000,000: that's 40 bytes TOTAL. The size of the range is completely irrelevant, because IT DOESN'T STORE THE INDIVIDUAL ELEMENTS. Just the lower bound, upper bound and step.
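
A minimal sketch of why a Range's footprint doesn't depend on its length, using a simplified stand-in rather than the real scala.collection.immutable.Range:

final class TinyRange(val start: Int, val end: Int, val step: Int) {
  // Only three ints are stored; elements are generated on demand.
  def foreach(f: Int => Unit): Unit = {
    var i = start
    while (i <= end) { f(i); i += step }
  }
}

object RangeSketch extends App {
  val r = new TinyRange(0, 10000000, 1)   // same footprint for 10 or 10,000,000 elements
  r.foreach(x => if (x % 1000000 == 0) println("x: " + x))
}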

Now, because a Range is a Seq[Int], it still has to go through boxing for most uses: an int will be converted into an Integer and then back into an int again, which is sadly wasteful.
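
As a hedged illustration of where the boxes come from (an assumption about the mechanism, not a measured benchmark): filtering through the generic Seq machinery builds a collection of boxed values, while a plain while loop over the same bounds stays on raw ints:

object BoxingSketch extends App {
  // Generic path: the filtered result stores boxed java.lang.Integer values.
  val viaSeq: Seq[Int] = (0 to 10000000).filter(_ % 1000000 == 0)
  viaSeq.foreach(x => println("x: " + x))

  // Unboxed path: no collection is built, only raw ints are compared.
  var i = 0
  while (i <= 10000000) {
    if (i % 1000000 == 0) println("x: " + i)
    i += 1
  }
}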

Cons Size Calculation

So, here's a tentative calculation of Cons. First of all, read this article about some general guidelines on how much memory an object takes. The important points are:

  1. Java uses 8 bytes for normal objects, and 12 for object arrays, for "housekeeping" information (what's the class of this object, etc.).
  2. Objects are allocated in 8-byte chunks. If your object is smaller than that, it will be padded to complement it.

I actually thought it was 16 bytes, not 8. Anyway, Cons is also smaller than I thought. Its fields are:

public static final long serialVersionUID; // static, doesn't count
private java.lang.Object scala$collection$immutable$$colon$colon$$hd;
private scala.collection.immutable.List tl;

References are at least 4 bytes (could be more on a 64-bit JVM). So we have:

8 bytes Java header
4 bytes hd
4 bytes tl

Which makes it only 16 bytes long. Pretty good, actually. In the example, hd will point to an Integer object, which I assume is 8 bytes long. As for tl, it points to another cons, which we are already counting.
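
Putting the pieces together, here's a hedged back-of-envelope total for the 10,000,000-element List[Int], using the per-object sizes assumed above:

object ListFootprintSketch extends App {
  val n            = 10000000L
  val consBytes    = 16L   // 8-byte header plus hd reference plus tl reference (assumption above)
  val integerBytes = 8L    // boxed Integer, as assumed above
  println("cons cells: " + n * consBytes + " bytes")                  // 160,000,000
  println("boxed ints: " + n * integerBytes + " bytes")               //  80,000,000
  println("total:      " + n * (consBytes + integerBytes) + " bytes") // 240,000,000
}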

I'm going to revise the estimates, with actual data where possible.
