Approximation of π used to compare sequential vs. parallel speeds in Java. Why was .parallel() slower?
Can someone please explain to me why the sequential version of the π approximation was faster than the parallel one?
I can't figure it out.
I'm playing around with a very well-known π-approximation example. I pick random points in the unit square ((0, 0) to (1, 1)) and count how many of them fall inside the unit circle. The fraction of points that land inside should approximate π / 4.
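As a short worked version of the reasoning above, assuming the points are independent and uniformly distributed (N stands for NUM_SAMPLES, and count is the number of points inside the circle):

```latex
\[
  P\bigl(x^2 + y^2 < 1\bigr)
  = \frac{\text{area of the quarter disc}}{\text{area of the unit square}}
  = \frac{\pi/4}{1}
  \quad\Longrightarrow\quad
  \pi \approx 4\,\frac{\text{count}}{N}
\]
\[
  \sigma_{\hat\pi} = 4\sqrt{\frac{p\,(1-p)}{N}}, \quad p = \frac{\pi}{4}
  \quad\Longrightarrow\quad
  \sigma_{\hat\pi} \approx \frac{1.64}{\sqrt{N}} \approx 1.6 \times 10^{-4}
  \ \text{ for } N = 10^{8}
\]
```

Under that assumption, an estimate that differs from π by roughly 1.6e-4 to 1.7e-4 at N = 100,000,000 is about one standard deviation off, which is ordinary sampling noise.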
import java.util.stream.LongStream;

public class PIEstimation {
    static final int NUM_SAMPLES = 100_000_000;

    public static void main(String[] args) {
        sequentialVersion();
        parallelVersion();
        System.out.println(" Real PI:= " + Math.PI);
    }

    public static void sequentialVersion() {
        final long start = System.nanoTime();
        final long count = LongStream
                .rangeClosed(1, NUM_SAMPLES)
                .filter(e -> {
                    // Random point in the unit square; keep it if it lies inside the circle.
                    double x = Math.random();
                    double y = Math.random();
                    return x * x + y * y < 1;
                }).count();
        final long duration = ((System.nanoTime() - start) / 1_000_000);
        System.out.println("Sequential Version: PI ~ " + 4.0 * (count / (double) NUM_SAMPLES) + " calculated in "
                + duration + " msecs");
    }

    public static void parallelVersion() {
        final long start = System.nanoTime();
        final long count = LongStream
                .rangeClosed(1, NUM_SAMPLES)
                .parallel() // the only difference from sequentialVersion()
                .filter(e -> {
                    double x = Math.random();
                    double y = Math.random();
                    return x * x + y * y < 1;
                }).count();
        final long duration = ((System.nanoTime() - start) / 1_000_000);
        System.out.println(" Parallel Version: PI ~ " + 4.0 * (count / (double) NUM_SAMPLES) + " calculated in "
                + duration + " msecs");
    }
}
The results:
Sequential Version: PI ~ 3.14176568 calculated in 4893 msecs
Parallel Version: PI ~ 3.1417546 calculated in 12044 msecs
Real PI:= 3.141592653589793
I get even worse results running in parallel on my machine (3.0 GHz Intel Core i7, two cores, four threads):
sequential: PI ~ 3.14175124 calculated in 4952 msecs
parallel: PI ~ 3.14167776 calculated in 21320 msecs
I suspect the main reason is that Math.random() is thread-safe and backed by a single generator shared by all threads, so every call has to coordinate access to it. With multiple threads all requesting random numbers at the same time, they all contend for that one generator, which adds a tremendous amount of overhead. Note that the specification for Math.random() says the following:
This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator.
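To make the contention described above visible on its own, here is a minimal sketch that times several threads drawing from one shared java.util.Random against the same threads each using a private instance. The shared Random only stands in for the single generator behind Math.random(). The class and method names are hypothetical, and the timings are crude and will vary by machine and JVM.

```java
import java.util.Random;

final class SharedVsPerThreadRandom {

    /**
     * Starts `threads` workers that each draw `draws` doubles, either from one
     * shared Random or from a private Random per worker. Returns elapsed milliseconds.
     */
    static long run(int threads, int draws, boolean shared) {
        final Random sharedRandom = new Random();
        final double[] sums = new double[threads]; // store results so the draws are actually used
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                Random rnd = shared ? sharedRandom : new Random();
                double sum = 0;
                for (int i = 0; i < draws; i++) {
                    sum += rnd.nextDouble();
                }
                sums[id] = sum;
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int threads = 4;
        int draws = 20_000_000;
        System.out.println("shared generator:      " + run(threads, draws, true) + " msecs");
        System.out.println("per-thread generators: " + run(threads, draws, false) + " msecs");
    }
}
```

If the explanation above holds, the shared run should be noticeably slower than the per-thread run once more than one thread is involved.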
To avoid this contention, use ThreadLocalRandom instead:
long count = LongStream.rangeClosed(1, NUM_SAMPLES)
        .parallel()
        .filter(e -> {
            ThreadLocalRandom cur = ThreadLocalRandom.current();
            double x = cur.nextDouble();
            double y = cur.nextDouble();
            return x * x + y * y < 1;
        })
        .count();
This gives the following results:
sequential2: PI ~ 3.14169156 calculated in 1171 msecs
parallel2: PI ~ 3.14166796 calculated in 648 msecs
That is a 1.8x speedup, which is not too bad for a two-core machine. Note that this version is also faster when run sequentially, probably because there's no synchronization overhead at all.
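For anyone who wants to reproduce the sequential2 and parallel2 runs, here is a self-contained sketch built around the ThreadLocalRandom snippet. The class name and the estimatePi helper are illustrative, not from the original code, and the timings on another machine will differ from those quoted.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.LongStream;

final class PiEstimationTlr {

    /** Estimates pi from `samples` random points, optionally on a parallel stream. */
    static double estimatePi(long samples, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, samples);
        if (parallel) {
            range = range.parallel();
        }
        long inside = range.filter(i -> {
            // current() returns the generator owned by the calling thread,
            // so worker threads never share generator state.
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            return x * x + y * y < 1;
        }).count();
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        final long samples = 100_000_000L;
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            double pi = estimatePi(samples, parallel);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println((parallel ? "parallel2" : "sequential2")
                    + ": PI ~ " + pi + " calculated in " + millis + " msecs");
        }
    }
}
```

ThreadLocalRandom.current() is called inside the lambda rather than captured outside it, so each worker thread of the parallel stream picks up its own generator.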
Aside: normally I'd suggest using JMH for benchmarks. This one runs long enough to give a reasonable indication of relative speeds, but for more precise results I do recommend JMH.
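The sketch below is not JMH and is no substitute for it. It is only a plain-Java illustration of one thing a harness like JMH takes care of: running the code a few times unmeasured so the JIT can warm up, then timing several runs. The class WarmupTimer and its method bestMillis are hypothetical names.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

final class WarmupTimer {

    // Results are written here so the JIT cannot treat the measured work as dead code.
    static volatile long sink;

    /** Runs `task` `warmups` times unmeasured, then returns the fastest of `runs` timed runs in ms. */
    static long bestMillis(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = task.getAsLong();
        }
        long bestNanos = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            bestNanos = Math.min(bestNanos, System.nanoTime() - start);
        }
        return bestNanos / 1_000_000;
    }

    public static void main(String[] args) {
        LongSupplier task = () -> LongStream.rangeClosed(1, 10_000_000L)
                .parallel()
                .filter(i -> {
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    double x = rnd.nextDouble();
                    double y = rnd.nextDouble();
                    return x * x + y * y < 1;
                })
                .count();
        System.out.println("best of 5 runs: " + bestMillis(task, 3, 5) + " msecs");
    }
}
```

Taking the fastest run is a design choice that filters out one-off pauses. JMH reports fuller statistics and also isolates runs in forked JVMs, which this sketch does not attempt.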
UPDATE
Here are additional results (requested by user3666197 in the comments), using a NUM_SAMPLES value of 1_000_000_000 compared to the original 100_000_000. I've copied the results from above for easy comparison.
NUM_SAMPLES = 100_000_000
sequential: PI ~ 3.14175124 calculated in 4952 msecs
parallel: PI ~ 3.14167776 calculated in 21320 msecs
sequential2: PI ~ 3.14169156 calculated in 1171 msecs
parallel2: PI ~ 3.14166796 calculated in 648 msecs
NUM_SAMPLES = 1_000_000_000
sequential: PI ~ 3.141572896 calculated in 47730 msecs
parallel: PI ~ 3.141543836 calculated in 228969 msecs
sequential2: PI ~ 3.1414865 calculated in 12843 msecs
parallel2: PI ~ 3.141635704 calculated in 7953 msecs
The sequential and parallel results come from (mostly) the same code as in the question, while sequential2 and parallel2 use my modified ThreadLocalRandom code. The new timings are roughly 10x longer overall, as one would expect. The longer parallel2 run isn't quite as fast as one would expect, though it's not totally out of line, showing about a 1.6x speedup over sequential2 on a two-core machine.