When is "race" worthwhile in Perl 6?


Problem description





race automatically divides operations on an iterable across multiple threads. For instance,

(Bool.roll xx 2000).race.sum

would automatically divide the summing of the 2000-element array across 4 threads. However, benchmarks show that this is much slower than if race were not employed. This remains true even if you make the array bigger, and even as the non-autothreaded version gets faster and faster with each release. (Auto-threading also gets faster, but is still twice as slow as not using it.)
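
(For reference, the overhead can be timed directly with something along these lines; this is not the original benchmark, and the numbers will vary by machine and Rakudo version.)

my @values = Bool.roll xx 2000;

# plain, single-threaded sum
my $t0 = now;
my $plain = @values.sum;
say "sequential: $plain in {now - $t0} seconds";

# auto-threaded sum
my $t1 = now;
my $raced = @values.race.sum;
say "race:       $raced in {now - $t1} seconds";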

So the question is: what is the minimum size of the atomic operation for which race is worthwhile? Is the overhead added to the sequential operation fixed, or can it be decreased somehow?

Update: in fact, the performance of hyper (similar to race, but with guaranteed ordered results) seems to be getting worse over time, at least for small sizes which are nonetheless integer multiples of the default batch size (64). The same happens with race.
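
(Both hyper and race also accept explicit :batch and :degree arguments, so the effect of the batch size can be probed directly; a sketch with arbitrary numbers, where the per-element work is still trivial:)

# same kind of pipeline, but with the batch size and worker count spelled out
say (Bool.roll xx 2048).hyper(:batch(128), :degree(4)).map(+*).sum;   # counts the Trues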

Solution

The short answer: .sum isn't smart enough to calculate sums in batches.
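
(Batching a sum by hand is possible in the meantime; a rough sketch with an arbitrary chunk size, relying only on .map being parallel-aware:)

my @values = Bool.roll xx 2000;

# sum each chunk on a worker thread, then add up the partial sums sequentially
say @values.batch(500).race.map(*.sum).sum;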

So what you're effectively doing in this benchmark is setting up a HyperSeq / RaceSeq, but then not doing any parallel processing:

dd (Bool.roll xx 2000).race;
# RaceSeq.new(configuration => HyperConfiguration.new(batch => 64, degree => 4))

So you've been measuring .hyper / .race overhead. You see, at the moment, only .map and .grep have been implemented on HyperSeq / RaceSeq. If you give that something to do, like:

# find the 1000th prime number in a single thread
$ time perl6 -e 'say (^Inf).grep( *.is-prime ).skip(999).head'
real    0m1.731s
user    0m1.780s
sys     0m0.043s

# find the 1000th prime number concurrently
$ time perl6 -e 'say (^Inf).hyper.grep( *.is-prime ).skip(999).head'
real    0m0.809s
user    0m2.048s
sys     0m0.060s

As you can see, in this (small) example the concurrent version is more than 2x as fast as the non-concurrent one, but it uses more CPU.

Since .hyper and .race started working correctly, performance has improved slightly, as you can see in this graph.

Other functions, such as .sum, could be implemented for .hyper / .race. However, I would hold off on that for the moment, as we will need a small refactor of the way we do .hyper and .race: at the moment, a batch cannot communicate back to the "supervisor" how quickly it finished its job. The supervisor needs that information if we want to allow it to adjust e.g. the batch size, should it find that the default batch size is too small and we have too much overhead.
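
(Until the supervisor can adapt the batch size on its own, one workaround is to try a few explicit batch sizes by hand; a rough sketch reusing the prime example above, with candidate sizes that are just guesses:)

# time the same hyper pipeline with a few explicit batch sizes
for 16, 64, 256, 1024 -> $batch {
    my $t0 = now;
    my $nth = (^Inf).hyper(:$batch).grep(*.is-prime).skip(999).head;
    say "batch $batch: 1000th prime is $nth, took {now - $t0} seconds";
}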
