When is "race" worthwhile in Perl 6?
Problem description
race divides operations on an iterable automatically into threads. For instance,

(Bool.roll xx 2000).race.sum

would automatically divide the sum of the 2000-long array into 4 threads. However, benchmarks show that this is much slower than if race were not employed. This happens even if you make the array bigger.
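The benchmark in question can be reduced to a minimal timing sketch (the numbers are illustrative only; absolute results depend on the machine and the Rakudo version):

```perl6
# Compare a plain sum against the auto-threaded one on a small array.
my @a = Bool.roll xx 2000;

my $t0 = now;
my $plain = @a.sum;        # sequential sum
my $t1 = now;
my $raced = @a.race.sum;   # sum on a RaceSeq (no parallel .sum yet)
my $t2 = now;

say "sequential: { $t1 - $t0 } seconds";
say "race:       { $t2 - $t1 } seconds";
```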
This happens even as the non-autothreaded version gets faster and faster with each version. (Auto-threading also gets faster, but is still twice as slow as not using it.)

So the question is: what is the minimum size of the atomic operation that is worthwhile to use? Is the overhead added to the sequential operation fixed, or can it be decreased somehow?

Update: in fact, the performance of hyper (similar to race, but with guaranteed ordered results) seems to be getting worse with time, at least for small sizes which are nonetheless integer multiples of the default batch size (64). The same happens with race.
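As an aside to the update above, the ordering guarantee is the user-visible difference between the two methods. A sketch, assuming a batch size small enough that the work is actually split across several batches:

```perl6
# .hyper guarantees results arrive in input order ...
say (1..20).hyper(batch => 2).map(* ** 2).list;

# ... while .race may deliver batches in completion order,
# so the squares can come out interleaved.
say (1..20).race(batch => 2).map(* ** 2).list;
```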
Answer

The short answer: .sum isn't smart enough to calculate sums in batches.

So what you're effectively doing in this benchmark is setting up a HyperSeq / RaceSeq, but then not doing any parallel processing:

dd (Bool.roll xx 2000).race;
# RaceSeq.new(configuration => HyperConfiguration.new(batch => 64, degree => 4))

So you've been measuring the .hyper / .race overhead. You see, at the moment only .map and .grep have been implemented on HyperSeq / RaceSeq. If you give that something to do, like:

# find the 1000th prime number in a single thread
$ time perl6 -e 'say (^Inf).grep( *.is-prime ).skip(999).head'
real 0m1.731s
user 0m1.780s
sys 0m0.043s
# find the 1000th prime number concurrently
$ time perl6 -e 'say (^Inf).hyper.grep( *.is-prime ).skip(999).head'
real 0m0.809s
user 0m2.048s
sys 0m0.060s
As you can see, in this (small) example the concurrent version is more than 2x as fast as the non-concurrent one, but it uses more CPU.

Since .hyper and .race got to work correctly, performance has slightly improved, as you can see in this graph.

Other functions, such as .sum, could be implemented for .hyper / .race. However, I would hold off on that at the moment, as we will need a small refactor of the way we do .hyper and .race: at the moment, a batch cannot communicate back to the "supervisor" how fast it has finished its job. The supervisor needs that information if we want to allow it to adjust e.g. the batch size, if it finds out that the default batch size is too small and we have too much overhead.
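Until a parallel .sum exists, the batching can be done by hand on top of the .map that is already implemented. A hedged sketch: the chunk size of 512 is an arbitrary choice, and for an operation as cheap as adding booleans the threading overhead may still dominate:

```perl6
my @values = Bool.roll xx 100_000;

# Chunk the input, sum each chunk in a worker via RaceSeq.map,
# then add the per-chunk subtotals (their order doesn't matter).
my $total = @values.batch(512).race.map(*.sum).sum;

say $total == @values.sum;  # True
```

The batch and degree values seen in the dd output earlier can also be set explicitly when more control is wanted, e.g. @values.race(batch => 1024, degree => 4).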