When does .race or .hyper outperform non-data-parallelized versions?

Question

I have this code:

# Grab Nutrients.csv from https://data.nal.usda.gov/dataset/usda-branded-food-products-database/resource/c929dc84-1516-4ac7-bbb8-c0c191ca8cec
my @nutrients = "/path/to/Nutrients.csv".IO.lines;
for @nutrients.race {
    my @data = $_.split('","');
    .say if @data[2] eq "Protein" and @data[4] > 70 and @data[5] ~~ /^g/;
};

Nutrients.csv is a 174 MB file with lots of rows. Non-trivial work is done on every row, but there is no data dependency between rows. However, this takes about 54 seconds, while the non-race version takes 43 seconds, 20% less. Any idea why that happens? Is the work done per row still too little for data parallelism to pay off? I have only seen it help with very heavy operations, like checking whether a number is prime. In that case, is there a ballpark for how much work each piece of data needs to make data parallelism worthwhile?

Recommended answer

Assuming that "outperform" is defined as "using less wall-clock time":

Short answer: when it does.

Longer answer: when the overhead of batching values, distributing them over multiple threads and collecting the results, plus the actual CPU time needed for the work divided by the number of threads, adds up to a shorter runtime than the single-threaded version.
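As a rough way to write that condition down (the symbols are assumptions introduced here for illustration, not something the answer defines): let W be the wall-clock time of the plain single-threaded loop, n the number of worker threads that are actually kept busy, and T_overhead the combined cost of batching, dispatching and collecting. Then .race or .hyper only wins when

$$
T_{\text{overhead}} + \frac{W}{n} < W
\quad\Longleftrightarrow\quad
T_{\text{overhead}} < W\left(1 - \frac{1}{n}\right)
$$

With n = 1 the right-hand side is zero, so any overhead at all makes the parallel version slower. That is consistent with the question's measurement of roughly 54 seconds with .race against 43 seconds without it.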

Still longer answer: the dispatcher thread needs some CPU time to batch up values, hand the work over to a worker thread, and then process its result. As long as that amount of CPU time is more than the amount needed to do the work, you will only use one worker thread (because by the time the dispatcher thread is ready to dispatch again, the only worker thread is already free to receive more work). That means you've made things worse: the actual work is still being done by one thread, but you've added a lot of overhead and latency.

So make sure that the amount of work a worker thread needs to do is big enough that the dispatcher thread will need to start up another thread for the next piece of work. This can be done by increasing the batch size. But a bigger batch also means that the dispatcher thread will need more CPU time to create the batch, which in turn can leave the worker thread ready to receive the next batch by then, in which case you're back to just having added overhead.
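One way to explore that trade-off is to time the same filter with several batch sizes. The following is a minimal sketch, not a benchmark from the answer: the rows are synthetic stand-ins for the CSV lines, the sub name wanted and the hash %hits are made up for illustration, and the batch sizes and degree of 4 are arbitrary choices. It relies only on the batch and degree named arguments of .race; timings will vary by machine and Rakudo version.

```raku
# Synthetic rows shaped like the quoted CSV lines (invented data, not the real file).
my @rows = (1..200_000).map: -> $i {
    qq["$i","203","Protein","LCCS","{$i % 100}","g"]
};

# Same per-row test as in the question, wrapped in a sub so .grep can call it.
sub wanted(Str $row --> Bool) {
    my @data = $row.split('","');
    so @data[2] eq "Protein" && @data[4] > 70 && @data[5] ~~ /^g/;
}

my %hits;   # number of matching rows found for each batch size

# Try a few batch sizes; larger batches mean fewer hand-offs to worker threads.
for 64, 1024, 16384 -> $batch {
    my $start = now;
    my @found = @rows.race(:$batch, :degree(4)).grep(&wanted);
    %hits{$batch} = @found.elems;
    say "batch $batch: {@found.elems} hits in {(now - $start).round(0.01)}s";
}
```

If no batch size beats the plain @rows.grep(&wanted), the per-row work is probably too small for the current implementation to pay off, which matches the reasoning above.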

There are still plans to make the batch size adapt itself automatically to the amount of work that a worker thread needs to do. But unfortunately, that will also require quite an extensive reworking of the current implementation of hyper and race. So don't expect that any time soon, and definitely not before the Great Dispatcher Overhaul has landed.
