Implementing pmap from scratch. Why is my implementation slow?

Question

I'm new to Erlang, so for practice I'm trying to implement standard functions from scratch. I tried to write a parallel implementation of the map/2 function from the lists module, but my implementation runs very slowly. Could you point out whether I made any fundamental mistakes in my implementation?

-module( my_pmap ).
-export([ pmap/2 ]).
-export([ map/4, collect/3 ]).

map( F, Value, Indx, SenderPid ) ->
        SenderPid ! { Indx, F( Value ) }.

pmap( F, List ) ->
        CollectorPid = spawn_link( my_pmap, collect, [ length( List ), [], self() ] ),
        lists:foldl(
                fun( X, Indx ) ->
                        spawn_link( my_pmap, map, [ F, X, Indx, CollectorPid ] ),
                        Indx + 1
                end,
                1,
                List ),
        Mapped =
                receive
                        { collected, M } ->
                                M
                end,
        Sorted = lists:sort(
                        fun( { Indx1, _ }, { Indx2, _ } ) ->
                                Indx1 < Indx2
                        end,
                        Mapped ),
        [ Val || { _Indx, Val } <- Sorted ].

collect( 0, List, SenderPid ) ->
        SenderPid ! { collected, List };
collect( N, List, SenderPid ) when N > 0 ->
        receive
                Mapped ->
                        collect( N - 1, [ Mapped | List ], SenderPid )
        end.

Here are the test results:

1> c(my_pmap).
{ok,my_pmap}
2> timer:tc( my_pmap, pmap, [ fun(X) -> X*X*X*X end, lists:seq( 1, 10000 ) ] ).
{137804,
 [1,16,81,256,625,1296,2401,4096,6561,10000,14641,20736,
  28561,38416,50625,65536,83521,104976,130321,160000,194481,
  234256,279841,331776,390625,456976,531441|...]}
3> timer:tc( lists, map, [ fun(X) -> X*X*X*X end, lists:seq( 1, 10000 ) ] ).   
{44136,
 [1,16,81,256,625,1296,2401,4096,6561,10000,14641,20736,
  28561,38416,50625,65536,83521,104976,130321,160000,194481,
  234256,279841,331776,390625,456976,531441|...]}

As you can see, pmap took 0.137804 seconds versus 0.044136 seconds for lists:map.

Thanks

Recommended answer

The comments are correct. The problem is that spawning a process is cheap, but it does have a cost. Multiplying a number three times is very fast, so the overhead of spawning a new process for every element kills your performance.
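To see that overhead directly, here is a rough measurement sketch. The module name spawn_cost and the function run/1 are made up for illustration; it times N multiplications done in the calling process against N multiplications that each get their own process and reply with a message. Absolute numbers will vary by machine and Erlang/OTP version, so treat it as an experiment to run rather than a benchmark result.

```erlang
-module(spawn_cost).
-export([run/1]).

%% Returns {DirectMicros, SpawnedMicros}: the time for N multiplications
%% done in-process vs. N multiplications done in one process each.
run(N) ->
    F = fun(X) -> X * X * X * X end,
    Seq = lists:seq(1, N),
    %% Baseline: apply F to every element in the calling process.
    {Direct, _} = timer:tc(fun() -> lists:foreach(F, Seq) end),
    Parent = self(),
    %% One process per element, each sending its result back.
    {Spawned, _} = timer:tc(fun() ->
        lists:foreach(
            fun(X) -> spawn(fun() -> Parent ! {done, F(X)} end) end,
            Seq),
        wait(N)
    end),
    {Direct, Spawned}.

%% Block until N {done, _} messages have arrived.
wait(0) -> ok;
wait(N) -> receive {done, _} -> wait(N - 1) end.
```

Calling spawn_cost:run(10000) in the shell returns both timings in microseconds, which makes it easy to compare the per-element cost of the two approaches.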

Partitioning the list into fragments and processing each fragment in a separate process will probably be faster. If you know you have 8 cores, you could try splitting it into 8 fragments. Things like pmap can be implemented in Erlang, but it is not a strength of Erlang. A system like the Haskell GHC runtime has sparks, which are a better tool for fine-grained parallelism like this. Also, a multiplication like that is an obvious candidate for SIMD instructions in SSE or for a GPU. Erlang has no solution for this either, but again, GHC has accelerate and repa, which are libraries for handling this situation.
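A minimal sketch of the fragment-based approach could look like the following. The module name chunked_pmap, the helper split/2, and the choice of one fragment per scheduler as the default are assumptions made for illustration, not part of the original answer. Each fragment is mapped with lists:map/2 in its own linked process, and results are collected in spawn order with a selective receive on a unique reference, so no index bookkeeping or sorting is needed.

```erlang
-module(chunked_pmap).
-export([pmap/2, pmap/3]).

%% Default: one fragment per scheduler thread (an assumption; the
%% scheduler count usually matches the number of cores).
pmap(F, List) ->
    pmap(F, List, erlang:system_info(schedulers)).

%% Split List into at most NChunks fragments, map each fragment in its
%% own process, and concatenate the results in the original order.
pmap(_F, [], _NChunks) ->
    [];
pmap(F, List, NChunks) when is_integer(NChunks), NChunks > 0 ->
    ChunkSize = (length(List) + NChunks - 1) div NChunks,  % ceiling division
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, lists:map(F, Chunk)} end),
                Ref
            end || Chunk <- split(List, ChunkSize)],
    %% Wait for each fragment in spawn order; Ref is already bound here,
    %% so every receive matches only the reply of that fragment.
    lists:append([receive {Ref, Mapped} -> Mapped end || Ref <- Refs]).

%% Cut a list into consecutive sublists of at most Size elements.
split([], _Size) ->
    [];
split(List, Size) when length(List) =< Size ->
    [List];
split(List, Size) ->
    {Chunk, Rest} = lists:split(Size, List),
    [Chunk | split(Rest, Size)].
```

It can be timed the same way as in the question, for example with timer:tc( chunked_pmap, pmap, [ fun(X) -> X*X*X*X end, lists:seq( 1, 10000 ) ] ). Because the workers are linked, a crash in F also takes down the caller unless it traps exits.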

On the other hand, you can get a good speedup in Erlang simply by using processes to handle a handful of fragments, as suggested above. Also note that parallel computation often performs badly for small N (such as 10000) because of the communication overhead. You need much larger problems to reap the benefits.
