Writing "fib" to run in parallel: -N2 is slower?


Problem description


I'm learning Haskell and trying to write code that executes in parallel, but Haskell always runs it sequentially. When I run it with the -N2 runtime flag, it takes longer to execute than when I omit the flag.

Here is the code:

import Control.Parallel
import Control.Parallel.Strategies

fib :: Int -> Int
fib 1 = 1
fib 0 = 1
fib n = fib (n - 1) + fib (n - 2)

fib2 :: Int -> Int
fib2 n = a `par` (b `pseq` (a+b))
    where a = fib n
          b = fib n + 1

fib3 :: Int -> Int
fib3 n = runEval $ do
                a <- rpar (fib n)
                b <- rpar (fib n + 1)
                rseq a
                rseq b
                return (a+b)

main = do putStrLn (show (fib3 40))

What did I do wrong? I tried this sample on Windows 7 with an Intel Core i5 and on Linux with an Atom.

Here is the log from my console session:

ghc -rtsopts -threaded -O2 test.hs
[1 of 1] Compiling Main             ( test.hs, test.o )

test +RTS -s
331160283
          64,496 bytes allocated in the heap
           2,024 bytes copied during GC
          42,888 bytes maximum residency (1 sample(s))
          22,648 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:     0 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  Parallel GC work balance: nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  6.59s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  Task  2 (bound)  :    6.33s    (  6.59s)       0.00s    (  0.00s)

  SPARKS: 2 (0 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    6.33s  (  6.59s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    6.33s  (  6.59s elapsed)

  %GC time       0.0%  (0.0% elapsed)

  Alloc rate    10,191 bytes per MUT second

  Productivity 100.0% of total user, 96.0% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0


test +RTS -N2 -s 
331160283
          72,688 bytes allocated in the heap
           5,644 bytes copied during GC
          28,300 bytes maximum residency (1 sample(s))
          24,948 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:     1 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     1 parallel,  0.00s,  0.01s elapsed

  Parallel GC work balance: 1.51 (937 / 621, ideal 2)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  9.29s)       0.00s    (  0.00s)
  Task  1 (worker) :    4.53s    (  9.29s)       0.00s    (  0.00s)
  Task  2 (bound)  :    5.84s    (  9.29s)       0.00s    (  0.01s)
  Task  3 (worker) :    0.00s    (  9.29s)       0.00s    (  0.00s)

  SPARKS: 2 (1 converted, 0 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   10.38s  (  9.29s elapsed)
  GC    time    0.00s  (  0.01s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   10.38s  (  9.30s elapsed)

  %GC time       0.0%  (0.1% elapsed)

  Alloc rate    7,006 bytes per MUT second

  Productivity 100.0% of total user, 111.6% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync_large_objects: 0
gen[1].sync_large_objects: 0

Solution

I think the answer is that "GHC will optimise the fib function so that it does no allocation, and computations that do no allocation cause problems for the RTS because the scheduler never gets to run and do load-balancing (which is necessary for parallelism)", as Simon wrote in this discussion group. I also found a good tutorial.
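
One more detail in the question's code is worth noting: function application binds tighter than +, so fib n + 1 parses as (fib n) + 1, not fib (n + 1). Both sparks therefore evaluate the same fib n, which is consistent with the printed result 331160283 (that is, 2 * 165580141 + 1). The following is only a minimal sketch of how the work could be split into two different subproblems; parFib and its cutoff parameter are hypothetical names that do not appear in the question, and whether this yields a speedup depends on the GHC version and the machine.

    import Control.Parallel.Strategies (runEval, rpar, rseq)

    -- Sequential Fibonacci with the question's base cases (fib 0 = fib 1 = 1).
    fib :: Int -> Int
    fib 0 = 1
    fib 1 = 1
    fib n = fib (n - 1) + fib (n - 2)

    -- Hypothetical helper (not from the question): sparks two *different*
    -- subproblems and falls back to the sequential fib at or below the
    -- cutoff, so that each spark carries a reasonable amount of work.
    -- Assumes cutoff >= 1; a smaller cutoff would recurse past the base cases.
    parFib :: Int -> Int -> Int
    parFib cutoff n
      | n <= cutoff = fib n
      | otherwise = runEval $ do
          a <- rpar (parFib cutoff (n - 1))  -- sparked, may run on another core
          b <- rseq (parFib cutoff (n - 2))  -- evaluated on the current thread
          _ <- rseq a                        -- wait for the sparked result
          return (a + b)

    main :: IO ()
    main = print (parFib 25 35)

As in the question, this would be compiled with ghc -rtsopts -threaded -O2 and run with +RTS -N2 -s; the SPARKS line in the output then shows how many sparks were actually converted. The cutoff value of 25 is an arbitrary choice for illustration, not a tuned setting.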
