Parallel Haskell - GHC GC'ing sparks


Problem description



I have a program I'm trying to parallelize (full paste with runnable code here).

I've profiled and found that the majority of the time is spent in findNearest, which is essentially a simple foldr over a large Data.Map.

findNearest :: RGB -> M.Map k RGB -> (k, Word32)
findNearest rgb m0 =
    M.foldrWithKey' minDistance (k0, distance rgb r0) m0
    where (k0, r0) = M.findMin m0
          minDistance k r x@(_, d1) =
            -- Euclidean distance in RGB-space
            let d0 = distance rgb r
            in if d0 < d1 then (k, d0) else x

parFindNearest is supposed to execute findNearest in parallel over subtrees of the larger Map.

parFindNearest :: NFData k => RGB -> M.Map k RGB -> (k, Word32)
parFindNearest rgb = minimumBy (comparing snd)
                   . parMap rdeepseq (findNearest rgb)
                   . M.splitRoot
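
For reference, here is a minimal, self-contained harness that lets the two excerpts above compile on their own. The RGB type and distance function are stand-ins for the definitions in the linked paste (assumed here to be a strict triple of Word8 components and a squared Euclidean metric), so treat this as a sketch rather than the original code.

import qualified Data.Map as M
import Data.List (minimumBy)
import Data.Ord (comparing)
import Data.Word (Word8, Word32)
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Stand-in for the RGB type from the full paste: three strict components.
data RGB = RGB !Word8 !Word8 !Word8

-- Stand-in metric: squared Euclidean distance in RGB space.
distance :: RGB -> RGB -> Word32
distance (RGB r1 g1 b1) (RGB r2 g2 b2) = sq r1 r2 + sq g1 g2 + sq b1 b2
  where sq a b = let d = fromIntegral a - fromIntegral b :: Int
                 in fromIntegral (d * d)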

Unfortunately, GHC GC's most of my sparks before they are converted into useful parallelism.

Here's the result of compiling with ghc -O2 -threaded and running with +RTS -s -N2:

 839,892,616 bytes allocated in the heap
 123,999,464 bytes copied during GC
   5,320,184 bytes maximum residency (19 sample(s))
   3,214,200 bytes maximum slop
          16 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1550 colls,  1550 par    0.23s    0.11s     0.0001s    0.0004s
  Gen  1        19 colls,    18 par    0.11s    0.06s     0.0030s    0.0052s

  Parallel GC work balance: 16.48% (serial 0%, perfect 100%)

  TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)

  SPARKS: 215623 (1318 converted, 0 overflowed, 0 dud, 198111 GC'd, 16194 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    3.72s  (  3.66s elapsed)
  GC      time    0.34s  (  0.17s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    4.07s  (  3.84s elapsed)

  Alloc rate    225,726,318 bytes per MUT second

  Productivity  91.6% of total user, 97.1% of total elapsed

gc_alloc_block_sync: 9862
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 2103

As you can see, the majority of sparks are GC'd or fizzle before being converted. I've tried experimenting with different strictness: having findNearest return a custom strict pair data type instead of a tuple, and using rdeepseq from Control.Parallel.Strategies, but my sparks are still GC'd.
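
For illustration, a strict pair of the kind mentioned above might look like the following. This is a sketch of the idea, not the exact type from those experiments:

import Control.DeepSeq (NFData (..))
import Data.Word (Word32)

-- Both fields are forced as soon as the constructor is, so rdeepseq
-- has no nested thunks left to chase in the accumulator.
data Pair k = Pair !k !Word32

instance NFData k => NFData (Pair k) where
    rnf (Pair k _) = rnf k

-- minDistance would then build 'Pair k d0' instead of '(k, d0)'.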

I'd like to know:

  • why are my sparks being GC'd before being converted?
  • how can I change my program to take advantage of parallelism?

Solution

I'm not an expert in parallel strategies, so I may be completely wrong. But:

If you disable GC by setting a large enough allocation area (e.g. with the -A20M runtime option), you'll see that most of the sparks fizzle rather than get GC'd. That means they are evaluated by the ordinary program flow before the corresponding spark has a chance to run.
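
For example (prog is a placeholder for the compiled binary):

./prog +RTS -s -N2 -A20M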

minimumBy forces the parMap results immediately and starts evaluating them. The sparks are scheduled at the same time, but they run too late: by the time a spark fires, its value has already been evaluated by the main thread. Without -A20M, the sparks are instead GC'd, because each value is evaluated and then garbage-collected before the spark is even scheduled.

Here is a simplified test case:

import Control.Parallel.Strategies

f :: Integer -> Integer
f 0 = 1
f n = n * f (n - 1)

main :: IO ()
main = do
  let l = [n..n+10]
      n = 1
      res = parMap rdeepseq f l
  print res

In that case all the sparks fizzle:

 SPARKS: 11 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 11 fizzled)

(Sometimes they are GC'd instead.)

But if I yield the main thread before printing the results,

import Control.Parallel.Strategies
import Control.Concurrent

f :: Integer -> Integer
f 0 = 1
f n = n * f (n - 1)

main :: IO ()
main = do
  let l = [n..n+10]
      n = 1
      res = parMap rdeepseq f l
  res `seq` threadDelay 1
  print res

Then all the sparks are converted:

SPARKS: 11 (11 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

So it looks like you don't have enough sparks (try setting l = [n..n+1000] in my example), and they are not heavy enough (try setting n = 1000 in my example).
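
Applied to the original program, one way to act on that advice is to split the Map deeper than the handful of pieces M.splitRoot returns, so each call produces more sparks of useful size. The splitToDepth helper and the depth of 4 below are hypothetical, for illustration only (this reuses findNearest and the imports from the question):

-- Sketch: recursively split a map to the given depth, dropping empty
-- pieces so findNearest never sees an empty map.
splitToDepth :: Int -> M.Map k v -> [M.Map k v]
splitToDepth d m
  | M.null m  = []
  | d <= 0    = [m]
  | otherwise = concatMap (splitToDepth (d - 1)) (M.splitRoot m)

parFindNearest' :: NFData k => RGB -> M.Map k RGB -> (k, Word32)
parFindNearest' rgb = minimumBy (comparing snd)
                    . parMap rdeepseq (findNearest rgb)
                    . splitToDepth 4  -- hypothetical depth; tune with +RTS -s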
