Frequent GC preventing sparks from running in parallel
I tried running the first example here: http://chimera.labs.oreilly.com/books/1230000000929/ch03.html
Code: https://github.com/simonmar/parconc-examples/blob/master/strat.hs
import Control.Parallel
import Control.Parallel.Strategies (rpar, Strategy, using)
import Text.Printf
import System.Environment
-- <<fib
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
-- >>
main = print pair
 where
  pair =
-- <<pair
   (fib 35, fib 36) `using` parPair
-- >>
-- <<parPair
parPair :: Strategy (a,b)
parPair (a,b) = do
  a' <- rpar a
  b' <- rpar b
  return (a',b')
-- >>
I've built using ghc 7.10.2 (on OSX, with a multicore machine) using the following command:
ghc -O2 strat.hs -threaded -rtsopts -eventlog
And run using:
./strat +RTS -N2 -l -s
I expected the two fib calculations to run in parallel (previous chapter examples worked as expected, so there are no setup issues), but I wasn't getting any speedup at all, as seen here:
% ./strat +RTS -N2 -l -s
(14930352,24157817)
3,127,178,800 bytes allocated in the heap
6,323,360 bytes copied during GC
70,000 bytes maximum residency (2 sample(s))
31,576 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5963 colls,  5963 par    0.179s   0.074s     0.0000s    0.0001s
  Gen  1         2 colls,     1 par    0.000s   0.000s     0.0001s    0.0001s
Parallel GC work balance: 2.34% (serial 0%, perfect 100%)
TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)
SPARKS: 2 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 1.809s ( 1.870s elapsed)
GC time 0.180s ( 0.074s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 1.991s ( 1.945s elapsed)
Alloc rate 1,728,514,772 bytes per MUT second
Productivity 91.0% of total user, 93.1% of total elapsed
gc_alloc_block_sync: 238
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
Running with -N1 gives similar results (omitted).
The number of GC collections seemed suspicious, as others in #haskell-beginners pointed out, so I tried adding -A16M when running. The results looked much more in line with expectations:
% ./strat +RTS -N2 -l -s -A16M
(14930352,24157817)
3,127,179,920 bytes allocated in the heap
260,960 bytes copied during GC
69,984 bytes maximum residency (2 sample(s))
28,320 bytes maximum slop
33 MB total memory in use (0 MB lost due to fragmentation)
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       115 colls,   115 par    0.105s   0.002s     0.0000s    0.0003s
  Gen  1         2 colls,     1 par    0.000s   0.000s     0.0002s    0.0002s
Parallel GC work balance: 71.25% (serial 0%, perfect 100%)
TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)
SPARKS: 2 (1 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)
INIT time 0.001s ( 0.001s elapsed)
MUT time 1.579s ( 1.087s elapsed)
GC time 0.106s ( 0.002s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 1.686s ( 1.091s elapsed)
Alloc rate 1,980,993,138 bytes per MUT second
Productivity 93.7% of total user, 144.8% of total elapsed
gc_alloc_block_sync: 27
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
The question is: why does this happen? Even with frequent GC, I would still intuitively expect the two sparks to run in parallel during the other 90% of the running time.
Yes, this is actually a bug in GHC 8.0.1 and earlier (I'm working on fixing it for 8.0.2). The problem is that the fib 35 and fib 36 expressions are constant, so GHC lifts them to the top level as CAFs (constant applicative forms). The RTS was wrongly assuming that the CAFs were unreachable, and so it was garbage collecting the sparks.
You can work around it by making the expressions non-constant by passing in parameters on the command line:
main = do
  [a,b] <- map read <$> getArgs
  let pair = (fib a, fib b) `using` parPair
  print pair
and then run the program with ./strat 35 36.
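For reference, the workaround can be assembled into one complete program. This is only a minimal sketch: it reuses fib and parPair from the question and assumes the parallel package is installed. The usage message and the case-based argument handling are my own additions for illustration, not part of the original example.

```haskell
import Control.Parallel.Strategies (Strategy, rpar, using)
import System.Environment (getArgs)

-- Naive Fibonacci, unchanged from the question.
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

-- Create one spark for each component of the pair.
parPair :: Strategy (a,b)
parPair (a,b) = do
  a' <- rpar a
  b' <- rpar b
  return (a',b')

main :: IO ()
main = do
  args <- getArgs
  -- The inputs come from the command line, so (fib a, fib b) is not a
  -- constant expression and cannot be lifted to the top level as a CAF.
  -- Note: read throws an exception on non-numeric arguments.
  case map read args of
    [a, b] -> print ((fib a, fib b) `using` parPair)
    _      -> putStrLn "usage: strat N M"  -- hypothetical usage message
```

Built with the same command as in the question, this can be run as ./strat 35 36 +RTS -N2 -s. If the workaround takes effect, the SPARKS line of the -s output should no longer report a spark as GC'd.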