Why does this program seem not to be fusing properly?


Question

I suspected that a given program wasn't fusing as it should, so I wrote this test to confirm:

module Main where

import qualified Data.Vector.Unboxed as V

main :: IO ()
main = do

  let size = 100000000 :: Int
  let array = V.replicate size 0 :: V.Vector Int
  let incAll = V.map (+ 1)

  print 
    . V.sum 

    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 

    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 
    . incAll 

    $ array

The more incAlls you add, the less efficient the program becomes, which, I believe, means stream fusion isn't kicking in. I'm using GHC 8.0.1, building with stack, and I've included -O2 in the .cabal file's ghc-options. Am I missing something?

Solution

Note: I'm using GHC 7.10.3 and stack 1.1.2 on Windows (x64), so your times might differ.

TL;DR

Make sure to inline your functions if you want to use stream fusion.

How to fuse a stream

Stream fusion relies heavily on the optimizer and on rewrite rules, at least in the vector package. So let's check which versions of your program get optimized well.
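To make "optimizer and rewrite rules" a bit more concrete, here is a toy model of stream fusion on plain lists. It is only a sketch: the names Step, Stream, stream, unstream, mapS, sumS, mapL and sumL are made up for this illustration and are not the vector package's actual internals, which are more elaborate. The idea is that each operation is written as a conversion to a stream, some work, and a conversion back; once the wrappers are inlined, a rewrite rule can cancel every adjacent stream (unstream s) pair, so no intermediate structure is built.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
-- Toy stream fusion on lists. All names here are hypothetical;
-- this is not how the vector package is actually implemented.

data Step s a = Done | Yield a s

-- A stream is a step function plus an initial state.
data Stream a = forall s. Stream (s -> Step s a) s

stream :: [a] -> Stream a
stream = Stream next
  where
    next []       = Done
    next (x : xs) = Yield x xs
{-# INLINE [1] stream #-}

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Yield x s' -> x : go s'
{-# INLINE [1] unstream #-}

-- The rule that removes an intermediate list between two operations.
{-# RULES "stream/unstream" forall s. stream (unstream s) = s #-}

-- Non-recursive map on streams: it only wraps the step function.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Yield x s' -> Yield (f x) s'

-- Strict left fold that consumes the stream in a single loop.
sumS :: Num a => Stream a -> a
sumS (Stream next s0) = go 0 s0
  where
    go acc s = case next s of
      Done       -> acc
      Yield x s' -> let acc' = acc + x in acc' `seq` go acc' s'

-- User-facing functions. The rule can only fire if these are inlined
-- at the call site, which is the point of the TL;DR above.
mapL :: (a -> b) -> [a] -> [b]
mapL f = unstream . mapS f . stream
{-# INLINE mapL #-}

sumL :: Num a => [a] -> a
sumL = sumS . stream
{-# INLINE sumL #-}

main :: IO ()
main = print (sumL (mapL (+ 1) (mapL (+ 1) (replicate 5 (0 :: Int)))))  -- prints 10
```

The result is the same whether or not the rule fires; only the amount of intermediate allocation changes. If mapL sat behind a function that GHC chose not to inline, the stream and unstream calls would never end up next to each other and the rule would have nothing to match.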

Minimal version (1 incAll)

Let's start simple. We start by reducing the program to the minimum:

-- SOBase.hs
module Main where

import qualified Data.Vector.Unboxed as V

main :: IO ()
main = do

  let size = 100000000 :: Int
  let array = V.replicate size 0 :: V.Vector Int
  let incAll = V.map (+ 1)

  print 
    . V.sum     
    . incAll    
    $ array

Let's compile it and dump GHC's generated core:

$ stack ghc --package vector -- -O2 SOBase.hs -ddump-simpl -dsuppress-all

main2
main2 =
  case (runSTRep main3) `cast` ...
  of _ { Vector ipv_s6b2 ipv1_s6b3 ipv2_s6b4 ->
  letrec {
    $s$wfoldlM'_loop_s9wM
    $s$wfoldlM'_loop_s9wM =
      \ sc_s9wK sc1_s9wL ->
        case tagToEnum# (>=# sc1_s9wL ipv1_s6b3) of _ {
          False ->
            case indexIntArray# ipv2_s6b4 (+# ipv_s6b2 sc1_s9wL)
            of wild_a5ju { __DEFAULT ->
            $s$wfoldlM'_loop_s9wM (+# sc_s9wK (+# wild_a5ju 1)) (+# sc1_s9wL 1)
            };
          True -> sc_s9wK
        }; } in
  case $s$wfoldlM'_loop_s9wM 0 0 of ww_s94k { __DEFAULT ->
  case $wshowSignedInt 0 ww_s94k ([])
  of _ { (# ww5_a5fH, ww6_a5fI #) ->
  : ww5_a5fH ww6_a5fI
  }
  }
  }

Let's make that a little bit prettier:

main2 = let foldLoop s n 
              | n < size  = foldLoop (s + (vec ! n + 1)) (n + 1)
              | otherwise = s
        in print (foldLoop 0 0)
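The prettified loop above leaves vec and size undefined. A runnable version, as a sketch, could look like the following. The name fusedSum is made up here, and V.! is bounds-checked, whereas the Core uses the unchecked indexIntArray# primitive, so this mirrors the shape of the loop rather than its exact cost.

```haskell
import qualified Data.Vector.Unboxed as V

-- Hand-written equivalent of the fused loop: read each element once,
-- add 1, and accumulate, without building an intermediate vector.
-- (fusedSum is a hypothetical name used only for this illustration.)
fusedSum :: V.Vector Int -> Int
fusedSum vec = foldLoop 0 0
  where
    size = V.length vec
    foldLoop s n
      | n < size  = let s' = s + (vec V.! n + 1)
                    in s' `seq` foldLoop s' (n + 1)
      | otherwise = s

main :: IO ()
main = print (fusedSum (V.replicate 1000 0))  -- prints 1000
```

For any input it should agree with V.sum (V.map (+ 1) vec), which is what the single-incAll program computes.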

incAll has been inlined into the loop:

case indexIntArray# ipv2_s6b4 (+# ipv_s6b2 sc1_s9wL)
                of wild_a5ju { __DEFAULT ->
                $s$wfoldlM'_loop_s9wM (+# sc_s9wK (+# wild_a5ju 1)) (+# sc1_s9wL 1)
                                                  ^^^^^^^^^^^^^^^^

More calls (3 incAlls)

Let's use incAll more often:

-- SO3.hs
module Main where

import qualified Data.Vector.Unboxed as V

main :: IO ()
main = do

  let size = 100000000 :: Int
  let array = V.replicate size 0 :: V.Vector Int
  let incAll = V.map (+ 1)

  print
    . V.sum

    . incAll
    . incAll
    . incAll

    $ array

What does our core contain now?

$wincAll
$wincAll =
  \ ww_s999 ww1_s99a ww2_s99b ->
    runSTRep
      (\ @ s_a4Rs s1_a4Rt ->
         case tagToEnum# (<# ww1_s99a 0) of _ {
           False ->
             case divInt# 9223372036854775807 8 of ww4_a5fa { __DEFAULT ->
             case tagToEnum# (># ww1_s99a ww4_a5fa) of _ {
               False ->
                 case newByteArray# (*# ww1_s99a 8) (s1_a4Rt `cast` ...)
                 of _ { (# ipv_a5dy, ipv1_a5dz #) ->
                 letrec {
                   $s$wa_s9DR
                   $s$wa_s9DR =
                     \ sc_s9DN sc1_s9DO sc2_s9DQ ->
                       case tagToEnum# (>=# sc1_s9DO ww1_s99a) of _ {
                         False ->
                           case indexIntArray# ww2_s99b (+# ww_s999 sc1_s9DO)
                           of wild_a5jF { __DEFAULT ->
                           case writeIntArray#
                                  ipv1_a5dz sc_s9DN (+# wild_a5jF 1) (sc2_s9DQ `cast` ...)
                           of s'#_a6Cg { __DEFAULT ->
                           $s$wa_s9DR (+# sc_s9DN 1) (+# sc1_s9DO 1) (s'#_a6Cg `cast` ...)
                           }
                           };
                         True -> (# sc2_s9DQ, I# sc_s9DN #)
                       }; } in
                 case $s$wa_s9DR 0 0 (ipv_a5dy `cast` ...)
                 of _ { (# ipv6_a4Nw, ipv7_a4Nx #) ->
                 case ipv7_a4Nx of _ { I# dt4_a5gC ->
                 case unsafeFreezeByteArray# ipv1_a5dz (ipv6_a4Nw `cast` ...)
                 of _ { (# ipv2_a52B, ipv3_a52C #) ->
                 (# ipv2_a52B `cast` ...,
                    (Vector 0 dt4_a5gC ipv3_a52C) `cast` ... #)
                 }
                 }
                 }
                 };
               True -> case main4 ww1_s99a of wild_00 { }
             }
             };
           True -> case main3 ww1_s99a of wild_00 { }
         })

....

main2
main2 =
  case (runSTRep main5) `cast` ...
  of _ { Vector ww1_s991 ww2_s992 ww3_s993 ->
  case ($wincAll ww1_s991 ww2_s992 ww3_s993) `cast` ...
--      ^^^^^^^^ oh
  of _ { Vector ww5_X99T ww6_X99V ww7_X99X ->
  case ($wincAll ww5_X99T ww6_X99V ww7_X99X) `cast` ...
--      ^^^^^^^^ oh
  of _ { Vector ww9_X99Y ww10_X9a0 ww11_X9a2 ->
  case ($wincAll ww9_X99Y ww10_X9a0 ww11_X9a2) `cast` ...
--      ^^^^^^^^ oh
  of _ { Vector ipv_s6cG ipv1_s6cH ipv2_s6cI ->
  letrec {
    $s$wfoldlM'_loop_s9Du
    $s$wfoldlM'_loop_s9Du =
      \ sc_s9Ds sc1_s9Dt ->
        case tagToEnum# (>=# sc1_s9Dt ipv1_s6cH) of _ {
          False ->
            case indexIntArray# ipv2_s6cI (+# ipv_s6cG sc1_s9Dt)
            of wild_a5jx { __DEFAULT ->
            $s$wfoldlM'_loop_s9Du (+# sc_s9Ds wild_a5jx) (+# sc1_s9Dt 1)
            };
          True -> sc_s9Ds
        }; } in
  case $s$wfoldlM'_loop_s9Du 0 0 of ww12_s99s { __DEFAULT ->
  case $wshowSignedInt 0 ww12_s99s ([])
  of _ { (# ww14_a5fK, ww15_a5fL #) ->
  : ww14_a5fK ww15_a5fL
  }
  }
  }
  }
  }
  }

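incAll is not inlined anymore! Each of the three calls goes through the worker $wincAll, which allocates a fresh array (newByteArray#) and writes every element into it. Since incAll isn't inlined, stream fusion cannot kick in.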

Inlining the function (3 incAlls)

Let's add an INLINE pragma:

-- SO3I.hs
module Main where

import qualified Data.Vector.Unboxed as V

main :: IO ()
main = do

  let size = 100000000 :: Int
  let array = V.replicate size 0 :: V.Vector Int
  let {-# INLINE incAll #-}
      incAll = V.map (+1)
  print 
    . V.sum 

    . incAll 
    . incAll 
    . incAll 

    $ array

$ stack ghc --package vector -- -O2 -ddump-simpl SO3I.hs

What does main look like now?

main2                                                                         
main2 =                                                                       
  case (runSTRep main3) `cast` ...                                            
  of _ { Vector ipv_s6bG ipv1_s6bH ipv2_s6bI ->                               
  letrec {                                                                    
    $s$wfoldlM'_loop_s9z7                                                     
    $s$wfoldlM'_loop_s9z7 =                                                   
      \ sc_s9z5 sc1_s9z6 ->                                                   
        case tagToEnum# (>=# sc1_s9z6 ipv1_s6bH) of _ {                       
          False ->                                                            
            case indexIntArray# ipv2_s6bI (+# ipv_s6bG sc1_s9z6)              
            of wild_a5jC { __DEFAULT ->                                       
            $s$wfoldlM'_loop_s9z7                                             
              (+# sc_s9z5 (+# (+# (+# wild_a5jC 1) 1) 1)) (+# sc1_s9z6 1)     
            };                                                                
          True -> sc_s9z5                                                     
        }; } in                                                               
  case $s$wfoldlM'_loop_s9z7 0 0 of ww_s96F { __DEFAULT ->                    
  case $wshowSignedInt 0 ww_s96F ([])                                         
  of _ { (# ww5_a5fP, ww6_a5fQ #) ->                                          
  : ww5_a5fP ww6_a5fQ                                                         
  }                                                                           
  }                                                                           
  }                                                                           

Great. incAll has been inlined, as can be seen here:

(+# sc_s9z5 (+# (+# (+# wild_a5jC 1) 1) 1)) (+# sc1_s9z6 1)     
                                  ^  ^  ^

So the problem was that incAll wasn't inlined, and therefore the optimizer never got to see

V.sum . V.map (+1) . V.map (+1) . V.map (+1)
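The same pragma can also be attached to a top-level definition instead of a let binding. The following is a minimal sketch under that assumption; sumAfterThreeIncs is a name made up for this example, and whether the calls really fuse with your GHC and vector versions is something to confirm with -ddump-simpl rather than take on trust.

```haskell
import qualified Data.Vector.Unboxed as V

-- Top-level variant of incAll. The INLINE pragma asks GHC to copy the
-- right-hand side to every call site, so the V.map calls can meet.
incAll :: V.Vector Int -> V.Vector Int
incAll = V.map (+ 1)
{-# INLINE incAll #-}

-- Hypothetical helper: three increments followed by a sum.
sumAfterThreeIncs :: Int -> Int
sumAfterThreeIncs size = V.sum . incAll . incAll . incAll $ V.replicate size 0

main :: IO ()
main = print (sumAfterThreeIncs 1000)  -- prints 3000
```

It can be built the same way as the other files, for example with stack ghc --package vector -- -O2 and the file name.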

Your original program (now inlined, 32 incAlls)

Last but not least, let's try your original program again, this time with the INLINE pragma. Is everything fixed? Let's have a look at the core:

main2
main2 =
  case (runSTRep main3) `cast` ...
  of _ { Vector ipv_s6xF ipv1_s6xG ipv2_s6xH ->
  letrec {
    $s$wfoldlM'_loop_sajT
    $s$wfoldlM'_loop_sajT =
      \ sc_sajR sc1_sajS ->
        case tagToEnum# (>=# sc1_sajS ipv1_s6xG) of _ {
          False ->
            case indexIntArray# ipv2_s6xH (+# ipv_s6xF sc1_sajS)
            of wild_a5mq { __DEFAULT ->
            $s$wfoldlM'_loop_sajT
              (+#
                 sc_sajR
                 (+#
                    (+#
                       (+#
                          (+#
                             (+#
                                (+#
                                   (+#
                                      (+#
                                         (+#
                                            (+#
                                               (+#
                                                  (+#
                                                     (+#
                                                        (+#
                                                           (+#
                                                              (+#
                                                                 (+#
                                                                    (+#
                                                                       (+#
                                                                          (+#
                                                                             (+#
                                                                                (+#
                                                                                   (+#
                                                                                      (+#
                                                                                         (+#
                                                                                            (+#
                                                                                               (+#
                                                                                                  (+#
                                                                                                     (+#
                                                                                                        (+#
                                                                                                           (+#
                                                                                                              (+#
                                                                                                                 wild_a5mq
                                                                                                                 1)
                                                                                                              1)
                                                                                                           1)
                                                                                                        1)
                                                                                                     1)
                                                                                                  1)
                                                                                               1)
                                                                                            1)
                                                                                         1)
                                                                                      1)
                                                                                   1)
                                                                                1)
                                                                             1)
                                                                          1)
                                                                       1)
                                                                    1)
                                                                 1)
                                                              1)
                                                           1)
                                                        1)
                                                     1)
                                                  1)
                                               1)
                                            1)
                                         1)
                                      1)
                                   1)
                                1)
                             1)
                          1)
                       1)
                    1))
              (+# sc1_sajS 1)
            };
          True -> sc_sajR
        }; } in
  case $s$wfoldlM'_loop_sajT 0 0 of ww_s9Rr { __DEFAULT ->
  case $wshowSignedInt 0 ww_s9Rr ([])
  of _ { (# ww5_a5iD, ww6_a5iE #) ->
  : ww5_a5iD ww6_a5iE
  }
  }
  }

Well, yes: everything ends up in a single loop. At the Core level, though, GHC doesn't fold (+1) . (+1) into (+2), and so on, so the 32 additions are still there. Is it actually faster?

$ stack ghc --package vector -- -O2 SO.hs && SO.exe +RTS -s
  26,400,052,464 bytes allocated in the heap                                             
           9,736 bytes copied during GC                                                  
     800,026,736 bytes maximum residency (2 sample(s))                                   
          61,328 bytes maximum slop                                                      
            1527 MB total memory in use (0 MB lost due to fragmentation)                 

                                     Tot time (elapsed)  Avg pause  Max pause            
  Gen  0        32 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s            
  Gen  1         2 colls,     0 par    0.000s   0.089s     0.0446s    0.0890s            

  INIT    time    0.000s  (  0.000s elapsed)                                             
  MUT     time    4.453s  (  4.616s elapsed)                                             
  GC      time    0.000s  (  0.090s elapsed)                                             
  EXIT    time    0.000s  (  0.089s elapsed)                                             
  Total   time    4.453s  (  4.795s elapsed)                                             

  %GC     time       0.0%  (1.9% elapsed)                                                

  Alloc rate    5,928,432,834 bytes per MUT second                                       

  Productivity 100.0% of total user, 92.9% of total elapsed                              

About 4.5 seconds and 26.4 GB of allocation for your original program. And for the inlined one?

$ stack ghc --package vector -- -O2 SOFixed.hs && SOFixed.exe +RTS -s
3200000000
     800,048,112 bytes allocated in the heap
           4,352 bytes copied during GC
          42,664 bytes maximum residency (1 sample(s))
          18,776 bytes maximum slop
             764 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.045s     0.0452s    0.0452s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.188s  (  0.224s elapsed)
  GC      time    0.000s  (  0.045s elapsed)
  EXIT    time    0.000s  (  0.045s elapsed)
  Total   time    0.188s  (  0.315s elapsed)

  %GC     time       0.0%  (14.4% elapsed)

  Alloc rate    4,266,923,264 bytes per MUT second

  Productivity 100.0% of total user, 59.6% of total elapsed

About 0.2 seconds and 800 MB of allocation. Great! By the way, all the (+1) calls get optimized into a single addq $32,... down the line.
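The allocation figures in the two runs line up with a simple count of materialized vectors. The sketch below assumes 8 bytes per Int, which matches the (*# ww1_s99a 8) passed to newByteArray# in the Core above; the function names are made up for this check.

```haskell
-- Back-of-the-envelope check of the +RTS -s allocation figures.
-- Assumes 8 bytes per unboxed Int (x64); all names are hypothetical.

bytesPerInt :: Integer
bytesPerInt = 8

-- Payload size of one unboxed vector of n Ints.
vectorBytes :: Integer -> Integer
vectorBytes n = n * bytesPerInt

-- Without fusion: the replicated array plus one new array per map.
unfusedBytes :: Integer -> Integer -> Integer
unfusedBytes n maps = (maps + 1) * vectorBytes n

-- With fusion: only the replicated array is materialized.
fusedBytes :: Integer -> Integer
fusedBytes = vectorBytes

-- Every element starts at 0 and is incremented once per map.
expectedSum :: Integer -> Integer -> Integer
expectedSum n maps = n * maps

main :: IO ()
main = do
  let n = 100000000
  print (unfusedBytes n 32)  -- 26400000000, vs. 26,400,052,464 reported
  print (fusedBytes n)       -- 800000000,   vs. 800,048,112 reported
  print (expectedSum n 32)   -- 3200000000, the printed result
```

The small differences from the reported numbers are presumably array headers and other runtime allocation. The unfused run thus allocates 33 full arrays, while the inlined one allocates a single array.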
