Why is summing native lists slower than summing Church-encoded lists with `GHC -O2`?
Question
In order to test how Church-encoded lists perform against user-defined lists and native lists, I've prepared 3 benchmarks:
User-defined list
data List a = Cons a (List a) | Nil deriving Show
lenumTil n = go n Nil where
    go 0 result = result
    go n result = go (n-1) (Cons (n-1) result)
lsum Nil = 0
lsum (Cons h t) = h + (lsum t)
main = print (lsum (lenumTil (100000000 :: Int)))
Native list
main = print $ sum ([0..100000000-1] :: [Int])
Church list
fsum = (\ a -> (a (+) 0))
fenumTil n cons nil = go n nil where
    go 0 result = result
    go n result = go (n-1) (cons (n-1) result)
main = print $ (fsum (fenumTil (100000000 :: Int)) :: Int)
The benchmark results are unexpected:
User-defined list
-- 4999999950000000
-- real 0m22.520s
-- user 0m59.815s
-- sys 0m20.327s
Native list
-- 4999999950000000
-- real 0m0.999s
-- user 0m1.357s
-- sys 0m0.252s
Church list
-- 4999999950000000
-- real 0m0.010s
-- user 0m0.002s
-- sys 0m0.003s
One would expect that, with the huge number of optimizations targeted at native lists, they would perform best. Yet Church lists outperform them by a factor of 100, and outperform user-defined ADTs by a factor of 2250. I've compiled all programs with `GHC -O2`. I've tried replacing `sum` with `foldl'`, with the same result. I've tried adding user input to make sure the Church-list version wasn't optimized to a constant. arkeet pointed out on #haskell that, by inspecting Core, the native version has an intermediate list, but why? When allocation is forced with an additional `reverse`, all 3 perform roughly the same.
Recommended answer
GHC 7.10 has call arity analysis, which lets us define `foldl` in terms of `foldr` and thus lets left folds, including `sum`, participate in fusion. GHC 7.8 also defines `sum` with `foldl`, but it can't fuse the lists away. Thus GHC 7.10 performs optimally and identically to the Church version.
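To make "`foldl` in terms of `foldr`" concrete, here is a minimal sketch of the idea. The names `foldlViaFoldr` and `sumViaFoldr` are hypothetical and chosen for illustration; the actual definition in base has roughly this shape, with extra annotations for the optimizer that are omitted here.

```haskell
-- Sketch: a left fold written as a right fold that builds a function.
-- foldlViaFoldr and sumViaFoldr are hypothetical names, not GHC's own.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z xs = foldr step id xs z
  where
    -- Each element wraps the continuation for the rest of the list;
    -- the accumulator is passed in from the left.
    step x k = \acc -> k (f acc x)

-- A sum expressed through the foldr-based left fold.
sumViaFoldr :: [Int] -> Int
sumViaFoldr = foldlViaFoldr (+) 0
```

Because the list is now consumed by `foldr`, a producer such as `[0..n]` can in principle fuse with it. Without call arity analysis, the function-valued result of `foldr` tends to be allocated as a chain of closures, which is presumably why older GHC versions could not profit from this formulation.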
The Church version is child's play to optimize in either GHC version. We just have to inline `(+)` and `0` into `fenumTil`, and then we have a patently tail-recursive `go` which can be readily unboxed and then turned into a loop by the code generator.
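As a hand-written illustration of that inlining step, substituting `(+)` for `cons` and `0` for `nil` in the question's `fenumTil` gives roughly the following. The name `sumBelow` is hypothetical, and this is a source-level sketch rather than the Core that GHC actually emits.

```haskell
-- Roughly what fsum (fenumTil n) becomes once (+) and 0 are inlined
-- for cons and nil. sumBelow is a hypothetical name for illustration.
sumBelow :: Int -> Int
sumBelow n = go n 0
  where
    go :: Int -> Int -> Int
    go 0 result = result
    -- Tail call: no list cell is ever built, only the running total changes.
    go k result = go (k - 1) ((k - 1) + result)
```

No list constructor appears anywhere, so there is nothing to allocate; with `-O2`, strictness analysis can be expected to keep both arguments of `go` as unboxed machine integers.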
The user-defined version is not tail-recursive and it works in linear space, which wrecks performance, of course.
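For comparison, a tail-recursive sum over the same user-defined type could look like the sketch below. `lsumAcc` is a hypothetical name, not part of the question's code, and it has not been benchmarked here; it only removes the linear stack growth of `lsum`, while the list built by `lenumTil` is still allocated in full.

```haskell
{-# LANGUAGE BangPatterns #-}

data List a = Cons a (List a) | Nil deriving Show

-- lsumAcc is a hypothetical tail-recursive variant of the question's lsum.
lsumAcc :: List Int -> Int
lsumAcc = go 0
  where
    -- The bang pattern forces the accumulator at each step,
    -- so no chain of pending additions builds up.
    go !acc Nil        = acc
    go !acc (Cons h t) = go (acc + h) t
```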