Why is summing native lists slower than summing Church-encoded lists with `GHC -O2`?


Question

In order to test how Church-encoded lists perform against user-defined lists and native lists, I've prepared 3 benchmarks:

User-defined list

data List a = Cons a (List a) | Nil deriving Show
lenumTil n        = go n Nil where
    go 0 result   = result
    go n result   = go (n-1) (Cons (n-1) result)
lsum Nil          = 0
lsum (Cons h t)   = h + (lsum t)

main = print (lsum (lenumTil (100000000 :: Int)))

Native list

main = print $ sum ([0..100000000-1] :: [Int])

Church list

fsum   = (\ a -> (a (+) 0))
fenumTil n cons nil = go n nil where
    go 0 result    = result
    go n result    = go (n-1) (cons (n-1) result)
main = print $ (fsum (fenumTil (100000000 :: Int)) :: Int)

The benchmark results are unexpected:

User-defined list

-- 4999999950000000
-- real 0m22.520s
-- user 0m59.815s
-- sys  0m20.327s

Native list

-- 4999999950000000
-- real 0m0.999s
-- user 0m1.357s
-- sys  0m0.252s

Church list

-- 4999999950000000
-- real 0m0.010s
-- user 0m0.002s
-- sys  0m0.003s

One would expect that, with the huge number of specific optimizations targeted at native lists, they would perform the best. Yet Church lists outperform them by a factor of 100, and outperform user-defined ADTs by a factor of 2250. I've compiled all programs with GHC -O2. I've tried replacing sum with foldl', with the same result. I've tried adding user input to make sure the Church-list version wasn't optimized to a constant. arkeet pointed out on #haskell that, by inspecting Core, the native version has an intermediate list, but why? When allocation is forced with an additional reverse, all 3 perform roughly the same.

Recommended answer

GHC 7.10 has call arity analysis, which lets us define foldl in terms of foldr and thus lets left folds, including sum, participate in fusion. GHC 7.8 also defines sum with foldl, but it can't fuse the lists away. Thus GHC 7.10 performs optimally and identically to the Church version.
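
To make "foldl in terms of foldr" concrete, here is a minimal sketch of the encoding. The names foldlViaFoldr and sumViaFoldr are hypothetical and chosen for illustration. As far as I know this is the same shape base uses from GHC 7.10 on, minus the oneShot annotation, but treat that as an assumption rather than a quote from the library source.

```haskell
-- Left fold expressed through foldr. Each element contributes a function
-- that takes the accumulator so far and hands the updated accumulator to
-- the continuation k built from the rest of the list.
-- foldlViaFoldr is a hypothetical name, not a library function.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z xs = foldr (\x k acc -> k (f acc x)) id xs z

-- sum written as a left fold, so it goes through foldr as well.
sumViaFoldr :: [Int] -> Int
sumViaFoldr = foldlViaFoldr (+) 0

main :: IO ()
main = do
  print (sumViaFoldr [0 .. 9])               -- 45
  print (foldlViaFoldr (flip (:)) [] "abc")  -- "cba"
```

Because the fold is now a foldr, it can meet the build that produces the enumeration, which is what foldr/build fusion needs. Call arity analysis is what lets GHC see that the partially applied functions produced by foldr always receive their accumulator, so they need not be allocated as closures.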

The Church version is child's play to optimize in either GHC version. We just have to inline (+) and 0 into fenumTil, and then we have a patently tail-recursive go which can be readily unboxed and then turned into a loop by the code generator.
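
The inlining step can be sketched by hand: substituting (+) for cons and 0 for nil in fenumTil leaves only an accumulator loop. The name sumBelow is hypothetical, and the bang patterns are my addition so the accumulator stays strict even without -O2. With optimization on, GHC's strictness analysis would be expected to work this out by itself. This is an approximation of what the optimizer arrives at, not actual Core output.

```haskell
{-# LANGUAGE BangPatterns #-}

-- fsum (fenumTil n) after substituting cons = (+) and nil = 0.
-- No list of any kind is left, only a counter and an accumulator.
-- sumBelow is a hypothetical name; like the original, it assumes n >= 0.
sumBelow :: Int -> Int
sumBelow n = go n 0
  where
    go :: Int -> Int -> Int
    go 0 !acc = acc
    go k !acc = go (k - 1) ((k - 1) + acc)

main :: IO ()
main = print (sumBelow 100000000)  -- 4999999950000000
```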

The user-defined version is not tail-recursive and it works in linear space, which of course wrecks performance.
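
For comparison, a tail-recursive variant of lsum with a strict accumulator might look like the sketch below. lsumAcc is a hypothetical name and this version was not part of the original benchmark, so no timing is claimed for it. It removes the linear stack use of the summation, but the list built by lenumTil is still fully allocated, since no fusion rules exist for the user-defined type.

```haskell
{-# LANGUAGE BangPatterns #-}

data List a = Cons a (List a) | Nil

-- Builds Cons 0 (Cons 1 (... (Cons (n-1) Nil))), as in the question.
lenumTil :: Int -> List Int
lenumTil n = go n Nil
  where
    go 0 result = result
    go k result = go (k - 1) (Cons (k - 1) result)

-- Tail-recursive sum: the running total is carried in a strict
-- accumulator instead of being left as pending additions on the stack.
-- lsumAcc is a hypothetical name, not from the original post.
lsumAcc :: List Int -> Int
lsumAcc = go 0
  where
    go !acc Nil        = acc
    go !acc (Cons h t) = go (acc + h) t

main :: IO ()
main = print (lsumAcc (lenumTil 1000))  -- 499500
```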
