Haskell可以像Clang / GCC一样优化函数调用吗? [英] Can Haskell optimize function calls the same way Clang / GCC does?

查看:271
本文介绍了Haskell可以像Clang / GCC一样优化函数调用吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想问你Haskell和C ++编译器是否可以以同样的方式优化函数调用。
请查看以下代码。在下面的例子中,Haskell的速度明显快于C ++。



我听说Haskell可以编译成LLVM,并且可以通过LLVM传递来优化。此外,我听说Haskell有一些重的优化下。
但是下面的例子应该能够以相同的性能工作。
我想问:


  1. 为什么我在C ++中的示例基准测试比Haskell中的基准慢?
  2. $ )。



    C ++代码:

      #include< cstdio& 
    #include< cstdlib>

    int b(const int x){
    return x + 5;
    }

    int c(const int x){
    return b(x)+1;
    }

    int d(const int x){
    return b(x)-1;
    }

    int a(const int x){
    return c(x)+ d(x);
    }

    int main(int argc,char * argv []){
    printf(Starting ... \\\
    );
    long int iternum = atol(argv [1]);
    long long int out = 0;
    for(long int i = 1; i <= iternum; i ++){
    out + = a(iternum-i);
    }
    printf(%lld\\\
    ,out);
    printf(Done.\\\
    );使用编译的

    clang ++ -O3 main.cpp



    haskell代码:

      
    import qualified Data.Vector as V
    import System.Environment
    b :: Int - > Int
    bx = x + 5
    cx = bx + 1
    dx = bx - 1
    ax = cx + dx
    main = do
    putStrLn正在开始...
    args< - getArgs
    let iternum = read(head args):: Int in do
    putStrLn $ show $ V.foldl'(+)0 $ V.map i - > a(iternum-i))
    $ V.enumFromTo 1 iternum
    putStrLnDone。

    编译时使用 ghc -O3 --make -fforce-recomp -fllvm ghc -test.hs



    速度结果:



     运行测试用例程序'cpp / a.out'
    -------------------
    cpp / a.out 100000000 0.0%平均时间:105.05 ms
    cpp / a.out 200000000 11.11%avg time:207.49 ms
    cpp / a.out 300000000 22.22%avg time:309.22 ms
    cpp / a.out 400000000 33.33%avg time:411.7 ms
    cpp / a.out 500000000 44.44%avg time:514.07 ms
    cpp / a.out 600000000 55.56%avg time:616.7 ms
    cpp / a.out 700000000 66.67%avg time:718.69 ms
    cpp / a.out 800000000 77.78%avg time:821.32 ms
    cpp / a.out 900000000 88.89%avg time:923.18 ms
    cpp / a.out 1000000000 100.0%avg time: 1025.43 ms


    运行程序'hs / main'的测试用例
    -------------------
    hs / main 100000000 0.0%avg time:70.97 ms(diff:34.08)
    hs / main 200000000 11.11%avg time:138.95 ms(diff:68.54)
    hs / main 300000000 22.22%avg time:206.3 ms(diff:102.92)
    hs / main 400000000 33.33%avg time:274.31 ms(diff:137.39)
    hs / main 500000000 44.44%avg time:342.34 ms(diff:171.73)
    hs / main 600000000 55.56%avg time:410.65 ms(diff:206.05)
    hs / main 700000000 66.67%avg time:478.25 ms(diff:240.44)
    hs / main 800000000 77.78%avg time:546.39 ms(差值:274.93)
    hs /主900000000 88.89%平均时间:614.12毫秒(差值:309.06)
    hs /主1000000000 100.0%平均时间:682.32毫秒(差值:343.11)

    EDIT
    当然我们无法比较语言的速度,



    但我很好奇Ghc和C ++编译器可以以同样的方式优化函数调用 b $ b

    我已根据您的帮助编辑了新基准和代码的问题:)

    解决方案

    如果你的目标是让这个运行速度和你的C ++编译器一样快,那么你
    就想使用一个编译器可以使用的数据结构。

     模块主要其中

    导入合格的Data.Vector为V

    b :: Int - > Int
    bx = x + 5
    cx = bx + 1
    dx = bx - 1

    ax = cx + dx

    main = do
    putStrLnStarting ...
    putStrLn $ show $ V.foldl'(+)0 $ V.map a $ V.enumFromTo 1 100000000
    putStrLnDone。

    GHC能够完全消除循环,只是插入一个常数到
    。在我的计算机上, 0.002s,当
    使用与您最初指定的优化标志相同的优化标志时。



    作为基于@Yuras的注释的后续,



    向量



    pre> main_ $ s $ wfoldlM'_loop [Occ = LoopBreaker]
    :: Int# - > Int# - > Int#

    main_ $ s $ wfoldlM'_loop =
    \(sc_s2hW :: Int#)(sc1_s2hX :: Int#) -
    case< =#sc1_s2hX 100000000 of _ {
    False - > sc_s2hW;
    True - >
    main_ $ s $ wfoldlM'_loop
    (+#
    sc_s2hW
    (+#
    (+#sc1_s2hX 5)1)
    - #(+#sc1_s2hX 5)1)))
    (+#sc1_s2hX 1)
    }


    b $ b

    stream-fusion



      $ wloop_foldl [Occ = LoopBreaker] 
    :: Int# ; Int# - > Int#

    $ wloop_foldl =
    \(ww_s1Rm :: Int#)(ww1_s1Rs :: Int#) - >
    case>#ww1_s1Rs 100000000 of _ {
    False - >
    $ wloop_foldl
    (+#
    ww_s1Rm
    (+#
    (+#ww1_s1Rs 5)1)
    ww1_s1Rs 5)1)))
    (+#ww1_s1Rs 1);
    True - > ww_s1Rm
    }

    唯一真正的区别是
    终止条件。两个版本编译成紧缩尾递归循环
    ,可以很容易地由LLVM优化。


    I want to ask you if Haskell and C++ compilers can optimize function calls the same way. Please look at following codes. In the following example Haskell is significantly faster than C++.

    I have heard that Haskell can compile to LLVM and can be optimized by the LLVM passes. Additionally I have heard that Haskell has some heavy optimizations under the hood. But the following examples should be able to work with the same performance. I want to ask:

    1. Why my sample benchmark in C++ is slower than the on in Haskell?
    2. is it possible to further optimize the codes?

    (I'm using LLVM-3.2 and GHC-7.6).

    C++ code:

    #include <cstdio>
    #include <cstdlib>
    
    int b(const int x){
        return x+5;
    }
    
    int c(const int x){
        return b(x)+1;
    }
    
    int d(const int x){
        return b(x)-1;
    }
    
    int a(const int x){
        return c(x) + d(x);
    }
    
    int main(int argc, char* argv[]){
        printf("Starting...\n");
        long int iternum = atol(argv[1]);
        long long int out = 0;
        for(long int i=1; i<=iternum;i++){
            out += a(iternum-i);
        }
        printf("%lld\n",out);
        printf("Done.\n");
    }
    

    compiled with clang++ -O3 main.cpp

    haskell code:

    module Main where
    import qualified Data.Vector as V
    import System.Environment
    b :: Int -> Int
    b x = x + 5
    c x = b x + 1
    d x = b x - 1
    a x = c x + d x
    main = do
       putStrLn "Starting..."
       args <- getArgs
       let iternum = read (head args) :: Int in do
          putStrLn $ show $ V.foldl' (+) 0 $ V.map (\i -> a (iternum-i))
             $ V.enumFromTo 1 iternum
          putStrLn "Done."
    

    compiled with ghc -O3 --make -fforce-recomp -fllvm ghc-test.hs

    speed results:

    Running testcase for program 'cpp/a.out'
    -------------------
    cpp/a.out 100000000      0.0%   avg time: 105.05 ms 
    cpp/a.out 200000000      11.11% avg time: 207.49 ms 
    cpp/a.out 300000000      22.22% avg time: 309.22 ms 
    cpp/a.out 400000000      33.33% avg time: 411.7 ms 
    cpp/a.out 500000000      44.44% avg time: 514.07 ms 
    cpp/a.out 600000000      55.56% avg time: 616.7 ms 
    cpp/a.out 700000000      66.67% avg time: 718.69 ms
    cpp/a.out 800000000      77.78% avg time: 821.32 ms 
    cpp/a.out 900000000      88.89% avg time: 923.18 ms 
    cpp/a.out 1000000000     100.0% avg time: 1025.43 ms
    
    
    Running testcase for program 'hs/main'
    -------------------
    hs/main 100000000    0.0%   avg time: 70.97 ms (diff: 34.08)
    hs/main 200000000    11.11% avg time: 138.95 ms (diff: 68.54)
    hs/main 300000000    22.22% avg time: 206.3 ms (diff: 102.92)
    hs/main 400000000    33.33% avg time: 274.31 ms (diff: 137.39)
    hs/main 500000000    44.44% avg time: 342.34 ms (diff: 171.73)
    hs/main 600000000    55.56% avg time: 410.65 ms (diff: 206.05)
    hs/main 700000000    66.67% avg time: 478.25 ms (diff: 240.44)
    hs/main 800000000    77.78% avg time: 546.39 ms (diff: 274.93)
    hs/main 900000000    88.89% avg time: 614.12 ms (diff: 309.06)
    hs/main 1000000000   100.0% avg time: 682.32 ms (diff: 343.11)
    

    EDIT Of course we cannot compare speed of languages, but the speed of implementiations.

    But I'm curious if Ghc and C++ compilers can optimize function calls the same way

    I've edited the question with new benchmark and codes based on your help :)

    解决方案

    If your goal is to get this running as quickly as your C++ compiler, then you would want to use a data structure that the compiler can have its way with.

    module Main where
    
    import qualified Data.Vector as V
    
    b :: Int -> Int
    b x = x + 5
    c x = b x + 1
    d x = b x - 1
    
    a x = c x + d x
    
    main = do
        putStrLn "Starting..."
        putStrLn $ show $ V.foldl' (+) 0 $ V.map a $ V.enumFromTo 1 100000000
        putStrLn "Done."
    

    GHC is able to completely eliminate the loop and just inserts a constant into the resulting assembly. On my computer, this now has a runtime of < 0.002s, when using the same optimization flags as you originally specified.

    As a follow up based on the comments by @Yuras, the core produced by the vector based solution and the stream-fusion solution are functionally identical.

    Vector

    main_$s$wfoldlM'_loop [Occ=LoopBreaker]
      :: Int# -> Int# -> Int#
    
    main_$s$wfoldlM'_loop =
      \ (sc_s2hW :: Int#) (sc1_s2hX :: Int#) ->
        case <=# sc1_s2hX 100000000 of _ {
          False -> sc_s2hW;
          True ->
            main_$s$wfoldlM'_loop
              (+#
                 sc_s2hW
                 (+#
                    (+# (+# sc1_s2hX 5) 1)
                    (-# (+# sc1_s2hX 5) 1)))
              (+# sc1_s2hX 1)
        }
    

    stream-fusion

    $wloop_foldl [Occ=LoopBreaker]
      :: Int# -> Int# -> Int#
    
    $wloop_foldl =
      \ (ww_s1Rm :: Int#) (ww1_s1Rs :: Int#) ->
        case ># ww1_s1Rs 100000000 of _ {
          False ->
            $wloop_foldl
              (+#
                 ww_s1Rm
                 (+#
                    (+# (+# ww1_s1Rs 5) 1)
                    (-# (+# ww1_s1Rs 5) 1)))
              (+# ww1_s1Rs 1);
          True -> ww_s1Rm
        }
    

    The only real difference is the choice of comparison operation for the termination condition. Both versions compile to tight tail recursive loops that can be easily optimized by LLVM.

    这篇关于Haskell可以像Clang / GCC一样优化函数调用吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆