Haskell可以像Clang / GCC一样优化函数调用吗? [英] Can Haskell optimize function calls the same way Clang / GCC does?
问题描述
我想问你Haskell和C ++编译器是否可以以同样的方式优化函数调用。
请查看以下代码。在下面的例子中,Haskell的速度明显快于C ++。
我听说Haskell可以编译成LLVM,并且可以通过LLVM传递来优化。此外,我听说Haskell有一些重的优化下。
但是下面的例子应该能够以相同的性能工作。
我想问:
- 为什么我在C ++中的示例基准测试比Haskell中的基准慢? $ )。
- Why my sample benchmark in C++ is slower than the on in Haskell?
- is it possible to further optimize the codes?
C ++代码:
#include< cstdio&
#include< cstdlib>
int b(const int x){
return x + 5;
}
int c(const int x){
return b(x)+1;
}
int d(const int x){
return b(x)-1;
}
int a(const int x){
return c(x)+ d(x);
}
int main(int argc,char * argv []){
printf(Starting ... \\\
);
long int iternum = atol(argv [1]);
long long int out = 0;
for(long int i = 1; i <= iternum; i ++){
out + = a(iternum-i);
}
printf(%lld\\\
,out);
printf(Done.\\\
);使用编译的
clang ++ -O3 main.cpp
haskell代码:
import qualified Data.Vector as V
import System.Environment
b :: Int - > Int
bx = x + 5
cx = bx + 1
dx = bx - 1
ax = cx + dx
main = do
putStrLn正在开始...
args< - getArgs
let iternum = read(head args):: Int in do
putStrLn $ show $ V.foldl'(+)0 $ V.map i - > a(iternum-i))
$ V.enumFromTo 1 iternum
putStrLnDone。
编译时使用 ghc -O3 --make -fforce-recomp -fllvm ghc -test.hs
速度结果:
运行测试用例程序'cpp / a.out'
-------------------
cpp / a.out 100000000 0.0%平均时间:105.05 ms
cpp / a.out 200000000 11.11%avg time:207.49 ms
cpp / a.out 300000000 22.22%avg time:309.22 ms
cpp / a.out 400000000 33.33%avg time:411.7 ms
cpp / a.out 500000000 44.44%avg time:514.07 ms
cpp / a.out 600000000 55.56%avg time:616.7 ms
cpp / a.out 700000000 66.67%avg time:718.69 ms
cpp / a.out 800000000 77.78%avg time:821.32 ms
cpp / a.out 900000000 88.89%avg time:923.18 ms
cpp / a.out 1000000000 100.0%avg time: 1025.43 ms
运行程序'hs / main'的测试用例
-------------------
hs / main 100000000 0.0%avg time:70.97 ms(diff:34.08)
hs / main 200000000 11.11%avg time:138.95 ms(diff:68.54)
hs / main 300000000 22.22%avg time:206.3 ms(diff:102.92)
hs / main 400000000 33.33%avg time:274.31 ms(diff:137.39)
hs / main 500000000 44.44%avg time:342.34 ms(diff:171.73)
hs / main 600000000 55.56%avg time:410.65 ms(diff:206.05)
hs / main 700000000 66.67%avg time:478.25 ms(diff:240.44)
hs / main 800000000 77.78%avg time:546.39 ms(差值:274.93)
hs /主900000000 88.89%平均时间:614.12毫秒(差值:309.06)
hs /主1000000000 100.0%平均时间:682.32毫秒(差值:343.11)
EDIT
当然我们无法比较语言的速度,
但我很好奇Ghc和C ++编译器可以以同样的方式优化函数调用 b $ b
我已根据您的帮助编辑了新基准和代码的问题:)
如果你的目标是让这个运行速度和你的C ++编译器一样快,那么你
就想使用一个编译器可以使用的数据结构。
模块主要其中
导入合格的Data.Vector为V
b :: Int - > Int
bx = x + 5
cx = bx + 1
dx = bx - 1
ax = cx + dx
main = do
putStrLnStarting ...
putStrLn $ show $ V.foldl'(+)0 $ V.map a $ V.enumFromTo 1 100000000
putStrLnDone。
GHC能够完全消除循环,只是插入一个常数到
。在我的计算机上, 0.002s,当
使用与您最初指定的优化标志相同的优化标志时。
作为基于@Yuras的注释的后续,
向量
pre>
main_ $ s $ wfoldlM'_loop [Occ = LoopBreaker]
:: Int# - > Int# - > Int#
main_ $ s $ wfoldlM'_loop =
\(sc_s2hW :: Int#)(sc1_s2hX :: Int#) -
case< =#sc1_s2hX 100000000 of _ {
False - > sc_s2hW;
True - >
main_ $ s $ wfoldlM'_loop
(+#
sc_s2hW
(+#
(+#sc1_s2hX 5)1)
- #(+#sc1_s2hX 5)1)))
(+#sc1_s2hX 1)
}
b $ b
stream-fusion
$ wloop_foldl [Occ = LoopBreaker]
:: Int# ; Int# - > Int#
$ wloop_foldl =
\(ww_s1Rm :: Int#)(ww1_s1Rs :: Int#) - >
case>#ww1_s1Rs 100000000 of _ {
False - >
$ wloop_foldl
(+#
ww_s1Rm
(+#
(+#ww1_s1Rs 5)1)
ww1_s1Rs 5)1)))
(+#ww1_s1Rs 1);
True - > ww_s1Rm
}
唯一真正的区别是
终止条件。两个版本编译成紧缩尾递归循环
,可以很容易地由LLVM优化。
I want to ask you if Haskell and C++ compilers can optimize function calls the same way. Please look at following codes. In the following example Haskell is significantly faster than C++.
I have heard that Haskell can compile to LLVM and can be optimized by the LLVM passes. Additionally I have heard that Haskell has some heavy optimizations under the hood. But the following examples should be able to work with the same performance. I want to ask:
(I'm using LLVM-3.2 and GHC-7.6).
C++ code:
#include <cstdio>
#include <cstdlib>
int b(const int x){
return x+5;
}
int c(const int x){
return b(x)+1;
}
int d(const int x){
return b(x)-1;
}
int a(const int x){
return c(x) + d(x);
}
int main(int argc, char* argv[]){
printf("Starting...\n");
long int iternum = atol(argv[1]);
long long int out = 0;
for(long int i=1; i<=iternum;i++){
out += a(iternum-i);
}
printf("%lld\n",out);
printf("Done.\n");
}
compiled with clang++ -O3 main.cpp
haskell code:
module Main where
import qualified Data.Vector as V
import System.Environment
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
args <- getArgs
let iternum = read (head args) :: Int in do
putStrLn $ show $ V.foldl' (+) 0 $ V.map (\i -> a (iternum-i))
$ V.enumFromTo 1 iternum
putStrLn "Done."
compiled with ghc -O3 --make -fforce-recomp -fllvm ghc-test.hs
speed results:
Running testcase for program 'cpp/a.out'
-------------------
cpp/a.out 100000000 0.0% avg time: 105.05 ms
cpp/a.out 200000000 11.11% avg time: 207.49 ms
cpp/a.out 300000000 22.22% avg time: 309.22 ms
cpp/a.out 400000000 33.33% avg time: 411.7 ms
cpp/a.out 500000000 44.44% avg time: 514.07 ms
cpp/a.out 600000000 55.56% avg time: 616.7 ms
cpp/a.out 700000000 66.67% avg time: 718.69 ms
cpp/a.out 800000000 77.78% avg time: 821.32 ms
cpp/a.out 900000000 88.89% avg time: 923.18 ms
cpp/a.out 1000000000 100.0% avg time: 1025.43 ms
Running testcase for program 'hs/main'
-------------------
hs/main 100000000 0.0% avg time: 70.97 ms (diff: 34.08)
hs/main 200000000 11.11% avg time: 138.95 ms (diff: 68.54)
hs/main 300000000 22.22% avg time: 206.3 ms (diff: 102.92)
hs/main 400000000 33.33% avg time: 274.31 ms (diff: 137.39)
hs/main 500000000 44.44% avg time: 342.34 ms (diff: 171.73)
hs/main 600000000 55.56% avg time: 410.65 ms (diff: 206.05)
hs/main 700000000 66.67% avg time: 478.25 ms (diff: 240.44)
hs/main 800000000 77.78% avg time: 546.39 ms (diff: 274.93)
hs/main 900000000 88.89% avg time: 614.12 ms (diff: 309.06)
hs/main 1000000000 100.0% avg time: 682.32 ms (diff: 343.11)
EDIT Of course we cannot compare speed of languages, but the speed of implementiations.
But I'm curious if Ghc and C++ compilers can optimize function calls the same way
I've edited the question with new benchmark and codes based on your help :)
If your goal is to get this running as quickly as your C++ compiler, then you would want to use a data structure that the compiler can have its way with.
module Main where
import qualified Data.Vector as V
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
putStrLn $ show $ V.foldl' (+) 0 $ V.map a $ V.enumFromTo 1 100000000
putStrLn "Done."
GHC is able to completely eliminate the loop and just inserts a constant into the resulting assembly. On my computer, this now has a runtime of < 0.002s, when using the same optimization flags as you originally specified.
As a follow up based on the comments by @Yuras, the core produced by the vector based solution and the stream-fusion solution are functionally identical.
Vector
main_$s$wfoldlM'_loop [Occ=LoopBreaker]
:: Int# -> Int# -> Int#
main_$s$wfoldlM'_loop =
\ (sc_s2hW :: Int#) (sc1_s2hX :: Int#) ->
case <=# sc1_s2hX 100000000 of _ {
False -> sc_s2hW;
True ->
main_$s$wfoldlM'_loop
(+#
sc_s2hW
(+#
(+# (+# sc1_s2hX 5) 1)
(-# (+# sc1_s2hX 5) 1)))
(+# sc1_s2hX 1)
}
stream-fusion
$wloop_foldl [Occ=LoopBreaker]
:: Int# -> Int# -> Int#
$wloop_foldl =
\ (ww_s1Rm :: Int#) (ww1_s1Rs :: Int#) ->
case ># ww1_s1Rs 100000000 of _ {
False ->
$wloop_foldl
(+#
ww_s1Rm
(+#
(+# (+# ww1_s1Rs 5) 1)
(-# (+# ww1_s1Rs 5) 1)))
(+# ww1_s1Rs 1);
True -> ww_s1Rm
}
The only real difference is the choice of comparison operation for the termination condition. Both versions compile to tight tail recursive loops that can be easily optimized by LLVM.
这篇关于Haskell可以像Clang / GCC一样优化函数调用吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!