阿克曼在Haskell / GHC方面效率很低 [英] Ackermann very inefficient with Haskell/GHC
问题描述
我尝试计算 Ackermann(4,1)
,不同语言/编译器之间的性能差异很大。以下是我的 Core i7 3820QM,16G,Ubuntu 12.10 64bit , b
C:1.6s , gcc -O3
(使用gcc 4.7.2)
int ack(int m,int n){
if(m == 0)return n + 1;
if(n == 0)return ack(m-1,1);
return ack(m-1,ack(m,n-1));
int main(){
printf(%d \ n,ack(4,1));
返回0;
}
OCaml:3.6s , ocamlopt
(with ocaml 3.12.1)
let rec ack =功能
| 0,n - > n + 1
| m,0 - > ack(m-1,1)
| m,n - > (ack(4,1))
$ b $ ack(m-1,ack(m,n-1))
b
标准ML:5.1s mlton -codegen c -cc-opt -O3
(with mlton 20100608 )
fun ack 0 n = n + 1
| ack m 0 = ack(m-1)1
| ack m n = ack(m-1)(ack m(n-1));
print(Int.toString(ack 4 1));
球拍:11.5s 球拍
(with racket v5.3.3)
(需要racket / unsafe / ops)
$ b $(define + unsafe-fx +)
(define - unsafe-fx-)
(define(ack mn)
(cond
[ ( - n 1)] ak( - n 1)]
[(零?n)(ack( - m 1)1)]
[else(ack ))]))
(时间(ack 4 1))
Haskell:未完成,在22s后被系统杀死 ghc -O2
(with ghc 7.4.2)
$ b Haskell:1.8s
ajhc
(with ajhc 0.8 .0.4) main = print $ ack 4 1
where ack :: Int - > Int - > Int
ack 0 n = n + 1
ack m 0 = ack(m-1)1
ack mn = ack(m-1)(ack m(n-1))
Haskell版本是唯一无法正常终止的版本,因为它占用的内存太多。它冻结了我的机器并在死亡之前填充了交换空间。
我可以做些什么来改进它,而不会大量地混淆代码?
编辑:我欣赏一些渐近式智能解决方案,但他们并不完全是我所要求的。这更多的是看看编译器是否以合理有效的方式处理某些模式(堆栈,尾部调用,拆箱等),而不是计算ackermann函数。
编辑2 :正如几个回复指出的,这似乎是 GHC最近版本中的一个错误。我尝试使用 AJHC 的相同代码并获得更好的性能。
感谢你非常:)
注意:高内存使用问题是GHC RTS中的一个错误,堆栈溢出并在堆上分配新栈时,检查垃圾收集是否到期。它已经在GHC HEAD中得到了修复。
通过CPS转换<$ c,我能够获得更好的性能
数据P = P!Int!Int
main :: IO()
main = print $ ack(P 4 1)id
其中
ack :: P - > (Int→> Int)→> Int
ack(P 0 n)k = k(n + 1)
ack(P m 0)k = ack(P(m-1)1)k
ack )k = ack(P m(n-1))(\ a→ack(P(m-1)a)k)
您的原始函数会占用我机器上的所有可用内存,而此函数会在恒定空间内运行。
$ time ./Test
65533
./Test 52,47s user 0,50s system 96%cpu 54,797 total
Ocaml仍然更快,但是:
$ time ./test
65533./test 7,97s用户0,05s系统94%cpu 8,475总额
编辑:使用 JHC 进行编译时,您的原始程序速度与Ocaml版本:
$时间./hs.out
65533
./hs.out 5 ,31s user 0,03s system 96%cpu 5,515 total
编辑2:我发现的其他东西:ru使用更大的堆栈块大小( + RTS -kc1M
)使您的原始程序以固定的空间运行。尽管如此,CPS版本还是要快一些。
编辑3:我设法生成的版本几乎与Ocaml一样快一个是通过手动展开主循环。然而,只有在 测试: 编辑4 :显然,空间泄漏已在GHC HEAD中修复,所以 I try computing C: 1.6s, OCaml: 3.6s, Standard ML: 5.1s Racket: 11.5s Haskell: 1.8s The Haskell version is the only one that fails to terminate properly because it takes too much memory. It freezes my machine and fills the swap space before getting killed.
What can I do to improve it without heavily fuglifying the code? EDIT: I appreciate some of the asymptotically smarter solutions, but they are not exactly what I am asking for. This is more about seeing whether the compiler handles certain patterns in a reasonably efficient way (stack, tail calls, unboxing, etc.) than computing the ackermann function. EDIT 2: As pointed out by several responses, this seems to be a bug in recent versions of GHC. I try the same code with AJHC and get much better performance. Thank you very much :) NB: The high memory usage issue is a bug in the GHC RTS, where upon stack overflow and allocation of new stacks on the heap it was not checked whether garbage collection is due. It has been already fixed in GHC HEAD. I was able to get much better performance by CPS-converting Your original function consumes all available memory on my machine, while this one runs in constant space. Ocaml is still faster, however: Edit: When compiled with JHC, your original program is about as fast as the Ocaml version: Edit 2: Something else I've discovered: running your original program with a larger stack chunk size ( Edit 3: I managed to produce a version that runs nearly as fast as the Ocaml one by manually unrolling the main loop. However, it only works when run with Testing: Edit 4: Apparently, the space leak is fixed in GHC HEAD, so 这篇关于阿克曼在Haskell / GHC方面效率很低的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋! + RTS -kc1M
(Dan Doel
{ - #LANGUAGE CPP# - }
模块Main其中
data P = P { - #UNPACK# - }!Int { - #UNPACK# - }!Int
ack0 :: Int - > (a,b)C(a)C(b)$ b(a)b
ack0 n =(n + 1)
#define C(a)a
#define CONCAT
$ b #define AckType(M)CONCAT(ack,M):: Int - > Int
AckType(1)
AckType(2)
AckType(3)
AckType(4)
#define AckDecl(M ,M1)\
CONCAT(ack,M)n = {0 - > CONCAT(ack,M1)1 \
; 1 - > CONCAT(ack,M1)(CONCAT(ack,M1)1)\
; _ - > CONCAT(ack,M1)(CONCAT(ack,M)(n-1))}
AckDecl(1,0)
AckDecl(2,1)
AckDecl 3,2)
AckDecl(4,3)
ack :: P - > (Int→> Int)→> Int
ack(P m n)k =
0的情况m - > k(ack0 n)
1 - > k(ack1 n)
2 - > k(ack2 n)
3 - > k(ack3 n)
4 - > k(ack4 n)
_ - >案例n的
0 - > ack(P(m-1)1)k
1 - > ack(P(m-1)1)(\ a→ack(P(m-1)a)k)
- > ack(P m(n-1))(\ a→ack(P(m-1)a)k)
main :: IO()
main = print $ ack(P 4 1)id
$ time ./Test + RTS -kc1M
65533
./Test + RTS -kc1M 6,30s用户0,04s系统97%cpu 6,516总计
+ RTS -kc1M
将不再需要。Ackermann(4,1)
and there's a big difference in performance between different languages/compilers. Below are results on my Core i7 3820QM, 16G, Ubuntu 12.10 64bit, gcc -O3
(with gcc 4.7.2)int ack(int m, int n) {
if (m == 0) return n+1;
if (n == 0) return ack(m-1, 1);
return ack(m-1, ack(m, n-1));
}
int main() {
printf("%d\n", ack(4,1));
return 0;
}
ocamlopt
(with ocaml 3.12.1)let rec ack = function
| 0,n -> n+1
| m,0 -> ack (m-1, 1)
| m,n -> ack (m-1, ack (m, n-1))
in print_int (ack (4, 1))
mlton -codegen c -cc-opt -O3
(with mlton 20100608)fun ack 0 n = n+1
| ack m 0 = ack (m-1) 1
| ack m n = ack (m-1) (ack m (n-1));
print (Int.toString (ack 4 1));
racket
(with racket v5.3.3)(require racket/unsafe/ops)
(define + unsafe-fx+)
(define - unsafe-fx-)
(define (ack m n)
(cond
[(zero? m) (+ n 1)]
[(zero? n) (ack (- m 1) 1)]
[else (ack (- m 1) (ack m (- n 1)))]))
(time (ack 4 1))
Haskell: unfinished, killed by system after 22s ghc -O2
(with ghc 7.4.2)ajhc
(with ajhc 0.8.0.4)main = print $ ack 4 1
where ack :: Int -> Int -> Int
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))
ack
:module Main where
data P = P !Int !Int
main :: IO ()
main = print $ ack (P 4 1) id
where
ack :: P -> (Int -> Int) -> Int
ack (P 0 n) k = k (n + 1)
ack (P m 0) k = ack (P (m-1) 1) k
ack (P m n) k = ack (P m (n-1)) (\a -> ack (P (m-1) a) k)
$ time ./Test
65533
./Test 52,47s user 0,50s system 96% cpu 54,797 total
$ time ./test
65533./test 7,97s user 0,05s system 94% cpu 8,475 total
$ time ./hs.out
65533
./hs.out 5,31s user 0,03s system 96% cpu 5,515 total
+RTS -kc1M
) makes it run in constant space. The CPS version is still a bit faster, though.+RTS -kc1M
(Dan Doel has filed a bug about this behaviour):{-# LANGUAGE CPP #-}
module Main where
data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int
ack0 :: Int -> Int
ack0 n =(n+1)
#define C(a) a
#define CONCAT(a,b) C(a)C(b)
#define AckType(M) CONCAT(ack,M) :: Int -> Int
AckType(1)
AckType(2)
AckType(3)
AckType(4)
#define AckDecl(M,M1) \
CONCAT(ack,M) n = case n of { 0 -> CONCAT(ack,M1) 1 \
; 1 -> CONCAT(ack,M1) (CONCAT(ack,M1) 1) \
; _ -> CONCAT(ack,M1) (CONCAT(ack,M) (n-1)) }
AckDecl(1,0)
AckDecl(2,1)
AckDecl(3,2)
AckDecl(4,3)
ack :: P -> (Int -> Int) -> Int
ack (P m n) k = case m of
0 -> k (ack0 n)
1 -> k (ack1 n)
2 -> k (ack2 n)
3 -> k (ack3 n)
4 -> k (ack4 n)
_ -> case n of
0 -> ack (P (m-1) 1) k
1 -> ack (P (m-1) 1) (\a -> ack (P (m-1) a) k)
_ -> ack (P m (n-1)) (\a -> ack (P (m-1) a) k)
main :: IO ()
main = print $ ack (P 4 1) id
$ time ./Test +RTS -kc1M
65533
./Test +RTS -kc1M 6,30s user 0,04s system 97% cpu 6,516 total
+RTS -kc1M
won't be required in the future.