阿克曼在Haskell / GHC方面效率很低 [英] Ackermann very inefficient with Haskell/GHC

查看:199
本文介绍了阿克曼在Haskell / GHC方面效率很低的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我尝试计算 Ackermann(4,1) ,不同语言/编译器之间的性能差异很大。以下是我的 Core i7 3820QM,16G,Ubuntu 12.10 64bit b
C:1.6s gcc -O3 (使用gcc 4.7.2)

  int ack(int m,int n){
if(m == 0)return n + 1;
if(n == 0)return ack(m-1,1);
return ack(m-1,ack(m,n-1));


int main(){
printf(%d \ n,ack(4,1));
返回0;
}

OCaml:3.6s ocamlopt (with ocaml 3.12.1)

  let rec ack =功能
| 0,n - > n + 1
| m,0 - > ack(m-1,1)
| m,n - > (ack(4,1))

$ b $ ack(m-1,ack(m,n-1))
b
标准ML:5.1s mlton -codegen c -cc-opt -O3 (with mlton 20100608 )

  fun ack 0 n = n + 1 
| ack m 0 = ack(m-1)1
| ack m n = ack(m-1)(ack m(n-1));
print(Int.toString(ack 4 1));

球拍:11.5s 球拍 (with racket v5.3.3)

 (需要racket / unsafe / ops) 
$ b $(define + unsafe-fx +)
(define - unsafe-fx-)
(define(ack mn)
(cond
[ ( - n 1)] ak( - n 1)]
[(零?n)(ack( - m 1)1)]
[else(ack ))]))

(时间(ack 4 1))

Haskell:未完成,在22s后被系统杀死 ghc -O2 (with ghc 7.4.2)


$ b Haskell:1.8s ajhc (with ajhc 0.8 .0.4)

  main = print $ ack 4 1 
where ack :: Int - > Int - > Int
ack 0 n = n + 1
ack m 0 = ack(m-1)1
ack mn = ack(m-1)(ack m(n-1))

Haskell版本是唯一无法正常终止的版本,因为它占用的内存太多。它冻结了我的机器并在死亡之前填充了交换空间。
我可以做些什么来改进它,而不会大量地混淆代码?



编辑:我欣赏一些渐近式智能解决方案,但他们并不完全是我所要求的。这更多的是看看编译器是否以合理有效的方式处理某些模式(堆栈,尾部调用,拆箱等),而不是计算ackermann函数。

编辑2 :正如几个回复指出的,这似乎是 GHC最近版本中的一个错误。我尝试使用 AJHC 的相同代码并获得更好的性能。



感谢你非常:)

解决方案

注意:高内存使用问题是GHC RTS中的一个错误,堆栈溢出并在堆上分配新栈时,检查垃圾收集是否到期。它已经在GHC HEAD中得到了修复。






通过CPS转换<$ c,我能够获得更好的性能






数据P = P!Int!Int

main :: IO()
main = print $ ack(P 4 1)id
其中
ack :: P - > (Int→> Int)→> Int
ack(P 0 n)k = k(n + 1)
ack(P m 0)k = ack(P(m-1)1)k
ack )k = ack(P m(n-1))(\ a→ack(P(m-1)a)k)

您的原始函数会占用我机器上的所有可用内存,而此函数会在恒定空间内运行。

  $ time ./Test 
65533
./Test 52,47s user 0,50s system 96%cpu 54,797 total

Ocaml仍然更快,但是:

  $ time ./test 
65533./test 7,97s用户0,05s系统94%cpu 8,475总额

编辑:使用 JHC 进行编译时,您的原始程序速度与Ocaml版本:

  $时间./hs.out 
65533
./hs.out 5 ,31s user 0,03s system 96%cpu 5,515 total

编辑2:我发现的其他东西:ru使用更大的堆栈块大小( + RTS -kc1M )使您的原始程序以固定的空间运行。尽管如此,CPS版本还是要快一些。



编辑3:我设法生成的版本几乎与Ocaml一样快一个是通过手动展开主循环。然而,只有在 + RTS -kc1M (Dan Doel

  { - #LANGUAGE CPP# - } 
模块Main其中

data P = P { - #UNPACK# - }!Int { - #UNPACK# - }!Int

ack0 :: Int - > (a,b)C(a)C(b)$ b(a)b
ack0 n =(n + 1)

#define C(a)a
#define CONCAT
$ b #define AckType(M)CONCAT(ack,M):: Int - > Int

AckType(1)
AckType(2)
AckType(3)
AckType(4)

#define AckDecl(M ,M1)\
CONCAT(ack,M)n = {0 - > CONCAT(ack,M1)1 \
; 1 - > CONCAT(ack,M1)(CONCAT(ack,M1)1)\
; _ - > CONCAT(ack,M1)(CONCAT(ack,M)(n-1))}

AckDecl(1,0)
AckDecl(2,1)
AckDecl 3,2)
AckDecl(4,3)

ack :: P - > (Int→> Int)→> Int
ack(P m n)k =
0的情况m - > k(ack0 n)
1 - > k(ack1 n)
2 - > k(ack2 n)
3 - > k(ack3 n)
4 - > k(ack4 n)
_ - >案例n的
0 - > ack(P(m-1)1)k
1 - > ack(P(m-1)1)(\ a→ack(P(m-1)a)k)
- > ack(P m(n-1))(\ a→ack(P(m-1)a)k)

main :: IO()
main = print $ ack(P 4 1)id

测试:

  $ time ./Test + RTS -kc1M 
65533
./Test + RTS -kc1M 6,30s用户0,04s系统97%cpu 6,516总计

编辑4 :显然,空间泄漏已在GHC HEAD中修复,所以 + RTS -kc1M 将不再需要。


I try computing Ackermann(4,1) and there's a big difference in performance between different languages/compilers. Below are results on my Core i7 3820QM, 16G, Ubuntu 12.10 64bit,

C: 1.6s, gcc -O3 (with gcc 4.7.2)

int ack(int m, int n) {
  if (m == 0) return n+1;
  if (n == 0) return ack(m-1, 1);
  return ack(m-1, ack(m, n-1));
}

int main() {
  printf("%d\n", ack(4,1));
  return 0;
}

OCaml: 3.6s, ocamlopt (with ocaml 3.12.1)

let rec ack = function
  | 0,n -> n+1
  | m,0 -> ack (m-1, 1)
  | m,n -> ack (m-1, ack (m, n-1))
in print_int (ack (4, 1))

Standard ML: 5.1s mlton -codegen c -cc-opt -O3 (with mlton 20100608)

fun ack 0 n = n+1
  | ack m 0 = ack (m-1) 1
  | ack m n = ack (m-1) (ack m (n-1));
print (Int.toString (ack 4 1));

Racket: 11.5s racket (with racket v5.3.3)

(require racket/unsafe/ops)

(define + unsafe-fx+)
(define - unsafe-fx-)
(define (ack m n)
  (cond
    [(zero? m) (+ n 1)]
    [(zero? n) (ack (- m 1) 1)]
    [else (ack (- m 1) (ack m (- n 1)))]))

(time (ack 4 1))

Haskell: unfinished, killed by system after 22s ghc -O2 (with ghc 7.4.2)

Haskell: 1.8s ajhc (with ajhc 0.8.0.4)

main = print $ ack 4 1
  where ack :: Int -> Int -> Int
        ack 0 n = n+1
        ack m 0 = ack (m-1) 1
        ack m n = ack (m-1) (ack m (n-1))

The Haskell version is the only one that fails to terminate properly because it takes too much memory. It freezes my machine and fills the swap space before getting killed. What can I do to improve it without heavily fuglifying the code?

EDIT: I appreciate some of the asymptotically smarter solutions, but they are not exactly what I am asking for. This is more about seeing whether the compiler handles certain patterns in a reasonably efficient way (stack, tail calls, unboxing, etc.) than computing the ackermann function.

EDIT 2: As pointed out by several responses, this seems to be a bug in recent versions of GHC. I try the same code with AJHC and get much better performance.

Thank you very much :)

解决方案

NB: The high memory usage issue is a bug in the GHC RTS, where upon stack overflow and allocation of new stacks on the heap it was not checked whether garbage collection is due. It has been already fixed in GHC HEAD.


I was able to get much better performance by CPS-converting ack:

module Main where

data P = P !Int !Int

main :: IO ()
main = print $ ack (P 4 1) id
  where
    ack :: P -> (Int -> Int) -> Int
    ack (P 0 n) k = k (n + 1)
    ack (P m 0) k = ack (P (m-1) 1) k
    ack (P m n) k = ack (P m (n-1)) (\a -> ack (P (m-1) a) k)

Your original function consumes all available memory on my machine, while this one runs in constant space.

$ time ./Test
65533
./Test  52,47s user 0,50s system 96% cpu 54,797 total

Ocaml is still faster, however:

$ time ./test
65533./test  7,97s user 0,05s system 94% cpu 8,475 total

Edit: When compiled with JHC, your original program is about as fast as the Ocaml version:

$ time ./hs.out 
65533
./hs.out  5,31s user 0,03s system 96% cpu 5,515 total

Edit 2: Something else I've discovered: running your original program with a larger stack chunk size (+RTS -kc1M) makes it run in constant space. The CPS version is still a bit faster, though.

Edit 3: I managed to produce a version that runs nearly as fast as the Ocaml one by manually unrolling the main loop. However, it only works when run with +RTS -kc1M (Dan Doel has filed a bug about this behaviour):

{-# LANGUAGE CPP #-}
module Main where

data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int

ack0 :: Int -> Int
ack0 n =(n+1)

#define C(a) a
#define CONCAT(a,b) C(a)C(b)

#define AckType(M) CONCAT(ack,M) :: Int -> Int

AckType(1)
AckType(2)
AckType(3)
AckType(4)

#define AckDecl(M,M1) \
CONCAT(ack,M) n = case n of { 0 -> CONCAT(ack,M1) 1 \
; 1 ->  CONCAT(ack,M1) (CONCAT(ack,M1) 1) \
; _ ->  CONCAT(ack,M1) (CONCAT(ack,M) (n-1)) }

AckDecl(1,0)
AckDecl(2,1)
AckDecl(3,2)
AckDecl(4,3)

ack :: P -> (Int -> Int) -> Int
ack (P m n) k = case m of
  0 -> k (ack0 n)
  1 -> k (ack1 n)
  2 -> k (ack2 n)
  3 -> k (ack3 n)
  4 -> k (ack4 n)
  _ -> case n of
    0 -> ack (P (m-1) 1) k
    1 -> ack (P (m-1) 1) (\a -> ack (P (m-1) a) k)
    _ -> ack (P m (n-1)) (\a -> ack (P (m-1) a) k)

main :: IO ()
main = print $ ack (P 4 1) id

Testing:

$ time ./Test +RTS -kc1M
65533
./Test +RTS -kc1M  6,30s user 0,04s system 97% cpu 6,516 total

Edit 4: Apparently, the space leak is fixed in GHC HEAD, so +RTS -kc1M won't be required in the future.

这篇关于阿克曼在Haskell / GHC方面效率很低的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆