与项目欧拉速度对比:C VS的Python VS二郎VS哈斯克尔 [英] Speed comparison with Project Euler: C vs Python vs Erlang vs Haskell

查看:118
本文介绍了与项目欧拉速度对比:C VS的Python VS二郎VS哈斯克尔的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已问题#12 从的项目欧拉的作为编程练习和我的(肯定不是最优的)实现在C,Python和Erlang和Haskell的比较。为了得到一些更高的执行时间,我搜索了超过1000除数,而不是500在原来的问题说第一个三角形数量。

结果如下:

C:

 洛伦佐@恩佐:〜/ $二郎的gcc -o -lm euler12.bin euler12.c
洛伦佐@恩佐:〜/ $二郎时间./euler12.bin
842161320真正0m11.074s
用户0m11.070s
SYS 0m0.000s

蟒蛇:

 洛伦佐@恩佐:〜/ $二郎时间./euler12.py
842161320真正1m16.632s
用户1m16.370s
SYS 0m0.250s

蟒蛇与pypy:

 洛伦佐@恩佐:〜/下载/ py​​py-C-JIT-43780-b590cf6de419-LINUX64 /斌$时间./pypy /home/lorenzo/erlang/euler12.py
842161320真正0m13.082s
用户0m13.050s
SYS 0m0.020s

二郎:

 洛伦佐@恩佐:〜/ $二郎erlc euler12.erl
洛伦佐@恩佐:〜/ $二郎ERL时间-s euler12解决
二郎R13B03(专家评审组-5.7.4)[来源] [64位] [SMP:4:4] [RQ:4] [异步线程:0] [HIPE] [内核投票:假]ESHELL V5.7.4(含中止^ G)
1> 842161320真正0m48.259s
用户0m48.070s
SYS 0m0.020s

哈斯克尔:

 洛伦佐@恩佐:〜/ $二郎GHC euler12.hs -o euler12.hsx
[1 1]编译主(euler12.hs,euler12.o)
链接euler12.hsx ...
洛伦佐@恩佐:〜/ $二郎时间./euler12.hsx
842161320真正2m37.326s
用户2m37.240s
SYS 0m0.080s

摘要:


  • C:100%

  • 蟒蛇:692%(含118 pypy%)

  • 二郎:436%(135感谢RichardC%)

  • 哈斯克尔:1421%

我想,C有一个很大的优势,因为它使用长计算,而不是任意长度整数作为其他三个。此外,它并不需要先装入运行时(难道其他人呢?)。

问1:
不要二郎,Python和Haskell中失去适当的速度,使用任意长度的整数或不也只要值小于 MAXINT

问题2:
为什么Haskell的这么慢?是否有关闭刹车或者是我的执行编译器标志? (后者是完全有可能作为Haskell是一本书,用七印给我。)

问题3:
你能不能给我一些提示如何优化这些实现在不改变我决定因素的方式吗?优化以任何方式:更好,更快,更本土的语言

编辑:

问题4:
做我的功能实现许可证LCO(最后一次通话优化,a.k.a尾递归消除),从而避免增加不必要的帧到调用栈?

我真的努力实现相同的算法尽可能相似的四种语言,虽然我不得不承认,我的Haskell和Erlang的知识是非常有限的。


来源$ C ​​$ CS使用:

 的#include<&stdio.h中GT;
#包括LT&;&math.h中GT;INT factorCount(N久)
{
    双平方=开方(N);
    INT国际广场=(int)的平方;
    诠释计数=国际广场==方? -1:0;
    长候选人;
    对(候选= 1;候选LT =国际广场候选++)
        如果(0 == N%的候选人)数+ = 2;
    返回计数;
}诠释的main()
{
    长三角= 1;
    INT索引= 1;
    而(factorCount(三角形)下,1001)
    {
        指数++;
        三角形+ =指数;
    }
    的printf(%LD \\ N,三角形);
}


 #!在/ usr /斌/ env的python3.2进口数学高清factorCount(N):
    方=的Math.sqrt(N)
    国际广场= INT(方)
    数= -1,如果国际广场==广场否则为0
    有效范围内的候补(1,国际广场+ 1):
        如果不是n%的候选人:数+ = 2
    返回计数三角= 1
索引= 1
而factorCount(三角形)< 1001:
    指数+ = 1
    三角形+ =指数打印(三角形)


  -module(euler12)。
-compile(export_all)。factorCount(数字) - GT; factorCount(数字,数学:SQRT(数字),1,0)。factorCount(_,的Sqrt,候选人,计数),当候选GT;开方 - >计数;factorCount(_,的Sqrt,候选人,计数)当候选人==的Sqrt - >计数+ 1;factorCount(数字,的Sqrt,候选人,计数) - GT;
    病例数REM候选人
        0 - > factorCount(数字,的Sqrt,候选人+ 1,计数+ 2);
        _ - > factorCount(数字,的Sqrt,候选人+ 1,计数)。
    结束。nextTriangle(指数,三角) - GT;
    计数= factorCount(三角),
    如果
        COUNT> 1000 - >三角形;
        真 - > nextTriangle(索引+ 1,+三角指数+ 1)
    结束。解决() - >
    IO:格式(用〜p〜n的,[nextTriangle(1,1)]),
    停止(0)。


  factorCount数= factorCount'号国际广场1 0  - (fromEnum $平方米== fromIntegral国际广场)
    其中方=开方$ fromIntegral号
          国际广场=楼广场factorCount'号开方候选人数
    | fromIntegral候选GT;开方=计数
    |一些`mod`候选人== 0 = factorCount'号的sqrt(候选+ 1)(数+ 2)
    |否则= factorCount'号的sqrt(候选+ 1)的计nextTriangle指数三角形
    | factorCount三角> 1000 =三角形
    |否则= nextTriangle(索引+1)(三角形+索引+1)主要= $打印1 nextTriangle 1


解决方案

使用 GHC 7.0.3 GCC 4.4.6 的Linux 2.6.29 的x86_64的酷睿双核(2.5GHz的)的机器上,使用编译 GHC -O2 -fllvm -fforce,recomp 为Haskell和 GCC -O3 -lm 对于C


  • 8.4秒钟内你的C例程运行(比你跑,因为可能更快 -O3

  • Haskell的解决方案36秒运行一次(由于 -O2 标志)

  • factorCount code未显式类型,缺省值为整数(感谢丹尼尔在这里纠正我的误诊!)。给使用显式类型的签名(这是标准的做法无论如何)内部和时间的变化为11.1秒

  • factorCount 你不必要名为 fromIntegral 。一个修复的结果虽然没有变化(编译器是聪明的,算你走运)。

  • 您使用 MOD ,其中 REM 更快,足够了。这改变了时间的8.5秒

  • factorCount 正在不断施加两个额外的参数是永远不会改变(数量开方)。一名工人/包装改造给我们:

$时间./so
 842161320 真正0m7.954s
 用户0m7.944s
 SYS 0m0.004s

这是正确的,7.95秒。一贯半秒比C解决方案更快。如果没有 -fllvm 标志,我仍然得到8.182秒,所以后端NCG在这种情况下做的很好过

结论:Haskell是真棒

致使code

factorCount数= factorCount'号国际广场1 0 - (fromEnum $平方米== fromIntegral国际广场)
    其中方=开方$ fromIntegral号
          国际广场=楼广场factorCount'::诠释 - > INT - > INT - > INT - > INT
factorCount'号开方candidate0 COUNT0 =去candidate0 COUNT0
  哪里
  去候选人数
    |候选GT;开方=计数
    |一些`rem`候选人== 0 =去(候选+ 1)(数+ 2)
    |否则=去(候选+ 1)的计nextTriangle指数三角形
    | factorCount三角> 1000 =三角形
    |否则= nextTriangle(索引+1)(三角形+索引+1)主要= $打印1 nextTriangle 1

编辑:所以,现在,我们已经探讨了,让解决问题。


  

问题1:你二郎神,蟒蛇和Haskell失速由于使用
  任意长度的整数或不它们只要值少
  比MAXINT?


在Haskell中,使用整数内部,但慢多少取决于执行的计算速度较慢。幸运的是(对于64位计算机)内部就足够了。对于便携性的缘故,你或许应该重写我的code使用的Int64 word64与(C不是唯一用语言)。


  

问题2:为什么Haskell的这么慢?是否有一个编译器标志
  关闭刹车或者是我的执行? (后者是相当
  作为可能Haskell是一本书,用七印给我。)


  
  

问题3:你能提供给我一些提示如何优化这些
  实现在不改变我决定因素的方式吗?
  优化以任何方式:更好,更快,更本土的语言


这就是我上面的回答。答案是


  • 0),通过 -O2 使用优化

  • 1)使用快(尤其是:拆箱-能)类型时,可能

  • 2) REM 不是 MOD (一个经常被遗忘的优化)和

  • 3)工人/包装转换(也许是最常见的优化)。


  

问题4:你我的功能实现许可证LCO,因此
  避免增加不必要的帧到调用栈?


是的,这不是问题。良好的工作和高兴你考虑这一点。

I have taken Problem #12 from Project Euler as a programming exercise and to compare my (surely not optimal) implementations in C, Python, Erlang and Haskell. In order to get some higher execution times, I search for the first triangle number with more than 1000 divisors instead of 500 as stated in the original problem.

The result is the following:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

python with pypy:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

Summary:

  • C: 100%
  • python: 692% (118% with pypy)
  • erlang: 436% (135% thanks to RichardC)
  • haskell: 1421%

I suppose that C has a big advantage as it uses long for the calculations and not arbitrary length integers as the other three. Also it doesn't need to load a runtime first (Do the others?).

Question 1: Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don't they as long as the values are less than MAXINT?

Question 2: Why is Haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as Haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more "native" to the language.

EDIT:

Question 4: Do my functional implementations permit LCO (last call optimization, a.k.a tail recursion elimination) and hence avoid adding unnecessary frames onto the call stack?

I really tried to implement the same algorithm as similar as possible in the four languages, although I have to admit that my Haskell and Erlang knowledge is very limited.


Source codes used:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}


#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)


-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).


factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

解决方案

Using GHC 7.0.3, gcc 4.4.6, Linux 2.6.29 on an x86_64 Core2 Duo (2.5GHz) machine, compiling using ghc -O2 -fllvm -fforce-recomp for Haskell and gcc -O3 -lm for C.

  • Your C routine runs in 8.4 seconds (faster than your run probably because of -O3)
  • The Haskell solution runs in 36 seconds (due to the -O2 flag)
  • Your factorCount' code isn't explicitly typed and defaulting to Integer (thanks to Daniel for correcting my misdiagnosis here!). Giving an explicit type signature (which is standard practice anyway) using Int and the time changes to 11.1 seconds
  • in factorCount' you have needlessly called fromIntegral. A fix results in no change though (the compiler is smart, lucky for you).
  • You used mod where rem is faster and sufficient. This changes the time to 8.5 seconds.
  • factorCount' is constantly applying two extra arguments that never change (number, sqrt). A worker/wrapper transformation gives us:

 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s  

That's right, 7.95 seconds. Consistently half a second faster than the C solution. Without the -fllvm flag I'm still getting 8.182 seconds, so the NCG backend is doing well in this case too.

Conclusion: Haskell is awesome.

Resulting Code

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

EDIT: So now that we've explored that, lets address the questions

Question 1: Do erlang, python and haskell lose speed due to using arbitrary length integers or don't they as long as the values are less than MAXINT?

In Haskell, using Integer is slower than Int but how much slower depends on the computations performed. Luckily (for 64 bit machines) Int is sufficient. For portability sake you should probably rewrite my code to use Int64 or Word64 (C isn't the only language with a long).

Question 2: Why is haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more "native" to the language.

That was what I answered above. The answer was

  • 0) Use optimization via -O2
  • 1) Use fast (notably: unbox-able) types when possible
  • 2) rem not mod (a frequently forgotten optimization) and
  • 3) worker/wrapper transformation (perhaps the most common optimization).

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

Yes, that wasn't the issue. Good work and glad you considered this.

这篇关于与项目欧拉速度对比:C VS的Python VS二郎VS哈斯克尔的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆