欧拉项目14:与C和memoization相比的性能 [英] Project Euler 14: performance compared to C and memoization

查看:67
本文介绍了欧拉项目14:与C和memoization相比的性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在处理项目euler问题14



我用一个编码不好的程序解决了这个问题,没有记忆,它用5秒钟的时间运行(见编辑)。

这里是:

  step ::(Integer,Int) - >整数 - > (Integer,Int)
step(i,m)n | nextValue> m =(n,nextValue)
|否则=(i,m)
其中nextValue = syr n 1

syr :: Integer - > Int - > Int
syr 1 acc = acc
syr x acc |甚至x = syr(x`div` 2)(acc + 1)
|否则= syr(3 * x + 1)(acc + 1)

p14 = foldl step(0,0)[500000..999999]

我的问题是关于与此问题有关的线索中的几个注释,其中针对如下节目提及的执行时间<1s(C代码,到代码的项目euler论坛用户 ix - 注意:我没有检查执行时间实际上是否被提及):

  #include< stdio.h> 


int main(int argc,char ** argv){
int longest = 0;
int terms = 0;
int i;
unsigned long j;
for(i = 1; i <= 1000000; i ++){
j = i;
int this_terms = 1;
while(j!= 1){
this_terms ++;
if(this_terms> terms){
terms = this_terms;
最长= i;
}
if(j%2 == 0){
j = j / 2;
} else {
j = 3 * j + 1;
}
}
}
printf(最长:%d(%d)\\\
,最长,期限);
返回0;
}

对我来说,这些程序在谈论算法时是相同的。

所以我想知道为什么会有这么大的差异?或者,我们的两个算法之间是否存在任何可以证明x6因子性能的差异?

顺便说一句,我目前正试图用memoization来实现这个算法,有一种失去了我的感觉,用命令式语言来实现会更容易(我不操纵单子,所以我不能使用这个范例)。所以如果你有一个适合初学者学习记忆的好教程,我会很高兴(我遇到的那些不够详细或者不在我的联盟中)。



注意:我通过Prolog来到声明范例,并且仍然处于发现Haskell的早期阶段,因此我可能会错过重要的事情。注意2:关于任何常规建议我的代码是受欢迎的。



编辑:感谢delnan的帮助,我编译了程序,它现在运行5秒,所以我主要看(即使关于现有的x6缺口的想法仍然受欢迎)。

解决方案

使用优化编译后,与C程序仍有几处不同之处




  • 您使用 div ,而C程序使用机器划分(截断)[但任何自我尊重的C编译器将其转换为一个移位,这使得它更快],那个wou在Haskell中,ld是 quot ;在这里将运行时间减少了大约15%。
    C程序使用固定宽度的64位(甚至是32位,但只有运气才能得到正确答案,因为一些中间值超过32位范围)整数,Haskell程序使用任意精度整数 s。如果您的GHC(除Windows以外的64位操作系统)中有64位 Int s,则将整数替换为内部。这在这里将运行时间缩短了约3倍。如果您使用的是32位系统,那么您运气不好,GHC不会在那里使用本地64位指令,这些操作将作为C调用实现,但仍然相当慢。



对于memoisation,您可以将它外包给hackage上的一个memoisation软件包,我记得的唯一一个是 data-memocombinators ,但还有其他的。或者你可以自己做,例如保留以前计算的值的映射 - 在 State monad中最好,

  import Control.Monad.State.Strict 
将合格的Data.Map导入为Map
导入Data.Map(Map,singleton)

type Memo = Map Integer Int

syr :: Integer - > State Memo Int
syr n = do
mb< - gets(Map.lookup n)
case mb of
Just l - >返回l
Nothing - > b
让m =如果偶数n则n`2`else 3 * n + 1
l < - syr m
let l'= l + 1
modify(Map .insert n l')
return l'

solve :: Integer - > Int - >整数 - >状态备忘(Integer,Int)
解决maxi len start
| len> 1000000 = return(maxi,len)
|否则= do
l< - syr start
如果len< l
然后求解start l(start + 1)
else解决maxi len(start + 1)

p14 ::(Integer,Int)
p14 = evalState (解决0 0 500000)(单身人士1 1)

但这可能不会获得太多即使你已经添加了必要的严格性)。麻烦的是,在 Map 中查找并不便宜,插入代价相对较高。另一种方法是保持查找的可变数组。代码变得更加复杂,因为你必须为要缓存的值选择一个合理的上限(应该不比起始值的边界大得多)并且处理位于memoised范围之外的部分序列。但是数组查找和写入速度很快。如果你有64位 Int s,下面的代码运行速度非常快,在这里它需要0.03s的限制为100万,0.33s的限制为1000万,相应的(尽我所能)C代码运行在0.018和。 0.2s。

  module Main(main)where 

import System.Environment(getArgs)
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
import Data.Int

main :: IO()
main = do
args< - getArgs
let bd = case $ args $ b $:_ - >阅读
_ - > 100000
print $ collMax bd

next :: Int - > Int
next n
| n& ;. 1 == 0 = n`unsafeShiftR` 1
|否则= 3 * n + 1

collMax :: Int - > (Int,Int16)
collMax upper = runST $ do
arr < - newArray(0,upper)0 :: ST s(STUArray s Int Int16)
let go lm
|上部< m = go(l + 1)$ next m
|否则=做
l'< - unsafeRead arr m
案例l'的
0 - > (1 + 1)
返回(l + l)
- - >
l''< - go 1 $ next m
unsafeWrite arr m返回(l + l'-1)
收集mi ml i
|上部<我=返回(mi,ml)
|否则=做
l< - 去1 1
如果l> ml
然后收集il(i + 1)
其他收集mi ml(i + 1)
unsafeWrite arr 1 1
collect 1 1 2


I'm currently working on project euler problem 14.

I solved it using a poorly coded program, without memoization, that took 386 5 seconds to run (see edit).

Here it is:

step :: (Integer, Int) -> Integer -> (Integer, Int)
step (i, m) n   | nextValue > m         = (n, nextValue)
                | otherwise             = (i, m)
                where nextValue = syr n 1

syr :: Integer -> Int -> Int
syr 1 acc   = acc
syr x acc   | even x    = syr (x `div` 2) (acc + 1)
            | otherwise = syr (3 * x + 1) (acc + 1)

p14 = foldl step (0, 0) [500000..999999]

My question is about several comments in the thread related to this problem, where were mentionned execution times of <1 s for programs as follow (C code, credits to the project euler forum user ix for the code -- note: I didn't check that the execution time is in fact as mentionned):

#include <stdio.h>


int main(int argc, char **argv) {
    int longest = 0;
    int terms = 0;
    int i;
    unsigned long j;
    for (i = 1; i <= 1000000; i++) {
        j = i;
        int this_terms = 1;
        while (j != 1) {
            this_terms++;
            if (this_terms > terms) {
                terms = this_terms;
                longest = i;
            }
            if (j % 2 == 0) {
                j = j / 2;
            } else {
                j = 3 * j + 1;
            }
        }
    }
    printf("longest: %d (%d)\n", longest, terms);
    return 0;
}

To me, those programs are kind of the same, when talking about the algorithm.

So I wonder why there is such a big difference? Or is there any fondamental difference between our two algorithms that can justify a x6 factor in performance?

BTW, I'm currently trying to implement this algorithm with memoization, but am kind of lost as to me, it's way easier to implement in an imperative language (and I don't manipulate monads yet so I can't use this paradigm). So if you have any good tutorial that fits a beginner to learn memoization, I'll be glad (the ones I encountered were not detailed enough or out of my league).

Note: I came to declarative paradigm through Prolog and am still in the very early process of discovering Haskell, so I might miss important things.

Note2: any general advice about my code is welcome.

EDIT: thanks to delnan's help, I compiled the program and it now runs in 5 seconds, so I mainly look for hints on memoization now (even if ideas about the existing x6 gap are still welcome).

解决方案

After having compiled it with optimisations, there are still several differences to the C programme

  • you use div, while the C programme uses machine division (which truncates) [but any self-respecting C compiler transforms that into a shift, so that makes it yet faster], that would be quot in Haskell; that reduced the run time by some 15% here.
  • the C programme uses fixed-width 64-bit (or even 32-bit, but then it's just luck that it gets the correct answer, since some intermediate values exceed 32-bit range) integers, the Haskell programme uses arbitrary precision Integers. If you have 64-bit Ints in your GHC (64-bit OS other than Windows), replace Integer with Int. That reduced the run time by a factor of about 3 here. If you're on a 32-bit system, you're out of luck, GHC doesn't use native 64-bit instructions there, these operations are implemented as C calls, that's still rather slow.

For the memoisation, you can outsource it to one of the memoisation packages on hackage, the only one that I remember is data-memocombinators, but there are others. Or you can do it yourself, for example keeping a map of previously calculated values - that would work best in the State monad,

import Control.Monad.State.Strict
import qualified Data.Map as Map
import Data.Map (Map, singleton)

type Memo = Map Integer Int

syr :: Integer -> State Memo Int
syr n = do
    mb <- gets (Map.lookup n)
    case mb of
      Just l -> return l
      Nothing -> do
          let m = if even n then n `quot` 2 else 3*n+1
          l <- syr m
          let l' = l+1
          modify (Map.insert n l')
          return l'

solve :: Integer -> Int -> Integer -> State Memo (Integer,Int)
solve maxi len start
    | len > 1000000 = return (maxi,len)
    | otherwise = do
         l <- syr start
         if len < l
             then solve start l (start+1)
             else solve maxi len (start+1)

p14 :: (Integer,Int)
p14 = evalState (solve 0 0 500000) (singleton 1 1)

but that will probably not gain too much (not even when you've added the necessary strictness). The trouble is that a lookup in a Map is not too cheap and an insertion is relatively expensive.

Another method is to keep a mutable array for the lookup. The code becomes more complicated, since you have to choose a reasonable upper bound for the values to cache (should be not much larger than the bound for the starting values) and deal with the parts of the sequences falling outside the memoised range. But an array lookup and write are fast. If you have 64-bit Ints, the below code runs pretty fast, here it takes 0.03s for a limit of 1 million, and 0.33s for a limit of 10 million, the corresponding (as closely as I reasonably could) C code runs in 0.018 resp. 0.2s.

module Main (main) where

import System.Environment (getArgs)
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
import Data.Int

main :: IO ()
main = do
    args <- getArgs
    let bd = case args of
               a:_ -> read a
               _   -> 100000
    print $ collMax bd

next :: Int -> Int
next n
    | n .&. 1 == 0  = n `unsafeShiftR` 1
    | otherwise     = 3*n + 1

collMax :: Int -> (Int,Int16)
collMax upper = runST $ do
    arr <- newArray (0,upper) 0 :: ST s (STUArray s Int Int16)
    let go l m
            | upper < m = go (l+1) $ next m
            | otherwise = do
                l' <- unsafeRead arr m
                case l' of
                  0 -> do
                      l'' <- go 1 $ next m
                      unsafeWrite arr m (l'' + 1)
                      return (l+l'')
                  _ -> return (l+l'-1)
        collect mi ml i
            | upper < i = return (mi, ml)
            | otherwise = do
                l <- go 1 i
                if l > ml
                  then collect i l (i+1)
                  else collect mi ml (i+1)
    unsafeWrite arr 1 1
    collect 1 1 2

这篇关于欧拉项目14:与C和memoization相比的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆