GHC优化:Collatz猜想 [英] GHC Optimization: Collatz conjecture
问题描述
我编写了欧拉项目挑战赛14 的代码,在 Haskell 和 C ++ (ideone链接)。他们都记得他们之前在数组中做过的任何计算。
分别使用 ghc -O2
和 g ++ -O3
,C ++的运行速度比Haskell版本快10-15倍。
虽然我明白Haskell版本可能运行速度较慢,而Haskell是一种更好的语言,如果知道我可以对Haskell版本进行一些代码更改以使其运行速度更快(理想情况下是C ++版本的2或3倍),那将是很好的做法?
Haskell代码在这里:
import Data.Array
import Data.Word
import Data.List
collatz_array =
let
upperbound = 1000000
a = array(1,upperbound)[(i :: Word64 ,fi :: Int)| i< - [1..upperbound]]
f i = i`seq`
let
check_f i = i`seq` if i <= upperbound then a! (if == 1)then 0 else(check_f((if(even i)then i else 3 * i + 1)`div` 2))+ 1 $ b if $ if $
$ b main =
putStrLn $ show $
foldl1'(\(x1,x2)(y1,y2) - > if(x2> = y2)then(x1,x2)else(y1,y2))$! (assocs collatz_array)
编辑:
我现在也做了一个使用unboxed可变数组的版本。它仍然比C ++版本慢5倍,但是是一个显着的改进。代码是在ideone 这里。
I' d想知道对可变数组版本的改进,使其更接近C ++版本。
数组)代码:
- 使用折叠来查找最大链长,因为数组必须转换为关联列表,这需要时间和C ++版本不需要的分配。
- 使用
甚至
和div
用于测试resp除以2.这些很慢。 g ++将操作优化为更快的位操作(至少在更快的平台上),但GHC不会进行这些低级优化(暂时),所以暂时必须手工完成。
- 您可以使用
readArray
和writeArray
。在C ++代码中没有完成的额外边界检查也需要时间,一旦其他问题得到处理,这相当于运行时间的很大一部分(我的框中约为25%),因为已经完成了在算法中有很多读写操作。
在执行过程中加入了这个元素,我得到了
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
collatz_array :: ST s(STUArray s Int Int)
collatz_array = do
let upper = 10000000
arr < - newArray(0,upper)0
unsafeWrite arr 2 1
让我检查我
|上部<我=返回arr
|我和&。 1 == 0 = do
l< - unsafeRead arr(i`shiftR` 1)
unsafeWrite arr i(l + 1)
check(i + 1)
|否则= do
let j =(3 * i + 1)`shiftR` 1
find k l
|上部< k = find(next k)$! l + 1
| k< i = do
m< - unsafeRead arr k
return(m + 1)
|否则=做
m< - unsafeRead arr k
如果m == 0
然后做
n< - find(next k)1
unsafeWrite arr kn
return(n + l)
else return(m + l)
其中
next h
| h。& ;. 1 == 0 = h`shiftR` 1
|否则=(3 * h + 1)`shiftR` 1
l < - 找到j 1
unsafeWrite arr il
check(i + 1)
check 3
collatz_max :: ST s(Int,Int)
collatz_max = do
car< - collatz_array
(_,upper)< - getBounds car
let find wmi
|上部<我= return(w,m)
|否则= do
l < - unsafeRead car i
if m< l
然后找到il(i + 1)
else找到wm(i + 1)
find 1 0 2
main :: IO()
main = print(runST collatz_max)
以及时间(均为1000万):
$ time ./cccoll
8400511 429
real 0m0.210s
user 0m0 .200s
sys 0m0.009s
$ time ./stcoll
(8400511,429)
real 0m0.341s
user 0m0.307s
sys 0m0.033s
看起来不错。
重要说明:该代码仅适用于64位GHC(因此,特别是在Windows上,您需要ghc-7.6.1或更高版本,先前的GHC为32即使在64位Windows上也是如此),因为中间链元素超过了32位范围。在32位系统上,必须使用 Integer
或64位整数类型( Int64
或<$因为原始的64位操作(算术和移位)被实现为对32位GHC中的C函数的外部调用,所以以严格的性能成本执行后续链(c $ c> Word64 ) fast 外部调用,但仍然比直接机器操作慢得多)。
I've written code for the Project Euler's Challenge 14, in both Haskell and C++ (ideone links). They both remember any calculations they have previously done in an array.
Using ghc -O2
and g++ -O3
respectively, the C++ runs 10-15 times faster than the Haskell version.
Whilst I understand the Haskell version may run slower, and that Haskell is a nicer language to write in, it would be nice to know some code changes I can make to the Haskell version to make it run faster (ideally within a factor of 2 or 3 of the C++ version)?
Haskell code is here:
import Data.Array
import Data.Word
import Data.List
collatz_array =
let
upperbound = 1000000
a = array (1, upperbound) [(i :: Word64, f i :: Int) | i <- [1..upperbound]]
f i = i `seq`
let
check_f i = i `seq` if i <= upperbound then a ! i else f i
in
if (i == 1) then 0 else (check_f ((if (even i) then i else 3 * i + 1) `div` 2)) + 1
in a
main =
putStrLn $ show $
foldl1' (\(x1,x2) (y1,y2) -> if (x2 >= y2) then (x1, x2) else (y1, y2)) $! (assocs collatz_array)
Edit:
I've now also done a version using unboxed mutable arrays. It is still 5 times slower than the C++ version, but a significant improvement. The code is on ideone here.
I'd like to know improvements to the mutable array version which bring it closer to the C++ version.
Some problems with your (mutable array) code:
- You use a fold to find the maximal chain length, for that the array has to be converted to an association list, that takes time and allocation the C++ version doesn't need.
- You use
even
anddiv
for testing resp dividing by 2. These are slow. g++ optimises both operations to the faster bit operations (on platforms where that is supposedly faster, at least), but GHC doesn't do these low-level optimisations (yet), so for the time being, they have to be done by hand. - You use
readArray
andwriteArray
. The extra bounds-checking that isn't done in the C++ code also takes time, once the other problems are dealt with, that amounts to a significant portion of the running time (ca. 25% on my box), since there are done a lot of reads and writes in the algorithm.
Incorporating that into the implementation, I get
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
collatz_array :: ST s (STUArray s Int Int)
collatz_array = do
let upper = 10000000
arr <- newArray (0,upper) 0
unsafeWrite arr 2 1
let check i
| upper < i = return arr
| i .&. 1 == 0 = do
l <- unsafeRead arr (i `shiftR` 1)
unsafeWrite arr i (l+1)
check (i+1)
| otherwise = do
let j = (3*i+1) `shiftR` 1
find k l
| upper < k = find (next k) $! l+1
| k < i = do
m <- unsafeRead arr k
return (m+l)
| otherwise = do
m <- unsafeRead arr k
if m == 0
then do
n <- find (next k) 1
unsafeWrite arr k n
return (n+l)
else return (m+l)
where
next h
| h .&. 1 == 0 = h `shiftR` 1
| otherwise = (3*h+1) `shiftR` 1
l <- find j 1
unsafeWrite arr i l
check (i+1)
check 3
collatz_max :: ST s (Int,Int)
collatz_max = do
car <- collatz_array
(_,upper) <- getBounds car
let find w m i
| upper < i = return (w,m)
| otherwise = do
l <- unsafeRead car i
if m < l
then find i l (i+1)
else find w m (i+1)
find 1 0 2
main :: IO ()
main = print (runST collatz_max)
And the timings (both for 10 million):
$ time ./cccoll
8400511 429
real 0m0.210s
user 0m0.200s
sys 0m0.009s
$ time ./stcoll
(8400511,429)
real 0m0.341s
user 0m0.307s
sys 0m0.033s
which doesn't look too bad.
Important note: That code only works on 64-bit GHC (so, in particular, on Windows, you need ghc-7.6.1 or later, previous GHCs were 32-bit even on 64-bit Windows) since intermediate chain elements exceed 32-bit range. On 32-bit systems, one would have to use Integer
or a 64-bit integer type (Int64
or Word64
) for following the chains, at a drastic performance cost, since the primitive 64-bit operations (arithmetic and shifts) are implemented as foreign calls to C functions in 32-bit GHCs (fast foreign calls, but still much slower than direct machine ops).
这篇关于GHC优化:Collatz猜想的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!