How to optimize this Haskell code summing up the primes in sublinear time?

Problem 10 from Project Euler is to find the sum of all the primes below given n.

I solved it simply by summing up the primes generated by the sieve of Eratosthenes. Then I came across a much more efficient solution by Lucy_Hedgehog (sub-linear!).
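For readers who haven't followed the link: the sieve code below can be cross-checked against a naive reference on tiny inputs. This helper is mine, not part of the original question:

```haskell
-- Naive reference: sum of all primes <= n, by trial sieving.
-- Only for cross-checking the sub-linear code on small n.
primesUpTo :: Int -> [Int]
primesUpTo n = sieve [2 .. n]
  where sieve []       = []
        sieve (x : xs) = x : sieve [y | y <- xs, y `mod` x /= 0]

main :: IO ()
main = print (sum (primesUpTo 10), sum (primesUpTo 100))
```

For n = 10 this gives 17, and for n = 100 it gives 1060, which should agree with problem10 below on small inputs.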

For n = 2⋅10^9:

  • Python code (from the quote above) runs in 1.2 seconds in Python 2.7.3.

  • C++ code (mine) runs in about 0.3 seconds (compiled with g++ 4.8.4).

I re-implemented the same algorithm in Haskell, since I'm learning it:

import Data.List

import Data.Map (Map, (!))
import qualified Data.Map as Map

problem10 :: Integer -> Integer
problem10 n = (sieve (Map.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
              where vs = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]
                    r  = floor (sqrt (fromIntegral n))

sieve :: Map Integer Integer -> Integer -> Integer -> [Integer] -> Map Integer Integer
sieve m p r vs | p > r     = m
               | otherwise = sieve (if m ! p > m ! (p - 1) then update m vs p else m) (p + 1) r vs

update :: Map Integer Integer -> [Integer] -> Integer -> Map Integer Integer
update m vs p = foldl' decrease m (map (\v -> (v, sumOfSieved m v p)) (takeWhile (>= p*p) vs))

decrease :: Map Integer Integer -> (Integer, Integer) -> Map Integer Integer
decrease m (k, v) = Map.insertWith (flip (-)) k v m

sumOfSieved :: Map Integer Integer -> Integer -> Integer -> Integer
sumOfSieved m v p = p * (m ! (v `div` p) - m ! (p - 1))

main = print $ problem10 $ 2*10^9

I compiled it with ghc -O2 10.hs and ran it with time ./10.

It gives the correct answer, but takes about 7 seconds.

I compiled it with ghc -prof -fprof-auto -rtsopts 10 and ran it with ./10 +RTS -p -h.

10.prof shows that decrease takes 52.2% time and 67.5% allocations.

After running hp2ps 10.hp I got the following heap profile:

Again, it looks like decrease takes most of the heap. GHC version 7.6.3.
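One plausible explanation for decrease dominating the allocations (my illustration, not from the original post): insertWith in the lazy Data.Map delays the combining function, so repeated updates to a key build up chains of subtraction thunks; Data.Map.Strict forces the combined value instead. A minimal sketch of the difference:

```haskell
import Data.List (foldl')
import qualified Data.Map as Map        -- lazy in the values
import qualified Data.Map.Strict as MS  -- strict in the values

-- Both fold 1..n into a single key.  The lazy version accumulates a
-- chain of (+) thunks as the stored value; the strict version forces
-- the combined value on every insertWith, so no chain builds up.
lazySum, strictSum :: Int -> Int
lazySum   n = foldl' (\m i -> Map.insertWith (+) 1 i m) (Map.singleton 1 0) [1 .. n] Map.! 1
strictSum n = foldl' (\m i -> MS.insertWith  (+) 1 i m) (MS.singleton  1 0) [1 .. n] MS.!  1

main :: IO ()
main = print (lazySum 1000 == strictSum 1000)  -- same result, different allocation behavior
```

The results are identical; only the heap behavior differs, which is exactly what the profile above is picking up.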

How would you optimize run time of this Haskell code?


Update 13.06.17:

I tried replacing immutable Data.Map with mutable Data.HashTable.IO.BasicHashTable from the hashtables package, but I'm probably doing something bad, since for tiny n = 30 it already takes too long, about 10 seconds. What's wrong?

Update 18.06.17:

The "Curious about the HashTable performance issues" question is a good read. I took Sherh's code using mutable Data.HashTable.ST.Linear, but dropped in Data.Judy instead. It runs in 1.1 seconds, still relatively slow.

Solution

I've made some small improvements, so it runs in 3.4-3.5 seconds on my machine. Using IntMap.Strict helped a lot. Other than that, I just performed some GHC optimizations manually to be sure, and made the Haskell code closer to the Python code from your link. As a next step you could try some mutable HashMap. But I'm not sure... IntMap can't be much faster than a mutable container, because it's immutable. Though I'm still surprised by its efficiency. I hope this can be implemented faster.

Here is the code:

import Data.List (foldl')
import Data.IntMap.Strict (IntMap, (!))
import qualified Data.IntMap.Strict as IntMap

p :: Int -> Int
p n = (sieve (IntMap.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
               where vs = [n `div` i | i <- [1..r]] ++ [n', n' - 1 .. 1]
                     r  = floor (sqrt (fromIntegral n) :: Double)
                     n' = n `div` r - 1

sieve :: IntMap Int -> Int -> Int -> [Int] -> IntMap Int
sieve m' p' r vs = go m' p'
  where
    go m p | p > r               = m
           | m ! p > m ! (p - 1) = go (update m vs p) (p + 1)
           | otherwise           = go m (p + 1)

update :: IntMap Int -> [Int] -> Int -> IntMap Int
update s vs p = foldl' decrease s (takeWhile (>= p2) vs)
  where
    sp = s ! (p - 1)
    p2 = p * p
    sumOfSieved v = p * (s ! (v `div` p) - sp)
    decrease m  v = IntMap.adjust (subtract $ sumOfSieved v) v m

main :: IO ()
main = print $ p $ 2*10^(9 :: Int) 

UPDATE:

Using mutable hashtables I've managed to get performance up to ~5.5 sec in Haskell with this implementation.

Also, I used unboxed vectors instead of lists in several places. Linear hashing seems to be the fastest. I think this can be done even faster. I noticed the sse42 option in the hashtables package. Not sure I've managed to set it correctly, but even without it, it runs that fast.

UPDATE 2 (19.06.2017)

I've managed to make it 3x faster than the best solution from @Krom (using my code + his map) by dropping the Judy hashmap altogether. Instead, plain arrays are used. You can come up with the same idea if you notice that the keys of the S hashmap are either the sequence from 1 to n', or n `div` i for i from 1 to r. So we can represent such a HashMap as two arrays, choosing which array to look up in based on the search key.
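A sketch of that two-array representation (identifiers are mine, not taken from the actual solution): keys v <= n' are stored at index v in a small array, and keys of the form n `div` i are stored at index i = n `div` v in a large array. The identity n `div` (n `div` (n `div` i)) = n `div` i guarantees the large-array lookup lands in the right slot for every key the sieve actually queries.

```haskell
import Data.Array

-- Two-array "sparse map" over exactly the keys the sieve touches:
-- v <= n' lives in smallPart at index v; v = n `div` i lives in
-- largePart at index i = n `div` v.
data SparseMap = SparseMap
  { smallPart :: Array Int Int   -- indices 1 .. n'
  , largePart :: Array Int Int   -- indices 1 .. r
  , smallMax  :: Int             -- n'
  , totalN    :: Int             -- n
  }

mkSparse :: Int -> (Int -> Int) -> SparseMap
mkSparse n f = SparseMap small large n' n
  where r     = floor (sqrt (fromIntegral n :: Double))
        n'    = n `div` r - 1
        small = listArray (1, n') [f v           | v <- [1 .. n']]
        large = listArray (1, r)  [f (n `div` i) | i <- [1 .. r]]

-- Valid only for keys of the two forms above; other keys may alias.
lookupS :: SparseMap -> Int -> Int
lookupS (SparseMap small large n' n) v
  | v <= n'   = small ! v
  | otherwise = large ! (n `div` v)

main :: IO ()
main = do
  let s = mkSparse 1000 id   -- store each key as its own value
  print (map (lookupS s) [1, 7, 500, 1000])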

My code + Judy HashMap

$ time ./judy
95673602693282040

real    0m0.590s
user    0m0.588s
sys     0m0.000s

My code + my sparse map

$ time ./sparse
95673602693282040

real    0m0.203s
user    0m0.196s
sys     0m0.004s

This can be made even faster if, instead of IOUArray, pre-generated vectors from the vector library are used, and readArray is replaced by unsafeRead. But I don't think this is worth doing unless you're really interested in optimizing this as much as possible.
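For what the readArray / unsafeRead swap looks like in isolation (illustrative only, not the answer's actual code): unsafeRead from Data.Array.Base skips the bounds check and takes a zero-based offset into the underlying array, which matters in tight inner loops.

```haskell
import Data.Array.IO (IOUArray, newListArray, readArray)
import Data.Array.Base (unsafeRead)

main :: IO ()
main = do
  arr <- newListArray (0, 9) [0, 2 .. 18] :: IO (IOUArray Int Int)
  x <- readArray arr 3    -- bounds-checked read
  y <- unsafeRead arr 3   -- raw offset, no bounds check; here bounds
                          -- start at 0, so offset 3 == index 3
  print (x, y)
```

With bounds that don't start at 0 the unsafeRead offset would differ from the Ix index, so this substitution needs care.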

Comparison with this solution is cheating and is not fair. I expect the same ideas implemented in Python and C++ would be even faster. But the @Krom solution with a closed hashmap is already cheating, because it uses a custom data structure instead of a standard one. At least you can see that the standard and most popular hash maps in Haskell are not that fast. Using better algorithms and better ad-hoc data structures works better for such problems.

Here's the resulting code.
