How to optimize this Haskell code summing up the primes in sublinear time?
Problem 10 from Project Euler is to find the sum of all the primes below given n.
I solved it simply by summing up the primes generated by the sieve of Eratosthenes. Then I came across a much more efficient solution by Lucy_Hedgehog (sub-linear!).
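For readers who don't want to follow the link, the core recurrence, as I understand it and as the code below implements it: keep, for each needed value v, a quantity S(v) that starts as the sum 2 + 3 + ... + v = v(v+1)/2 - 1 and, after processing prime p, equals the sum of all 2 ≤ k ≤ v that are prime or have smallest prime factor greater than p:

```latex
S_p(v) = S_{p-1}(v) - p \left( S_{p-1}\!\left(\left\lfloor v/p \right\rfloor\right) - S_{p-1}(p - 1) \right)
```

The update is applied only when p is itself prime (detected by S(p) > S(p-1)) and only to the roughly 2√n values v of the form ⌊n/i⌋, which is what makes the total work sub-linear (about O(n^(3/4))).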
For n = 2⋅10^9:
Python code (from the quote above) runs in 1.2 seconds in Python 2.7.3.
C++ code (mine) runs in about 0.3 seconds (compiled with g++ 4.8.4).
I re-implemented the same algorithm in Haskell, since I'm learning it:
```haskell
import Data.List
import Data.Map (Map, (!))
import qualified Data.Map as Map

problem10 :: Integer -> Integer
problem10 n = (sieve (Map.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
  where
    vs = [n `div` i | i <- [1..r]] ++ reverse [1 .. n `div` r - 1]
    r  = floor (sqrt (fromIntegral n))

sieve :: Map Integer Integer -> Integer -> Integer -> [Integer] -> Map Integer Integer
sieve m p r vs
  | p > r     = m
  | otherwise = sieve (if m ! p > m ! (p - 1) then update m vs p else m) (p + 1) r vs

update :: Map Integer Integer -> [Integer] -> Integer -> Map Integer Integer
update m vs p = foldl' decrease m (map (\v -> (v, sumOfSieved m v p)) (takeWhile (>= p * p) vs))

decrease :: Map Integer Integer -> (Integer, Integer) -> Map Integer Integer
decrease m (k, v) = Map.insertWith (flip (-)) k v m

sumOfSieved :: Map Integer Integer -> Integer -> Integer -> Integer
sumOfSieved m v p = p * (m ! (v `div` p) - m ! (p - 1))

main = print $ problem10 $ 2 * 10 ^ 9
```
I compiled it with `ghc -O2 10.hs` and ran it with `time ./10`. It gives the correct answer, but takes about 7 seconds.

Then I compiled it with `ghc -prof -fprof-auto -rtsopts 10` and ran it with `./10 +RTS -p -h`. 10.prof shows that `decrease` takes 52.2% of the time and 67.5% of the allocations. After running `hp2ps 10.hp` I got a heap profile (image omitted); again it looks like `decrease` takes most of the heap. GHC version is 7.6.3.

How would you optimize the run time of this Haskell code?
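One guess on my part, suggested by the allocation profile: `Data.Map` is value-lazy, so each `insertWith` in `decrease` stores an unevaluated subtraction thunk rather than a number. A minimal experiment (a sketch, not a verified fix) is to switch the import to the strict variant, which forces each value as it is inserted:

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Same decrease as above, but on Data.Map.Strict: the subtraction
-- (flip (-)) is evaluated on insert instead of piling up as a thunk.
decrease :: Map Integer Integer -> (Integer, Integer) -> Map Integer Integer
decrease m (k, v) = Map.insertWith (flip (-)) k v m

main :: IO ()
main = print (Map.lookup 10 (decrease (Map.fromList [(10, 100)]) (10, 30)))  -- prints Just 70
```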
Update 13.06.17:

I tried replacing the immutable `Data.Map` with the mutable `Data.HashTable.IO.BasicHashTable` from the `hashtables` package, but I'm probably doing something bad, since for a tiny n = 30 it already takes too long, about 10 seconds. What's wrong?

Update 18.06.17:

Curious about the HashTable performance issues is a good read. I took Sherh's code using the mutable `Data.HashTable.ST.Linear`, but dropped `Data.Judy` in instead. It runs in 1.1 seconds, still relatively slow.

Solution:

I've done some small improvements, so it runs in 3.4-3.5 seconds on my machine. Using `IntMap.Strict` helped a lot. Other than that I just manually performed some GHC optimizations to be sure, and made the Haskell code closer to the Python code from your link. As a next step you could try to use some mutable `HashMap`. But I'm not sure... `IntMap` can't be much faster than some mutable container, because it's an immutable one. Though I'm still surprised by its efficiency. I hope this can be implemented faster.

Here is the code:
```haskell
import Data.List (foldl')
import Data.IntMap.Strict (IntMap, (!))
import qualified Data.IntMap.Strict as IntMap

p :: Int -> Int
p n = (sieve (IntMap.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
  where
    vs = [n `div` i | i <- [1..r]] ++ [n', n' - 1 .. 1]
    r  = floor (sqrt (fromIntegral n) :: Double)
    n' = n `div` r - 1

sieve :: IntMap Int -> Int -> Int -> [Int] -> IntMap Int
sieve m' p' r vs = go m' p'
  where
    go m p
      | p > r               = m
      | m ! p > m ! (p - 1) = go (update m vs p) (p + 1)
      | otherwise           = go m (p + 1)

update :: IntMap Int -> [Int] -> Int -> IntMap Int
update s vs p = foldl' decrease s (takeWhile (>= p2) vs)
  where
    sp = s ! (p - 1)
    p2 = p * p
    sumOfSieved v = p * (s ! (v `div` p) - sp)
    decrease m v = IntMap.adjust (subtract $ sumOfSieved v) v m

main :: IO ()
main = print $ p $ 2 * 10 ^ (9 :: Int)
```
UPDATE:

Using mutable `hashtables` I've managed to get performance to ~5.5 sec on Haskell with this implementation. Also, I used unboxed vectors instead of lists in several places.
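For illustration, here is roughly what materializing the `vs` list as an unboxed structure looks like. I'm using `Data.Array.Unboxed` from the `array` package here rather than the `vector` library, and `mkVs` is a name I made up for this sketch:

```haskell
import Data.Array.Unboxed (UArray, elems, listArray)

-- Sketch (mkVs is a hypothetical helper, not from the solution above):
-- build the vs sequence once as an unboxed array instead of a lazy
-- list, so the sieve can index it without rebuilding cons cells.
mkVs :: Int -> UArray Int Int
mkVs n = listArray (0, length xs - 1) xs
  where
    r  = floor (sqrt (fromIntegral n) :: Double)
    n' = n `div` r - 1
    xs = [n `div` i | i <- [1 .. r]] ++ [n', n' - 1 .. 1]

main :: IO ()
main = print (elems (mkVs 30))  -- [30,15,10,7,6,5,4,3,2,1]
```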
Linear
hashing seems to be the fastest. I think this can be done even faster. I noticedsse42
option inhasthables
package. Not sure I've managed to set it correctly but even without it runs that fast.UPDATE 2 (19.06.2017)
I've managed to make it 3x faster than the best solution from @Krom (using my code + his map) by dropping the Judy hashmap altogether. Instead, just plain arrays are used. You can come up with the same idea if you notice that the keys of the S hashmap are either the sequence from 1 to n', or n `div` i for i from 1 to r. So we can represent such a hashmap as two arrays, choosing which array to look up in based on the search key.

My code + Judy HashMap:
```
$ time ./judy
95673602693282040

real    0m0.590s
user    0m0.588s
sys     0m0.000s
```
My code + my sparse map
```
$ time ./sparse
95673602693282040

real    0m0.203s
user    0m0.196s
sys     0m0.004s
```
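The two-array representation described above can be sketched like this. The names `SparseMap` and `lookupS` are mine, and the real solution uses mutable `IOUArray`s; this immutable version only shows the indexing trick: keys v ≤ n' index the first array directly, and every other key has the form n `div` i, so it indexes the second array at i = n `div` v.

```haskell
import Data.Array (Array, listArray, (!))

-- Hypothetical sketch of the two-array "sparse map".
data SparseMap = SparseMap
  { smallKeys :: Array Int Int  -- value for key v, for v <= n'
  , bigKeys   :: Array Int Int  -- value for key (n `div` i), stored at index i
  , total     :: Int            -- n
  , smallMax  :: Int            -- n'
  }

-- For v > n' we have v = n `div` i for some i <= r, and then
-- n `div` v recovers that i, so the lookup is a plain array index.
lookupS :: SparseMap -> Int -> Int
lookupS sm v
  | v <= smallMax sm = smallKeys sm ! v
  | otherwise        = bigKeys sm ! (total sm `div` v)

main :: IO ()
main = do
  -- Tiny example with n = 10, r = 3, n' = 2: big keys are 10, 5, 3.
  let sm = SparseMap (listArray (1, 2) [11, 12]) (listArray (1, 3) [110, 105, 103]) 10 2
  print (map (lookupS sm) [2, 3, 5, 10])  -- [12,103,105,110]
```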
This can be done even faster if, instead of `IOUArray`, already generated vectors from the `vector` library are used, and `readArray` is replaced by `unsafeRead`. But I don't think this should be done unless you're really interested in optimizing this as much as possible.

Comparison with this solution is cheating and is not fair. I expect the same ideas implemented in Python and C++ would be even faster. But the @Krom solution with a closed hashmap is already cheating, because it uses a custom data structure instead of a standard one. At least you can see that the standard and most popular hash maps in Haskell are not that fast. Using better algorithms and better ad-hoc data structures works better for such problems.