在Haskell中快速排序 [英] Fast sorting in Haskell
问题描述
阅读Stack Overflow问题 使用向量来改善Haskell的性能 在Haskell中放置 quicksort ,我为自己设定了两个目标:
$
-
执行中位数为3的相同算法以避免在预先排序的向量上表现不佳;
- 制作一个平行版本。 下面是结果(为简单起见,
将合格的Data.Vector.Unboxed.Mutable导入为MV
将合格的Data.Vector.Generic.Mutable导入为GM
类型Vector = MV.IOVector Int
类型Sort = Vector - > IO()
medianofthreepartition :: Vector - > Int - > IO Int
medianofthreepartition uv li = do
p1< - MV.unsafeRead uv li
p2< - MV.unsafeRead uv $ li`div` 2
p3< - MV.unsafeRead uv 0
let p = median p1 p2 p3
GM.unstablePartition(< p)uv
vquicksort :: Sort
vquicksort uv = do $ (j> 1)(vquicksort(MV.unsafeSlice 0 j uv))
when(j> 1)时,b $ b让li = MV.length uv -1 b $ bj< - medianofthreepartition uv li
j + 1
vparquicksort :: Sort
vparquicksort uv = do
让li = MV.length uv-1
j< - 三分配的中值uv li
t1 - tryfork(j> 1)(vparquicksort(MV.unsafeSlice 0 j uv))
t2 < - tryfork(j + 1 等待t1
等待t2
tryfork :: Bool - > IO() - > IO(Maybe(MVar()))
tryfork False _ = return Nothing
tryfork True action = do
done< - newEmptyMVar :: IO(MVar())
_ < - forkFinally action(\_ - > putMVar done())
return $刚刚完成
wait :: Maybe(MVar()) - > IO()
等待Nothing =返回()
等待(完成)= swapMVar完成()
中值:: Int - > Int - > Int - >诠释
中位数a b c
| a> b =
如果b> c然后b
否则如果a> c然后c
else a
|否则=
如果a> c然后a
否则如果b> c然后c
else b
对于具有1,000,000个元素的矢量,我得到以下结果:
线程数量:4
****并行****
测试按长度排序:1000000
创建矢量
打印矢量
对随机矢量进行排序
CPU时间:12.30 s
对有序向量进行排序
CPU时间:9.44 s
****单线程****
按长度排序的测试:1000000
创建矢量
打印矢量
对随机矢量进行排序
CPU时间:0.27 s
排序矢量
CPU时间:0.39 s
我的问题是:
更好的办法是使用 Control.Parallel.S trategies
来并行化quicksort。采用这种方法,您不会为每个可以并行执行的代码创建昂贵的线程。你也可以创建一个纯粹的计算,而不是IO。
http://www.haskell.org/ghc/docs/latest/html/users_guide /using-concurrent.html
举个例子,看看Jim Apple编写的这个简单的快速列表:
导入Data.HashTable为H
导入Data.Array.IO
导入Control.Parallel.Strategies
导入Control.Monad
import System
exch air =
do tmpi< - readArray ai
tmpr< - readArray ar
writeArray ai tmpr
writeArray ai tmpi
bool abc = if c then a else b
quicksort arr lr =
if r <= l then return()else do
i-loop(1-1)r =<< readArray arr r
exch arr ir
withStrategy rpar $ quicksort arr l(i-1)
quicksort arr(i + 1)r
其中
loop ijv = do $()()()()()()()()() -1))
if(i'< j')then exch arr i'j'>>循环我'j'v
else返回我'
找到p f i =如果我== l然后返回我
else bool(返回i)(找到p f(f i))。 p =<< readArray arr i
$ b main =
do [testSize]< - fmap(fmap read)getArgs
arr< - testPar testSize
ans< - readArray arr (testSize`div` 2)
print ans
testPar testSize =
do x < - testArray testSize
quicksort x 0(testSize - 1)
返回x
testArray :: Int - > IO(IOArray Int Double)
testArray testSize =
do ans< - newListArray(0,testSize-1)[fromIntegral $ H.hashString $ show i | i< - [1..testSize]]
return ans
After reading Stack Overflow question Using vectors for performance improvement in Haskell describing a fast in-place quicksort in Haskell, I set myself two goals:
Implementing the same algorithm with a median of three to avoid bad performances on pre-sorted vectors;
Making a parallel version.
Here is the result (some minor pieces have been left for simplicity):
import qualified Data.Vector.Unboxed.Mutable as MV
import qualified Data.Vector.Generic.Mutable as GM
type Vector = MV.IOVector Int
type Sort = Vector -> IO ()
medianofthreepartition :: Vector -> Int -> IO Int
medianofthreepartition uv li = do
p1 <- MV.unsafeRead uv li
p2 <- MV.unsafeRead uv $ li `div` 2
p3 <- MV.unsafeRead uv 0
let p = median p1 p2 p3
GM.unstablePartition (< p) uv
vquicksort :: Sort
vquicksort uv = do
let li = MV.length uv - 1
j <- medianofthreepartition uv li
when (j > 1) (vquicksort (MV.unsafeSlice 0 j uv))
when (j + 1 < li) (vquicksort (MV.unsafeSlice (j+1) (li-j) uv))
vparquicksort :: Sort
vparquicksort uv = do
let li = MV.length uv - 1
j <- medianofthreepartition uv li
t1 <- tryfork (j > 1) (vparquicksort (MV.unsafeSlice 0 j uv))
t2 <- tryfork (j + 1 < li) (vparquicksort (MV.unsafeSlice (j+1) (li-j) uv))
wait t1
wait t2
tryfork :: Bool -> IO () -> IO (Maybe (MVar ()))
tryfork False _ = return Nothing
tryfork True action = do
done <- newEmptyMVar :: IO (MVar ())
_ <- forkFinally action (\_ -> putMVar done ())
return $ Just done
wait :: Maybe (MVar ()) -> IO ()
wait Nothing = return ()
wait (Just done) = swapMVar done ()
median :: Int -> Int -> Int -> Int
median a b c
| a > b =
if b > c then b
else if a > c then c
else a
| otherwise =
if a > c then a
else if b > c then c
else b
For vectors with 1,000,000 elements, I get the following results:
"Number of threads: 4"
"**** Parallel ****"
"Testing sort with length: 1000000"
"Creating vector"
"Printing vector"
"Sorting random vector"
CPU time: 12.30 s
"Sorting ordered vector"
CPU time: 9.44 s
"**** Single thread ****"
"Testing sort with length: 1000000"
"Creating vector"
"Printing vector"
"Sorting random vector"
CPU time: 0.27 s
"Sorting ordered vector"
CPU time: 0.39 s
My questions are:
- Why are performances still decreasing with a pre-sorted vector?
- Why does using forkIO and four thread fails to improve performances?
A better idea is to use Control.Parallel.Strategies
to parallelize quicksort. With this approach you will not create expensive threads for every code that can be executed in parallel. You can also create a pure computation instead an IO.
Then you have to compile according to the number of cores you have: http://www.haskell.org/ghc/docs/latest/html/users_guide/using-concurrent.html
For an example, look at this simple quicksort on lists, written by Jim Apple:
import Data.HashTable as H
import Data.Array.IO
import Control.Parallel.Strategies
import Control.Monad
import System
exch a i r =
do tmpi <- readArray a i
tmpr <- readArray a r
writeArray a i tmpr
writeArray a i tmpi
bool a b c = if c then a else b
quicksort arr l r =
if r <= l then return () else do
i <- loop (l-1) r =<< readArray arr r
exch arr i r
withStrategy rpar $ quicksort arr l (i-1)
quicksort arr (i+1) r
where
loop i j v = do
(i', j') <- liftM2 (,) (find (>=v) (+1) (i+1)) (find (<=v) (subtract 1) (j-1))
if (i' < j') then exch arr i' j' >> loop i' j' v
else return i'
find p f i = if i == l then return i
else bool (return i) (find p f (f i)) . p =<< readArray arr i
main =
do [testSize] <- fmap (fmap read) getArgs
arr <- testPar testSize
ans <- readArray arr (testSize `div` 2)
print ans
testPar testSize =
do x <- testArray testSize
quicksort x 0 (testSize - 1)
return x
testArray :: Int -> IO (IOArray Int Double)
testArray testSize =
do ans <- newListArray (0,testSize-1) [fromIntegral $ H.hashString $ show i | i <- [1..testSize]]
return ans
这篇关于在Haskell中快速排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!