是否有可能在Haskell中以par的速度加快速度? [英] Is it possible to speed up a quicksort with par in Haskell?
问题描述
import System.Random
import Control.Parallel
import Data.List
quicksort :: [a] - > [a]
quicksort xs = pQuicksort 16 xs - 16是用于分类的火花数量
- pQuicksort,parallelQuicksort
- 只要n> 0并行计算列表的上部和下部,
- 当我们递归到足够深的时候,n == 0,这会变成一个串行快速排序。
pQuicksort :: Int - > [a] - > [p]
pQuicksort _ [] = []
pQuicksort 0(x:xs)=
在pQuicksort中设置(lower,upper)= partition(< x)xs
0低位++ [x] ++ pquicksort 0高位
pQuicksort n(x:xs)=
let(lower,upper)=分区(< x)xs
l = pQuicksort(n `div` 2)lower
u = [x] ++ pQuicksort(n`div` 2)upper
in(par ul)++ u
main :: IO( )
main = do
gen< - getStdGen
let randints =(take 5000000)$ randoms gen :: [Int]
putStrLn。显示 。 sum $(quicksort randints)
我用
编译
ghc --make -threaded -O2 quicksort.hs
<
./ quicksort + RTS -N16 -RTS
无论我做什么,我都无法让它比运行在一个cpu上的简单顺序实现运行得更快。
- 是否有可能解释为什么它在多个CPU上的运行速度比在一个CPU上慢得多?
- 是否有可能使此比例最少子线性,与CPU的数量通过做一些技巧?
编辑:@tempestadept暗示,快速排序它自我是问题。为了检查这一点,我以与上例相同的精神实施了一个简单的合并排序。它具有相同的行为,执行速度越慢,添加的功能越多。
import System.Random
import Control.Parallel
splitList :: [a] - > ([a],[a])
splitList = helper True [] []
helper _ left right [] =(left,right)$ b $ helper true right left(x:xs )= helper False(x:left)right xs
helper False left right(x:xs)= helper True left(x:right)xs
merge ::(Ord a)= > [a] - > [a] - > [a]
合并xs [] = xs
merge [] ys = ys
merge(x:xs)(y:ys)= case x True - > x:合并xs(y:ys)
False - > y:merge(x:xs)ys
mergeSort ::(Ord a)=> [a] - > [a]
mergeSort xs = pMergeSort 16 xs - 我们使用16个sparks
- pMergeSort,并行合并排序。需要额外的参数
- 告诉要创建多少个火花。在我们的简单测试中,它是
- 设置为16
pMergeSort ::(Ord a)=> Int - > [a] - > [a]
pMergeSort _ [] = []
pMergeSort _ [a] = [a]
pMergeSort 0 xs =
let(left,right)= splitList xs
in merge(pMergeSort 0 left)(pMergeSort 0 right)
pMergeSort n xs =
let(left,right)= splitList xs
l = pMergeSort(n`div` 2)left
r = pMergeSort(n`div` 2)右边
in(r`par` l)`pseq`(合并lr)
ris :: Int - > IO [Int]
ris n = do
gen< - getStdGen
return。 (取n)$ randoms gen
main = do
r < - ris 100000
putStrLn。显示 。 sum $ mergeSort r
我不太确定它可以用于惯用的快速排序,但它可以用于(在某种程度上较弱的情况下)真正必要的快速排序,如Roman在 Sparking Imperatives 。
尽管如此,他从未获得过很好的加速。这确实需要真正的工作 - 窃取deque ,不会像火花队列那样溢出来正确优化。
I have got this seemingly trivial parallel quicksort implementation, the code is as follows:
import System.Random
import Control.Parallel
import Data.List
quicksort :: [a] -> [a]
quicksort xs = pQuicksort 16 xs -- 16 is the number of sparks used to sort
-- pQuicksort, parallelQuicksort
-- As long as n > 0 evaluates the lower and upper part of the list in parallel,
-- when we have recursed deep enough, n==0, this turns into a serial quicksort.
pQuicksort :: Int -> [a] -> [a]
pQuicksort _ [] = []
pQuicksort 0 (x:xs) =
let (lower, upper) = partition (< x) xs
in pQuicksort 0 lower ++ [x] ++ pQuicksort 0 upper
pQuicksort n (x:xs) =
let (lower, upper) = partition (< x) xs
l = pQuicksort (n `div` 2) lower
u = [x] ++ pQuicksort (n `div` 2) upper
in (par u l) ++ u
main :: IO ()
main = do
gen <- getStdGen
let randints = (take 5000000) $ randoms gen :: [Int]
putStrLn . show . sum $ (quicksort randints)
I compile with
ghc --make -threaded -O2 quicksort.hs
and run with
./quicksort +RTS -N16 -RTS
No matter what I do I can not get this to run faster than a simple sequential implementation running on one cpu.
- Is it possible to explain why this runs so much slower on several CPUs than on one?
- Is it possible to make this scale, at least sub linearly, with the number of CPUs by doing some trick?
EDIT: @tempestadept hinted that quick sort it self is the problem. To check this I implemented a simple merge sort in the same spirit as the example above. It has the same behaviour, performs slower the more capabilities you add.
import System.Random
import Control.Parallel
splitList :: [a] -> ([a], [a])
splitList = helper True [] []
where helper _ left right [] = (left, right)
helper True left right (x:xs) = helper False (x:left) right xs
helper False left right (x:xs) = helper True left (x:right) xs
merge :: (Ord a) => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) = case x<y of
True -> x : merge xs (y:ys)
False -> y : merge (x:xs) ys
mergeSort :: (Ord a) => [a] -> [a]
mergeSort xs = pMergeSort 16 xs -- we use 16 sparks
-- pMergeSort, parallel merge sort. Takes an extra argument
-- telling how many sparks to create. In our simple test it is
-- set to 16
pMergeSort :: (Ord a) => Int -> [a] -> [a]
pMergeSort _ [] = []
pMergeSort _ [a] = [a]
pMergeSort 0 xs =
let (left, right) = splitList xs
in merge (pMergeSort 0 left) (pMergeSort 0 right)
pMergeSort n xs =
let (left, right) = splitList xs
l = pMergeSort (n `div` 2) left
r = pMergeSort (n `div` 2) right
in (r `par` l) `pseq` (merge l r)
ris :: Int -> IO [Int]
ris n = do
gen <- getStdGen
return . (take n) $ randoms gen
main = do
r <- ris 100000
putStrLn . show . sum $ mergeSort r
I'm not sure how well it can work for the idiomatic quicksort, but it can work (to a somewhat weak extent) for the true imperative quicksort as shown by Roman in Sparking Imperatives.
He never did get good speedup, though. This really needs a real work-stealing deque that doesn't overflow like the spark queue to optimize properly.
这篇关于是否有可能在Haskell中以par的速度加快速度?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!