采用向量在Haskell性能改进 [英] Using vectors for performance improvement in Haskell
问题描述
我是很新的哈斯克尔,和我有什么性能改进可以通过使用不纯(可变)数据结构进行了质询。我试图拼凑出我听到了一些不同的东西,所以请原谅我,如果我的术语是不完全正确的,或者有一些小错误。
I'm very new to Haskell, and I have a question about what performance improvements can be had by using impure (mutable) data structures. I'm trying to piece together a few different things I've heard, so please bear with me if my terminology is not entirely correct, or if there are some small errors.
为了使这个具体的,考虑到快速排序算法(从哈斯克尔维基拍摄)。
To make this concrete, consider the quicksort algorithm (taken from the Haskell wiki).
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
where
lesser = filter (< p) xs
greater = filter (>= p) xs
这是不是真正的快速排序。阿真快速排序算法是就地,这是没有的。这是非常低效的内存
This is not "true quicksort." A "true" quicksort algorithm is in-place, and this is not. This is very memory inefficient.
在另一方面,也可以使用载体在Haskell实现就地快速排序。一个例子在这个计算器的答案。
On the other hand, it is possible to use vectors in Haskell to implement an in-place quicksort. An example is given in this stackoverflow answer.
快多少是第二算法比第一?大O符号并不能帮助这里,因为性能的提升将是从更有效地使用内存,没有更好的算法(右?)。我累了构建一些测试情况下我自己,但我不得不跑难以得到的东西。
How much faster is the second algorithm than the first? Big O notation doesn't help here, because the performance improvement is going to be from using memory more efficiently, not having a better algorithm (right?). I tired to construct some test cases on my own, but I had difficult getting things running.
这是理想的答案会给一些想法是什么让就地哈斯克尔算法快理论上,而且在一些测试数据集运行时间的例子比较。
An ideal answer would give some idea of what makes the in-place Haskell algorithm faster theoretically, and an example comparison of running times on some test data set.
推荐答案
没有什么比一个测试没有好,对不对?其结果是这点毫不奇怪:随机整数的范围 [0 ..百万]
,
There's nothing better than a test, right? And the results are not unsurprising: for lists of random integers in range [0 .. 1000000]
,
list size: 200000 ghc -O2 -fllvm -fllvm-O2
──────── ──────── ──────── ──────── ────────
Data.List.sort 0.878969s 0.883219s 0.878106s 0.888758s
Naïve.quicksort 0.711305s 0.870647s 0.845508s 0.919925s
UArray_IO.quicksort 9.317783s 1.919583s 9.390687s 1.945072s
Vector_Mutable.quicksort 1.48142s 0.823004s 1.526661s 0.806837s
下面, Data.List.sort
只是它是什么,Naïve.quicksort
的算法,你引述, UArray_IO.quicksort
和 Vector_Mutable.quicksort
从您链接到的问题采取:的klapaucius和丹·伯顿的回答 这变成是非常不理想的性能,明智的,看看有什么更好的丹尼尔·菲舍尔能做到这一点 的,无论是包装,以接受列表(不知道如果我得到这个完全正确):
Here, Data.List.sort
is just what it is, Naïve.quicksort
is the algorithm you quoted, UArray_IO.quicksort
and Vector_Mutable.quicksort
are taken from the question you linked to: klapaucius' and Dan Burton's answer which turn out to be very suboptimal performance-wise, see what better Daniel Fischer could do it, both wrapped so as to accept lists (not sure if I got this quite right):
quicksort :: [Int] -> [Int]
quicksort l = unsafePerformIO $ do
let bounds = (0, length l)
arr <- newListArray bounds l :: IO (IOUArray Int Int)
uncurry (qsort arr) bounds
getElems arr
和
quicksort :: Ord a => [a] -> [a]
quicksort = toList . iqsort . fromList
分别。
正如你所看到的,天真的算法不远处与 Data.Vector
在速度方面的可变解决方案背后的排序随机生成一个整数列表,并在 IOUArray
实际上的更糟糕的。试验是在Intel酷睿i5的笔记本电脑运行Ubuntu 11.10的x86-64。
As you can see, the naïve algorithm is not far behind the mutable solution with Data.Vector
in terms of speed for sorting a list of random-generated integers, and the IOUArray
is actually much worse. Test was carried out on an Intel i5 laptop running Ubuntu 11.10 x86-64.
以下并没有真正多大意义考虑ɢᴏᴏᴅ可变的实现,毕竟,仍然远高于此相比,所有这些。的
请注意,这并不意味着一个不错的清单为基础的方案总能跟上它的性情不定地实现等效,但是GHC肯定做了伟大的工作,以使性能接近。此外,它当然取决于数据:这些是倍时随机生成的列表来排序包含在0至1000之间,而不是0的百万如上值,即具有许多重复:
Note that this does not mean that a nice list-based program can always keep up with its mutably-implemented equivalents, but GHC sure does a great job at bringing the performance close. Also, it depends of course on the data: these are the times when the random-generated lists to sort contain values in between 0 and 1000 rather than 0 an 1000000 as above, i.e. with many duplicates:
list size: 200000 ghc -O2 -fllvm -fllvm-O2
──────── ──────── ──────── ──────── ────────
Data.List.sort 0.864176s 0.882574s 0.850807s 0.857957s
Naïve.quicksort 1.475362s 1.526076s 1.475557s 1.456759s
UArray_IO.quicksort 24.405938s 5.255001s 23.561911s 5.207535s
Vector_Mutable.quicksort 3.449168s 1.125788s 3.202925s 1.117741s
不说话pre排序的阵列。
Not to speak of pre-sorted arrays.
什么是挺有意思的,(只变成具有真正的大尺寸,这需要rtsopts增加堆叠能力明显),既是可变的实施如何成为显著的慢的与 -fllvm -O2
:
What's quite interesting, (becomes only apparent with really large sizes, which require rtsopts to increase the stack capacity), is how both mutable implementations become significantly slower with -fllvm -O2
:
list size: 3⋅10⁶ ghc -O1 -fllvm-O1 -O2 -fllvm-O2
──────── ──────── ──────── ──────── ────────
Data.List.sort 23.897897s 24.138117s 23.708218s 23.631968s
Naïve.quicksort 17.068644s 19.547817s 17.640389s 18.113622s
UArray_IO.quicksort 35.634132s 38.348955s 37.177606s 49.190503s
Vector_Mutable.quicksort 17.286982s 17.251068s 17.361247s 36.840698s
这似乎是一种逻辑,我认为不可变实现车费LLVM好(没有它做的一切不可改变地在一定程度上?),但我不明白为什么这只是变得明显,经济放缓的可变版本在高优化和大数据量。
It seems kind of logical to me that the immutable implementations fare better on llvm (doesn't it do everything immutably on some level?), though I don't understand why this only becomes apparent as a slowdown to the mutable versions at high optimisation and large data sizes.
$ cat QSortPerform.hs
module Main where
import qualified Data.List(sort)
import qualified Naïve
import qualified UArray_IO
import qualified Vector_Mutable
import Control.Monad
import System.Random
import System.Environment
sortAlgos :: [ (String, [Int]->[Int]) ]
sortAlgos = [ ("Data.List.sort", Data.List.sort)
, ("Naïve.quicksort", Naïve.quicksort)
, ("UArray_IO.quicksort", UArray_IO.quicksort)
, ("Vector_Mutable.quicksort", Vector_Mutable.quicksort) ]
main = do
args <- getArgs
when (length args /= 2) $ error "Need 2 arguments"
let simSize = read $ args!!1
randArray <- fmap (take simSize . randomRs(0,1000000)) getStdGen
let sorted = case filter ((== args!!0) . fst) sortAlgos of
[(_, algo)] -> algo randArray
_ -> error $ "Argument must be one of "
++ show (map fst sortAlgos)
putStr "First element: "; print $ sorted!!0
putStr "Middle element: "; print $ sorted!!(simSize`div`2)
putStr "Last element: "; print $ sorted!!(simSize-1)
这需要算法的名称和数组大小的命令行。运行时比较采用此程序完成的:
which takes the algorithm name and array size on command-line. Runtime comparison was done with this program:
$ cat PerformCompare.hs
module Main where
import System.Process
import System.Exit
import System.Environment
import Data.Time.Clock
import Data.List
import Control.Monad
import Text.PrettyPrint.Boxes
compiler = "ghc"
testProgram = "./QSortPerform"
flagOpts = [[], ["-O2"], ["-fllvm"], ["-fllvm","-O2"]]
algos = ["Data.List.sort","Naïve.quicksort","UArray_IO.quicksort","Vector_Mutable.quicksort"]
main = do
args <- getArgs
let testSize = case args of
[numb] -> read numb
_ -> 200000
results <- forM flagOpts $ \flags -> do
compilerExitC <- verboseSystem
compiler $ testProgram : "-fforce-recomp" : flags
when (compilerExitC /= ExitSuccess) .
error $ "Compiler error \"" ++ show compilerExitC ++"\""
algoCompare <- forM algos $ \algo -> do
startTime <- getCurrentTime
exitC <- verboseSystem testProgram [algo, show testSize]
endTime <- getCurrentTime
when (exitC /= ExitSuccess) .
error $ "Program error \"" ++ show exitC ++"\""
return . text . show $ diffUTCTime endTime startTime
return . vcat right $ text(concat flags)
: text("────────")
: algoCompare
let table = hsep 2 bottom
$ vcat left (map text $ ("list size: "++show testSize)
: "────────"
: algos )
: results
printBox table
verboseSystem :: String -> [String] -> IO ExitCode
verboseSystem cmd args = do
putStrLn . unwords $ cmd : args
rawSystem cmd args
这篇关于采用向量在Haskell性能改进的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!