采用向量在Haskell性能改进 [英] Using vectors for performance improvement in Haskell

查看:125
本文介绍了采用向量在Haskell性能改进的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是很新的哈斯克尔,和我有什么性能改进可以通过使用不纯(可变)数据结构进行了质询。我试图拼凑出我听到了一些不同的东西,所以请原谅我,如果我的术语是不完全正确的,或者有一些小错误。

I'm very new to Haskell, and I have a question about what performance improvements can be had by using impure (mutable) data structures. I'm trying to piece together a few different things I've heard, so please bear with me if my terminology is not entirely correct, or if there are some small errors.

为了使这个具体的,考虑到快速排序算法(从哈斯克尔维基拍摄)。

To make this concrete, consider the quicksort algorithm (taken from the Haskell wiki).

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser  = filter (< p) xs
        greater = filter (>= p) xs

这是不是真正的快速排序。阿真快速排序算法是就地,这是没有的。这是非常低效的内存

This is not "true quicksort." A "true" quicksort algorithm is in-place, and this is not. This is very memory inefficient.

在另一方面,也可以使用载体在Haskell实现就地快速排序。一个例子在这个计算器的答案。

On the other hand, it is possible to use vectors in Haskell to implement an in-place quicksort. An example is given in this stackoverflow answer.

快多少是第二算法比第一?大O符号并不能帮助这里,因为性能的提升将是从更有效地使用内存,没有更好的算法(右?)。我累了构建一些测试情况下我自己,但我不得不跑难以得到的东西。

How much faster is the second algorithm than the first? Big O notation doesn't help here, because the performance improvement is going to be from using memory more efficiently, not having a better algorithm (right?). I tired to construct some test cases on my own, but I had difficult getting things running.

这是理想的答案会给一些想法是什么让就地哈斯克尔算法快理论上,而且在一些测试数据集运行时间的例子比较。

An ideal answer would give some idea of what makes the in-place Haskell algorithm faster theoretically, and an example comparison of running times on some test data set.

推荐答案

没有什么比一个测试没有好,对不对?其结果是这点毫不奇怪:随机整数的范围 [0 ..百万]

There's nothing better than a test, right? And the results are not unsurprising: for lists of random integers in range [0 .. 1000000],

list size: 200000         ghc              -O2     -fllvm  -fllvm-O2
────────                   ────────   ────────   ────────   ────────
Data.List.sort            0.878969s  0.883219s  0.878106s  0.888758s
Naïve.quicksort           0.711305s  0.870647s  0.845508s  0.919925s
UArray_IO.quicksort       9.317783s  1.919583s  9.390687s  1.945072s
Vector_Mutable.quicksort   1.48142s  0.823004s  1.526661s  0.806837s

下面, Data.List.sort 只是它是什么,Naïve.quicksort的算法,你引述, UArray_IO.quicksort Vector_Mutable.quicksort 从您链接到的问题采取:的klapaucius丹·伯顿的回答 这变成是非常不理想的性能,明智的,看看有什么更好的丹尼尔·菲舍尔能做到这一点 ,无论是包装,以接受列表(不知道如果我得到这个完全正确):

Here, Data.List.sort is just what it is, Naïve.quicksort is the algorithm you quoted, UArray_IO.quicksort and Vector_Mutable.quicksort are taken from the question you linked to: klapaucius' and Dan Burton's answer which turn out to be very suboptimal performance-wise, see what better Daniel Fischer could do it, both wrapped so as to accept lists (not sure if I got this quite right):

quicksort :: [Int] -> [Int]
quicksort l = unsafePerformIO $ do
  let bounds = (0, length l)
  arr <- newListArray bounds l :: IO (IOUArray Int Int)
  uncurry (qsort arr) bounds
  getElems arr

quicksort :: Ord a => [a] -> [a]
quicksort = toList . iqsort . fromList

分别。

正如你所看到的,天真的算法不远处与 Data.Vector 在速度方面的可变解决方案背后的排序随机生成一个整数列表,并在 IOUArray 实际上的更糟糕的。试验是在Intel酷睿i5的笔记本电脑运行Ubuntu 11.10的x86-64。

As you can see, the naïve algorithm is not far behind the mutable solution with Data.Vector in terms of speed for sorting a list of random-generated integers, and the IOUArray is actually much worse. Test was carried out on an Intel i5 laptop running Ubuntu 11.10 x86-64.


以下并没有真正多大意义考虑ɢᴏᴏᴅ可变的实现,毕竟,仍然远高于此相比,所有这些。

请注意,这并不意味着一个不错的清单为基础的方案总能跟上它的性情不定地实现等效,但是GHC肯定做了伟大的工作,以使性能接近。此外,它当然取决于数据:这些是倍时随机生成的列表来排序包含在0至1000之间,而不是0的百万如上值,即具有许多重复:

Note that this does not mean that a nice list-based program can always keep up with its mutably-implemented equivalents, but GHC sure does a great job at bringing the performance close. Also, it depends of course on the data: these are the times when the random-generated lists to sort contain values in between 0 and 1000 rather than 0 an 1000000 as above, i.e. with many duplicates:

list size: 200000         ghc               -O2      -fllvm  -fllvm-O2
────────                    ────────   ────────    ────────   ────────
Data.List.sort             0.864176s  0.882574s   0.850807s  0.857957s
Naïve.quicksort            1.475362s  1.526076s   1.475557s  1.456759s
UArray_IO.quicksort       24.405938s  5.255001s  23.561911s  5.207535s
Vector_Mutable.quicksort   3.449168s  1.125788s   3.202925s  1.117741s

不说话pre排序的阵列。

Not to speak of pre-sorted arrays.

什么是挺有意思的,(只变成具有真正的大尺寸,这需要rtsopts增加堆叠能力明显),既是可变的实施如何成为显著的的与 -fllvm -O2

What's quite interesting, (becomes only apparent with really large sizes, which require rtsopts to increase the stack capacity), is how both mutable implementations become significantly slower with -fllvm -O2:

list size: 3⋅10⁶        ghc      -O1   -fllvm-O1         -O2   -fllvm-O2
────────                    ────────    ────────    ────────    ────────
Data.List.sort            23.897897s  24.138117s  23.708218s  23.631968s
Naïve.quicksort           17.068644s  19.547817s  17.640389s  18.113622s
UArray_IO.quicksort       35.634132s  38.348955s  37.177606s  49.190503s
Vector_Mutable.quicksort  17.286982s  17.251068s  17.361247s  36.840698s

这似乎是一种逻辑,我认为不可变实现车费LLVM好(没有它做的一切不可改变地在一定程度上?),但我不明白为什么这只是变得明显,经济放缓的可变版本在高优化和大数据量。

It seems kind of logical to me that the immutable implementations fare better on llvm (doesn't it do everything immutably on some level?), though I don't understand why this only becomes apparent as a slowdown to the mutable versions at high optimisation and large data sizes.

$ cat QSortPerform.hs
module Main where

import qualified Data.List(sort)
import qualified Naïve
import qualified UArray_IO
import qualified Vector_Mutable

import Control.Monad
import System.Random
import System.Environment

sortAlgos :: [ (String, [Int]->[Int]) ]
sortAlgos = [ ("Data.List.sort", Data.List.sort)
            , ("Naïve.quicksort", Naïve.quicksort)
            , ("UArray_IO.quicksort", UArray_IO.quicksort)
            , ("Vector_Mutable.quicksort", Vector_Mutable.quicksort) ]

main = do
   args <- getArgs
   when (length args /= 2) $ error "Need 2 arguments"

   let simSize = read $ args!!1
   randArray <- fmap (take simSize . randomRs(0,1000000)) getStdGen

   let sorted = case filter ((== args!!0) . fst) sortAlgos of
        [(_, algo)] -> algo randArray
        _ -> error $ "Argument must be one of " 
                        ++ show (map fst sortAlgos)

   putStr "First element:  "; print $ sorted!!0
   putStr "Middle element: "; print $ sorted!!(simSize`div`2)
   putStr "Last element:   "; print $ sorted!!(simSize-1)

这需要算法的名称和数组大小的命令行。运行时比较采用此程序完成的:

which takes the algorithm name and array size on command-line. Runtime comparison was done with this program:

$ cat PerformCompare.hs
module Main where

import System.Process
import System.Exit
import System.Environment
import Data.Time.Clock
import Data.List
import Control.Monad
import Text.PrettyPrint.Boxes

compiler = "ghc"
testProgram = "./QSortPerform"
flagOpts = [[], ["-O2"], ["-fllvm"], ["-fllvm","-O2"]]
algos = ["Data.List.sort","Naïve.quicksort","UArray_IO.quicksort","Vector_Mutable.quicksort"]


main = do
   args <- getArgs
   let testSize = case args of
         [numb] -> read numb
         _      -> 200000

   results <- forM flagOpts $ \flags -> do

      compilerExitC <- verboseSystem
              compiler $ testProgram : "-fforce-recomp" : flags
      when (compilerExitC /= ExitSuccess) .
         error $ "Compiler error \"" ++ show compilerExitC ++"\""

      algoCompare <- forM algos $ \algo -> do
         startTime <- getCurrentTime
         exitC <- verboseSystem testProgram [algo, show testSize]
         endTime <- getCurrentTime
         when (exitC /= ExitSuccess) .
            error $ "Program error \"" ++ show exitC ++"\""
         return . text . show $ diffUTCTime endTime startTime

      return . vcat right $ text(concat flags)
                          : text("────────")
                          : algoCompare

   let table = hsep 2 bottom
         $ vcat left (map text $ ("list size: "++show testSize)
                               : "────────"
                               : algos                          )
         : results

   printBox table



verboseSystem :: String -> [String] -> IO ExitCode
verboseSystem cmd args = do
   putStrLn . unwords $ cmd : args
   rawSystem cmd args

这篇关于采用向量在Haskell性能改进的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆