为什么盒装矢量这么慢? [英] Why are boxed vectors so slow?

查看:122
本文介绍了为什么盒装矢量这么慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图获得分期向量的O(n)时间级联。这似乎是工作,但如果我需要存储盒装值(如向量)结果仍然很慢。

  import合格的Data.Vector作为V 
导入合格的Data.Vector.Generic.Mutable作为GM
导入Data.Vector(Vector)
导入Control.Monad.State.Strict
导入控制.Monad.ST

data App = S!(Vector Int)!Int推导(显示)

main = do
x< - liftM(地图读取。字)getContents
print $ execState(mapM_(add.V.singleton)x)(S V.empty 0)

add :: Vector Int - >状态应用程序()
add v1 = do
S v2 n < - get
let v3 = vectorGrowAdd v1 v2 n
put(S v3(​​n + V.length v1) )

vectorGrowAdd v1 v2 n = runST $ do
m1 < - V.unsafeThaw v1
m2< - V.unsafeThaw v2
m3< - if GM.length m2> (GM.length m1 + n)
然后做
返回m2
else do
GM.unsafeGrow m2(GM.length m1 + 2 *(GM.length m2))
let copyTo = GM.unsafeSlice n(GM.length m1)m3
GM.unsafeCopy copyTo m1
V.freeze m3

在这个例子中, testVals 是一个包含100000个整数的文本文件, Boxed.hs 是上面的代码, Unboxed.hs Boxed.hs 相同,除了导入 Data.Vector.Unboxed Data.Vector 的实例。

 > ghc -v 
格拉斯哥Haskell编译器,版本7.0.3
> ghc --make -O2 Boxed.hs
> time(cat testVals | ./Boxed.hs)
...
real 1m39.855s
user 1m39.282s
sys 0m0.252s
> ghc --make -O2 Unboxed.hs
> time(cat testVals | ./Unboxed.hs)
...
real 0m4.372s
user 0m4.268s
sys 0m0.088s

我的问题是:为什么Unboxed和Boxed之间有这么大的差别?如果我需要存储无法取消装箱的值,有什么方法可以提高速度?

我不确定为什么它对盒装 Vector s有如此巨大的影响,但是你在

$中浪费了很多时间 b
$ b

  V.freeze m3 

每次创建一个 m3 的副本。所以你要复制100,000 Vector s的长度。你不再需要旧的,所以它们被垃圾收集。盒装 Vector s的垃圾收集比收集未装箱的 Vector s要长得多,因为必须遵循所有指针才能看到Pointe是否也可以收集。不过,我对它有多大的差异感到有点惊讶。

几个数据:

  $ cat ./testVals | ./OldBoxed + RTS -t> Bxd.txt 
802M使用中,0.00 INIT(0.00经过),36.97 MUT 37.01经过),52.60 GC(52.67经过):ghc>>
$ cat ./testVals | ./OldUnboxed + RTS -t> UBxd.txt
81M使用中,0.00 INIT(0.00经过),9.14 MUT 9.16逝去),0.60GC(经过0.60):ghc>

所以你看到巨大的差异是由于GC,althogh MUT )
现在,如果我们用 unsafeFreeze freeze $ c>,我们得到

  $ cat ./testVals | ./Boxed + RTS -t> Bxd.txt 
39M使用中,0.00 INIT(0.00过去),0.53 MUT经过0.53秒),0.29GC(经过0.29秒):ghc>>
$ cat ./testVals | ./Unboxed + RTS -t> UBxd.txt
7M使用中,0.00 INIT(0.00经过),0.61 MUT 0.62经过),0.04GC(0.04经过):ghc>

显示的差异要小得多。事实上,这里盒装的 Vector 需要比unboxed更少的mutator时间。虽然GC的时间仍然高得多,但整体拆箱仍然更快,但是在0.66s和0.82s之间,这没什么可观的。

I am trying to get amortized O(n) time concatenation of vectors. It seems to be working but if I need to store boxed values (such as vectors) the result is still very slow.

import qualified Data.Vector as V
import qualified Data.Vector.Generic.Mutable as GM
import Data.Vector(Vector)
import Control.Monad.State.Strict
import Control.Monad.ST

data App = S !(Vector Int) !Int deriving (Show)

main = do
  x <- liftM (map read . words) getContents
  print $ execState (mapM_ (add . V.singleton) x) (S V.empty 0)

add :: Vector Int -> State App ()
add v1 = do
    S v2 n <- get
    let v3 = vectorGrowAdd v1 v2 n
    put (S v3 (n + V.length v1))

vectorGrowAdd v1 v2 n = runST $ do
  m1 <- V.unsafeThaw v1
  m2 <- V.unsafeThaw v2
  m3 <- if GM.length m2 > (GM.length m1 + n)
         then do
           return m2
         else do
           GM.unsafeGrow m2 (GM.length m1 + 2*(GM.length m2))
  let copyTo = GM.unsafeSlice n (GM.length m1) m3 
  GM.unsafeCopy copyTo m1
  V.freeze m3

In this example, testVals is a text file with 100000 integers, Boxed.hs is the code above and Unboxed.hs is the same as Boxed.hs except for importing Data.Vector.Unboxed instaid of Data.Vector.

> ghc -v
Glasgow Haskell Compiler, Version 7.0.3
> ghc --make -O2 Boxed.hs
> time (cat testVals | ./Boxed.hs)
  ...
  real      1m39.855s
  user      1m39.282s 
  sys       0m0.252s
> ghc --make -O2 Unboxed.hs
> time (cat testVals | ./Unboxed.hs)
...
real        0m4.372s
user        0m4.268s
sys         0m0.088s

My question is: Why is there such a drastic difference between Unboxed and Boxed? Is there something I can do to improved the speed if I need to store values that can't be unboxed?

解决方案

I'm not sure why it has such a dramatic impact on boxed Vectors, but you're wasting a lot of time in

V.freeze m3

That creates a copy of m3 each time. So you're copying 100,000 Vectors of increasing length. You don't need the old ones anymore, so they're garbage-collected. Garbage collection of boxed Vectors takes much longer than collection of unboxed Vectors because all pointers have to be followed to see whether the pointees can be collected too. I'm a bit surprised by how much difference it makes, though.

A few stats:

$ cat ./testVals | ./OldBoxed +RTS -t > Bxd.txt
<<ghc: 72590744976 bytes, 79999 GCs, 5696847/15655472 avg/max bytes residency (16 samples),
802M in use, 0.00 INIT (0.00 elapsed), 36.97 MUT (37.01 elapsed), 52.60 GC (52.67 elapsed) :ghc>>
$ cat ./testVals | ./OldUnboxed +RTS -t > UBxd.txt
<<ghc: 72518297568 bytes, 78256 GCs, 1013955/2294848 avg/max bytes residency (63 samples),
81M in use, 0.00 INIT (0.00 elapsed), 9.14 MUT (9.16 elapsed), 0.60 GC (0.60 elapsed) :ghc>>

So you see that the enormous difference is due to GC, althogh MUT (the time your programme does actual work) is far lower for unboxed, too.
Now, if we replace the offending freeze by unsafeFreeze, we get

$ cat ./testVals | ./Boxed +RTS -t > Bxd.txt
<<ghc: 1149880088 bytes, 2214 GCs, 5236803/17102432 avg/max bytes residency (11 samples),
39M in use, 0.00 INIT (0.00 elapsed), 0.53 MUT (0.53 elapsed), 0.29 GC (0.29 elapsed) :ghc>>
$ cat ./testVals | ./Unboxed +RTS -t > UBxd.txt
<<ghc: 1152277200 bytes, 2229 GCs, 767477/2267200 avg/max bytes residency (31 samples),
7M in use, 0.00 INIT (0.00 elapsed), 0.61 MUT (0.62 elapsed), 0.04 GC (0.04 elapsed) :ghc>>

which exposes a far smaller difference. In fact, here the boxed Vector needed less mutator time than unboxed. The GC time is still much higher, though, so overall unboxed still is faster, but at 0.66s vs 0.82s, it's nothing dramatic.

这篇关于为什么盒装矢量这么慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆