使用向量的风格与性能 [英] Style vs Performance Using Vectors

查看:96
本文介绍了使用向量的风格与性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下是代码:

  { - #LANGUAGE FlexibleContexts# - } 

导入数据。 Int
将合格的Data.Vector.Unboxed导入为U
将合格的Data.Vector.Generic导入为V

{ - #NOINLINE f# - } - 请注意'NO'
--f ::(Num r,V.Vector vr)=> v r - > v r - > v r
--f ::(V.Vector v Int64)=> v Int64 - > v Int64 - > v Int64
--f ::(U.Unbox r,Num r)=> U.Vector r - > U.Vector r - > U.Vector r
f :: U.Vector Int64 - > U.Vector Int64 - > U.Vector Int64
f = V.zipWith(+) - 或U.zipWith,它没有区别

main = do
let iters = 100
dim = 221184
y = U.replicate dim 0 :: U.Vector Int64
let ans = iterate((fy))y !! iters
putStr $(show $ U.sum ans)

我用 ghc 7.6.2 -O2 ,它需要1.7秒才能运行。



我尝试了几个不同版本的 f


  1. fx = U.zipWith(+)x

  2. fx =( U.zipWith(+)x)。

  3. fxy = U.zipWith(+)xy
  4. / ol>

    版本1与原始版本相同,而版本2和版本3在0.09秒内运行(< INLINING <

    我也注意到,如果我使 f c> f / code>多态(具有上述三个签名中的任何一个),即使采用快速定义(即2或3),它也会慢下来...至1.7秒。这让我想知道原始问题是否可能归因于(缺少)类型推断,即使我明确地给出了Vector类型和元素类型的类型。



    我也有兴趣添加整数模数 q

      newtype Zq qi = Zq {unZq :: i} 

    当添加 Int64 s,如果我写一个指定类型的函数,

      h :: U.Vector(Zq Q17 Int64) - > U.Vector(Zq Q17 Int64) - > U.Vector(Zq Q17 Int64)

    多态性

      h ::(模数q)=> U.Vector(Zq q Int64) - > U.Vector(Zq q Int64) - > U.Vector(Zq q Int64)

    但我应该至少是能够去除特定的幻像类型!它应该被编译出来,因为我正在处理 newtype



    以下是我的问题: p>


    1. 放缓来自何处?

    2. <版本2和版本3发生了什么? c $ c> f 会以任何方式影响性能?这似乎是一个错误,(编码风格)会影响这样的性能。在矢量之外还有其他一些部分应用函数或其他文体选择的例子会影响性能吗?
    3. 为什么多态性使我减慢了一个数量级,与多态性的位置无关(即在矢量类型中, Num 类型,都是幻影类型)?我知道多态使代码变慢,但这很荒谬。是否有破解?




    编辑1



    我使用Vector库页面提交了问题。我发现有关此问题的 GHC
    问题



    EDIT2



    我从@ kqr的答案中获得了一些见解后重写了这个问题。
    以下是原始参考。


    --------------原始问题--------------------



    以下是代码:

      { - #LANGUAGE FlexibleContexts# - } 

    导入Control.DeepSeq
    导入Data.Int
    导入限定的Data.Vector .Unboxed as U
    import qualified data.Vector.Generic as V

    { - #NOINLINE f# - } - 请注意'NO'
    --f ::( Num r,V.Vector vr)=> v r - > v r - > v r
    --f ::(V.Vector v Int64)=> v Int64 - > v Int64 - > v Int64
    --f ::(U.Unbox r,Num r)=> U.Vector r - > U.Vector r - > U.Vector r
    f :: U.Vector Int64 - > U.Vector Int64 - > U.Vector Int64
    f = V.zipWith(+)

    main = do
    let iters = 100
    dim = 221184
    y = U.replicate dim 0 :: U.Vector Int64
    let ans = iterate((fy))y !! iters
    putStr $(show $ U.sum ans)

    我用 ghc 7.6.2 -O2 ,它需要1.7秒才能运行。



    我尝试了几个不同版本的 f


    1. fx = U.zipWith(+)x

    2. fx =( U.zipWith(+)x)。 U.force

    3. f x =(U.zipWith(+)x)。 Control.DeepSeq.force)

    4. f x =(U.zipWith(+)x)。 (\ z - > z`seq` z)

    5. f x =(U.zipWith(+)x)

    6. fxy = U.zipWith(+)xy
    7. / ol>

      版本1与原始版本相同,版本2运行时间为0.111秒,版本3-6运行时间为0.09秒( INLINING f 不会改变任何东西)。



      减速似乎是由于懒惰,因为 force 有所帮助,但我不确定懒惰是从哪里来的。



      我试着编写一个严格版本的 iterate ,思考向量本身必须是懒惰的:

      $(

        { - #INLINE iterate'# - } 
      iterate'::(NFData a)=> (a - > a) - > a - > [a]
      迭代'fx = x`seq` x:iterate'f(fx)



      <但是使用免费版的 f ,这完全没有帮助。



      我也注意到其他的东西,这可能只是一个巧合和红鲱鱼:
      如果我使用 f 多态(带有上述三个签名中的任何一个),即使使用快速的定义,它减缓了......精确到1.7秒。这让我怀疑原始问题是否可能是由于(缺少)类型推断,即使所有的东西都应该被很好地推断出来。



      这是我的问题:


      1. 从哪里来的放缓?

      2. 为什么要用 force help,但是使用严格的 iterate 不?

      3. 为什么 U.force DeepSeq.force 差?我不知道 U.force 应该做什么,但听起来很像 DeepSeq.force ,并且似乎也有类似的效果。
      4. 为什么多态性使我减慢了一个数量级,与多态性的位置无关(即在向量类型中,在 Num 类型或两者)?

      5. 为什么版本5和版本6都没有任何严格意义与一个严格的函数一样快?

      正如@kqr指出的那样,问题似乎并不严格。因此,我写这个函数的方式会导致使用通用的 zipWith ,而不是Unboxed特定的版本。这只是GHC和Vector库之间的一种侥幸,还是有什么更一般的东西可以在这里说出来?

      解决方案

      while我没有你想要的明确答案,有两件事可能会帮助你。



      第一件事是 x`seq `x 在语义和计算上都与 x 相同。维基有关 seq


      关于 seq seq x 评估 x 。好吧,有点。 seq 不会仅仅依靠源文件中的现有值来评估任何内容,它所做的只是在另一个值上引入一个虚拟数据依赖项:当 seq 被评估,第一个参数也必须(有点;见下文)被评估。



      作为一个例子,假设 x :: Integer ,则 seq xb 的行为基本上类似于,如果x == 0,则b否则b - 无条件等于 b ,但强制 x 。特别是, x`seq` x 这个表达式是完全多余的,并且总是和写 x

      第一段说的是写 seq ab doesn'这意味着 a 会立即得到评估,这意味着 a 会在<$ c需要评估$ c> b 。这可能会在程序的后面发生,也可能永远不会发生。当你从这个角度来看时,显然 seq xx 是一种冗余,因为它所说的只是evaluate x 只要 x 需要评估。当然,如果你刚刚写过 x ,那么当然会发生什么。



      这对你有两个影响:
      $ b


      1. 您的strict迭代'函数isn'如果没有 seq ,它确实比它更严格。事实上,我很难想象 iterate 函数如何变得比现在更严格。你不能严格限制列表的尾部,因为它是无限的。你可以做的主要事情是强制累加器, fx ,但这样做并不会给我的系统带来任何显着的性能提升。[1]



        从头开始。你严格的迭代'与我的爆炸模式版本完全一样。阅读评论

      2. 不给你一个严格的身份识别功能,我认为这是你要去的。事实上,普通的身份识别功能与您所得到的一样严格 - 它会在需要时立即评估其结果。


      然而,我偷看了核心GHC为 $ b

        U.zipWith(+)y 

        U.zipWith(+)y。 id 

      只有一个很大的区别是我的未经训练的眼睛可以发现。第一个使用简单的 Data.Vector.Generic.zipWith (这里是你的多态性巧合可能发挥作用的地方 - 如果GHC选择了一个通用的 zipWith 它当然会表现得好像代码是多态的一样!),而后者已经将这种单一函数调用分解为几乎90行的状态monad代码和未打包的机器类型。



      状态monad代码看起来几乎就像循环和破坏性的更新,你会用命令式语言编写,所以我认为它适合它所运行的机器。如果我不那么着急,我会花更长的眼光来看看它是如何工作的,以及为什么GHC突然决定需要一个紧密的循环。我已经将生成的核心附加到了我自己身上,就像任何其他想要查看的人一样。[2]
      $ b




      [ 1]:沿途强制累加器:(这是你已经做的,我误解了代码!)

        { - #LANGUAGE BangPatterns# - } 
      iterate'f!x = x:iterate f(fx)

      [2]:什么核心 U.zipWith(+)y。 id 会被翻译成。


      Here's the code:

      {-# LANGUAGE FlexibleContexts #-}
      
      import Data.Int
      import qualified Data.Vector.Unboxed as U
      import qualified Data.Vector.Generic as V
      
      {-# NOINLINE f #-} -- Note the 'NO'
      --f :: (Num r, V.Vector v r) => v r -> v r -> v r
      --f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
      --f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
      f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
      f = V.zipWith (+) -- or U.zipWith, it doesn't make a difference
      
      main = do
          let iters = 100
              dim = 221184
              y = U.replicate dim 0 :: U.Vector Int64
          let ans = iterate ((f y)) y !! iters
          putStr $ (show $ U.sum ans)
      

      I compiled with ghc 7.6.2 and -O2, and it took 1.7 seconds to run.

      I tried several different versions of f:

      1. f x = U.zipWith (+) x
      2. f x = (U.zipWith (+) x) . id
      3. f x y = U.zipWith (+) x y

      Version 1 is the same as the original while versions 2 and 3 run in in under 0.09 seconds (and INLINING f doesn't change anything).

      I also noticed that if I make f polymorphic (with any of the three signatures above), even with a "fast" definition (i.e. 2 or 3), it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though I'm explicitly giving the types for the Vector type and element type.

      I'm also interested in adding integers modulo q:

      newtype Zq q i = Zq {unZq :: i}
      

      As when adding Int64s, if I write a function with every type specified,

      h :: U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64)
      

      I get an order of magnitude better performance than if I leave any polymorphism

      h :: (Modulus q) => U.Vector (Zq q Int64) -> U.Vector (Zq q Int64) -> U.Vector (Zq q Int64)
      

      But I should at least be able to remove the specific phantom type! It should be compiled out, since I'm dealing with a newtype.

      Here are my questions:

      1. Where is the slowdown coming from?
      2. What is going on in versions 2 and 3 of f that affect performance in any way? It seems like a bug to me that (what amounts to) coding style can affect performance like this. Are there other examples outside of Vector where partially applying a function or other stylistic choices affect performance?
      3. Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, both, or phantom type)? I know polymorphism makes code slower, but this is ridiculous. Is there a hack around it?

      EDIT 1

      I filed a issue with the Vector library page. I found a GHC issue relating to this problem.

      EDIT2

      I rewrote the question after gaining some insight from @kqr's answer. Below is the original for reference.

      --------------ORIGINAL QUESTION--------------------

      Here's the code:

      {-# LANGUAGE FlexibleContexts #-}
      
      import Control.DeepSeq
      import Data.Int
      import qualified Data.Vector.Unboxed as U
      import qualified Data.Vector.Generic as V
      
      {-# NOINLINE f #-} -- Note the 'NO'
      --f :: (Num r, V.Vector v r) => v r -> v r -> v r
      --f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
      --f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
      f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
      f = V.zipWith (+)
      
      main = do
          let iters = 100
              dim = 221184
              y = U.replicate dim 0 :: U.Vector Int64
          let ans = iterate ((f y)) y !! iters
          putStr $ (show $ U.sum ans)
      

      I compiled with ghc 7.6.2 and -O2, and it took 1.7 seconds to run.

      I tried several different versions of f:

      1. f x = U.zipWith (+) x
      2. f x = (U.zipWith (+) x) . U.force
      3. f x = (U.zipWith (+) x) . Control.DeepSeq.force)
      4. f x = (U.zipWith (+) x) . (\z -> z `seq` z)
      5. f x = (U.zipWith (+) x) . id
      6. f x y = U.zipWith (+) x y

      Version 1 is the same as the original, version 2 runs in 0.111 seconds, and versions 3-6 run in in under 0.09 seconds (and INLINING f doesn't change anything).

      So the order-of-magnitude slowdown appears to be due to laziness since force helped, but I'm not sure where the laziness is coming from. Unboxed types aren't allowed to be lazy, right?

      I tried writing a strict version of iterate, thinking the vector itself must be lazy:

      {-# INLINE iterate' #-}
      iterate' :: (NFData a) => (a -> a) -> a -> [a]
      iterate' f x =  x `seq` x : iterate' f (f x)
      

      but with the point-free version of f, this didn't help at all.

      I also noticed something else, which could be just a coincidence and red herring: If I make f polymorphic (with any of the three signatures above), even with a "fast" definition, it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though everything should be inferred nicely.

      Here are my questions:

      1. Where is the slowdown coming from?
      2. Why does composing with force help, but using a strict iterate doesn't?
      3. Why is U.force worse than DeepSeq.force? I have no idea what U.force is supposed to do, but it sounds a lot like DeepSeq.force, and seems to have a similar effect.
      4. Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, or both)?
      5. Why are versions 5 and 6, neither of which should have any strictness implications at all, just as fast as a strict function?

      As @kqr pointed out, the problem doesn't seem to be strictness. So something about the way I write the function is causing the generic zipWith to be used rather than the Unboxed-specific version. Is this just a fluke between GHC and the Vector library, or is there something more general that can be said here?

      解决方案

      While I don't have the definitive answer you want, there are two things that might help you along.

      The first thing is that x `seq` x is, both semantically and computationally, the same thing as just x. The wiki says about seq:

      A common misconception regarding seq is that seq x "evaluates" x. Well, sort of. seq doesn't evaluate anything just by virtue of existing in the source file, all it does is introduce an artificial data dependency of one value on another: when the result of seq is evaluated, the first argument must also (sort of; see below) be evaluated.

      As an example, suppose x :: Integer, then seq x b behaves essentially like if x == 0 then b else b – unconditionally equal to b, but forcing x along the way. In particular, the expression x `seq` x is completely redundant, and always has exactly the same effect as just writing x.

      What the first paragraph says is that writing seq a b doesn't mean that a will magically get evaluated this instant, it means that a will get evaluated as soon as b needs to be evaluated. This might occur later in the program, or maybe never at all. When you view it in that light, it is obvious that seq x x is a redundancy, because all it says is, "evaluate x as soon as x needs to be evaluated." Which of course is what would happen anyway if you had just written x.

      This has two implications for you:

      1. Your "strict" iterate' function isn't really any stricter than it would be without the seq. In fact, I have a hard time imagining how the iterate function could become any stricter than it already is. You can't make the tail of the list strict, because it is infinite. The main thing you can do is force the "accumulator", f x, but doing so doesn't give any significant performance increase on my system.[1]

        Scratch that. Your strict iterate' does exactly the same thing as my bang pattern version. See the comments.

      2. Writing (\z -> z `seq` z) does not give you a strict identity function, which I assume is what you were going for. In fact, the common identity function is as strict as you'll get – it will evaluate its result as soon as it is needed.

      However, I peeked at the core GHC generates for

      U.zipWith (+) y
      

      and

      U.zipWith (+) y . id
      

      and there is only one big difference that my untrained eye can spot. The first one uses just a plain Data.Vector.Generic.zipWith (here's where your polymorphism coincidence might come into play – if GHC chooses a generic zipWith it will of course perform as if the code was polymorphic!) while the latter has exploded this single function call into almost 90 lines of state monad code and unpacked machine types.

      The state monad code looks almost like the loops and destructive updates you would write in an imperative language, so I assume it's tailored pretty well to the machine it's running on. If I wasn't in such a hurry, I would take a longer look to see more exactly how it works and why GHC suddenly decided it needed a tight loop. I have attached the generated core as much for myself as anyone else who wants to take a look.[2]


      [1]: Forcing the accumulator along the way: (This is what you already do, I misunderstood the code!)

      {-# LANGUAGE BangPatterns #-}
      iterate' f !x = x : iterate f (f x)
      

      [2]: What core U.zipWith (+) y . id gets translated into.

      这篇关于使用向量的风格与性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆