Style vs Performance Using Vectors

Question
Here's the code:
{-# LANGUAGE FlexibleContexts #-}
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V
{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+) -- or U.zipWith, it doesn't make a difference
main = do
let iters = 100
dim = 221184
y = U.replicate dim 0 :: U.Vector Int64
let ans = iterate ((f y)) y !! iters
putStr $ (show $ U.sum ans)
I compiled with ghc 7.6.2
and -O2
, and it took 1.7 seconds to run.
I tried several different versions of f
:
1. f x = U.zipWith (+) x
2. f x = (U.zipWith (+) x) . id
3. f x y = U.zipWith (+) x y

Version 1 is the same as the original, while versions 2 and 3 run in under 0.09 seconds (and INLINING f doesn't change anything).
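One plausible reading of these timings (an assumption, not confirmed in the question itself): GHC's INLINE machinery only fires when a function is applied to all of the arguments in its unfolding, so the point-free definition leaves zipWith under-saturated, while the eta-expanded version 3 saturates the call at the definition site. A minimal sketch of the two shapes, with hypothetical names:

```haskell
import Data.Int (Int64)
import qualified Data.Vector.Unboxed as U

-- Point-free, as in the original: zipWith is left under-saturated
-- behind a NOINLINE binding, so the inliner and rewrite rules that
-- would produce the fused unboxed loop may never see a full call.
{-# NOINLINE fSlow #-}
fSlow :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
fSlow = U.zipWith (+)

-- Eta-expanded (version 3 above): the call is saturated where it is
-- defined, which empirically yields the ~0.09 s tight loop.
fFast :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
fFast x y = U.zipWith (+) x y
```

Both definitions compute the same results; only the generated code differs.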
I also noticed that if I make f
polymorphic (with any of the three signatures above), even with a "fast" definition (i.e. 2 or 3), it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though I'm explicitly giving the types for the Vector type and element type.
I'm also interested in adding integers modulo q
:
newtype Zq q i = Zq {unZq :: i}
As when adding Int64s, if I write a function with every type specified,
h :: U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64)
I get an order of magnitude better performance than if I leave any polymorphism:
h :: (Modulus q) => U.Vector (Zq q Int64) -> U.Vector (Zq q Int64) -> U.Vector (Zq q Int64)
But I should at least be able to remove the specific phantom type! It should be compiled out, since I'm dealing with a newtype.
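One standard GHC hack for exactly this kind of polymorphism penalty (a general technique, not verified against the question's Modulus/Zq setup, so sketched here with a plain Num constraint instead) is to keep the function polymorphic but ask the compiler for a monomorphic copy with a SPECIALIZE pragma:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Int (Int64)
import qualified Data.Vector.Unboxed as U

-- Polymorphic over the element type, like the slow signatures above.
g :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
g x y = U.zipWith (+) x y
-- Ask GHC to also compile a dedicated Int64 copy, so call sites at
-- that concrete type get dictionary-free, specialised code:
{-# SPECIALIZE g :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64 #-}
```

An INLINABLE pragma on g would similarly let GHC specialise it at use sites in other modules; whether either recovers the full order of magnitude here would need measuring.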
Here are my questions:
- Where is the slowdown coming from?
- What is going on in versions 2 and 3 of f that affects performance in any way? It seems like a bug to me that (what amounts to) coding style can affect performance like this. Are there other examples outside of Vector where partially applying a function or other stylistic choices affect performance?
- Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, both, or the phantom type)? I know polymorphism makes code slower, but this is ridiculous. Is there a hack around it?
EDIT 1
I filed an issue with the Vector library. I found a GHC issue relating to this problem.
EDIT 2
I rewrote the question after gaining some insight from @kqr's answer. Below is the original for reference.
--------------ORIGINAL QUESTION--------------------
Here's the code:
{-# LANGUAGE FlexibleContexts #-}
import Control.DeepSeq
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V
{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+)
main = do
let iters = 100
dim = 221184
y = U.replicate dim 0 :: U.Vector Int64
let ans = iterate ((f y)) y !! iters
putStr $ (show $ U.sum ans)
I compiled with ghc 7.6.2
and -O2
, and it took 1.7 seconds to run.
I tried several different versions of f
:
1. f x = U.zipWith (+) x
2. f x = (U.zipWith (+) x) . U.force
3. f x = (U.zipWith (+) x) . Control.DeepSeq.force
4. f x = (U.zipWith (+) x) . (\z -> z `seq` z)
5. f x = (U.zipWith (+) x) . id
6. f x y = U.zipWith (+) x y

Version 1 is the same as the original, version 2 runs in 0.111 seconds, and versions 3-6 run in under 0.09 seconds (and INLINING f doesn't change anything).
So the order-of-magnitude slowdown appears to be due to laziness since force
helped, but I'm not sure where the laziness is coming from. Unboxed types aren't allowed to be lazy, right?
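On that closing aside: an unboxed vector's *elements* are strict, but the vector value itself is an ordinary lazy Haskell value, so it can sit around as an unevaluated thunk. A small illustration (my own example, not from the question):

```haskell
import qualified Data.Vector.Unboxed as U

main :: IO ()
main = do
  let u = U.fromList [1 .. 10] :: U.Vector Int
      -- Binding v performs no work yet: v is a thunk, even though its
      -- eventual elements are unboxed Ints.
      v = U.map (+ 1) u
  putStrLn "v bound, not yet evaluated"
  -- Demanding v (here via U.sum) is what finally runs the map.
  print (U.sum v)  -- prints 65
```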
I tried writing a strict version of iterate
, thinking the vector itself must be lazy:
{-# INLINE iterate' #-}
iterate' :: (NFData a) => (a -> a) -> a -> [a]
iterate' f x = x `seq` x : iterate' f (f x)
but with the point-free version of f
, this didn't help at all.
I also noticed something else, which could be just a coincidence and red herring:
If I make f
polymorphic (with any of the three signatures above), even with a "fast" definition, it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though everything should be inferred nicely.
Here are my questions:
- Where is the slowdown coming from?
- Why does composing with
force
help, but using a strictiterate
doesn't? - Why is
U.force
worse thanDeepSeq.force
? I have no idea whatU.force
is supposed to do, but it sounds a lot likeDeepSeq.force
, and seems to have a similar effect. - Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the
Num
type, or both)? - Why are versions 5 and 6, neither of which should have any strictness implications at all, just as fast as a strict function?
As @kqr pointed out, the problem doesn't seem to be strictness. So something about the way I write the function is causing the generic zipWith
to be used rather than the Unboxed-specific version. Is this just a fluke between GHC and the Vector library, or is there something more general that can be said here?
Solution
While I don't have the definitive answer you want, there are two things that might help you along.
The first thing is that x `seq` x
is, both semantically and computationally, the same thing as just x
. The wiki says about seq
:
A common misconception regarding seq is that seq x "evaluates" x. Well, sort of. seq doesn't evaluate anything just by virtue of existing in the source file; all it does is introduce an artificial data dependency of one value on another: when the result of seq is evaluated, the first argument must also (sort of; see below) be evaluated.

As an example, suppose x :: Integer; then seq x b behaves essentially like if x == 0 then b else b – unconditionally equal to b, but forcing x along the way. In particular, the expression x `seq` x is completely redundant, and always has exactly the same effect as just writing x.
What the first paragraph says is that writing seq a b
doesn't mean that a
will magically get evaluated this instant, it means that a
will get evaluated as soon as b
needs to be evaluated. This might occur later in the program, or maybe never at all. When you view it in that light, it is obvious that seq x x
is a redundancy, because all it says is, "evaluate x
as soon as x
needs to be evaluated." Which of course is what would happen anyway if you had just written x
.
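This redundancy is easy to check with bottom values: x `seq` x fails exactly when plain x would, while seq against a *different* result really does add a dependency. A small sketch (my own, catching the errors just to print the outcomes):

```haskell
import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  let boom = error "boom" :: Int
  -- boom `seq` boom behaves exactly like boom: bottom either way.
  r1 <- try (evaluate (boom `seq` boom)) :: IO (Either SomeException Int)
  -- boom `seq` 42 is 42 *with a dependency on boom*, so forcing the
  -- result also forces boom and we hit the error.
  r2 <- try (evaluate (boom `seq` (42 :: Int))) :: IO (Either SomeException Int)
  -- Without seq there is no dependency: const 42 boom is just 42.
  r3 <- try (evaluate (const (42 :: Int) boom)) :: IO (Either SomeException Int)
  mapM_ (putStrLn . either (const "bottom") show) [r1, r2, r3]
```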
This has two implications for you:

1. Your "strict" iterate' function isn't really any stricter than it would be without the seq. In fact, I have a hard time imagining how the iterate function could become any stricter than it already is. You can't make the tail of the list strict, because it is infinite. The main thing you can do is force the "accumulator", f x, but doing so doesn't give any significant performance increase on my system.[1]

   Scratch that. Your strict iterate' does exactly the same thing as my bang pattern version. See the comments.

2. Writing (\z -> z `seq` z) does not give you a strict identity function, which I assume is what you were going for. In fact, the common identity function is as strict as you'll get – it will evaluate its result as soon as it is needed.
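If the intent behind (\z -> z `seq` z) was a composition that actually forces its argument, the usual spelling guards a *different* expression with seq (or uses ($!)). A hypothetical strict variant of the question's f:

```haskell
import Data.Int (Int64)
import qualified Data.Vector.Unboxed as U

-- Forces the second vector to weak head normal form before building
-- the zip. Unlike (\z -> z `seq` z), the seq here sits between two
-- different expressions, so it introduces a real data dependency.
fStrict :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
fStrict x y = y `seq` U.zipWith (+) x y
```

Note that for an unboxed vector, reaching WHNF already computes the whole underlying array, since the elements themselves can't be thunks.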
However, I peeked at the core GHC generates for
U.zipWith (+) y
and
U.zipWith (+) y . id
and there is only one big difference that my untrained eye can spot. The first one uses just a plain Data.Vector.Generic.zipWith
(here's where your polymorphism coincidence might come into play – if GHC chooses a generic zipWith
it will of course perform as if the code was polymorphic!) while the latter has exploded this single function call into almost 90 lines of state monad code and unpacked machine types.
The state monad code looks almost like the loops and destructive updates you would write in an imperative language, so I assume it's tailored pretty well to the machine it's running on. If I wasn't in such a hurry, I would take a longer look to see more exactly how it works and why GHC suddenly decided it needed a tight loop. I have attached the generated core as much for myself as anyone else who wants to take a look.[2]
[1]: Forcing the accumulator along the way: (This is what you already do, I misunderstood the code!)
{-# LANGUAGE BangPatterns #-}
iterate' f !x = x : iterate' f (f x)
[2]: What core U.zipWith (+) y . id
gets translated into.