始终保证`seq`的评估顺序(此外还有'pseq`的奇怪行为) [英] Always guaranteed evaluation order of `seq` (with strange behavior of `pseq` in addition)

查看:60
本文介绍了始终保证`seq`的评估顺序(此外还有'pseq`的奇怪行为)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

seq函数的文档中指出以下内容:

关于评估顺序的注释:表达式seq a b不能保证a将在b之前进行评估. seq给出的唯一保证是ab都将在seq返回值之前进行求值.特别是,这意味着b可以在a之前进行评估.如果需要保证评估的特定顺序,则必须使用并行"包中的函数pseq.

所以我有一个带有累加器的sum函数的惰性版本:

sum :: Num a => [a] -> a
sum = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (x + acc) xs

显然,在大型列表中,这非常慢.现在,我使用seq重写该函数:

sum :: Num a => [a] -> a
sum = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = x + acc
                    in acc' `seq` go acc' xs

我看到了巨大的性能提升!但是我想知道它有多可靠吗?我是靠运气得到的吗?因为GHC可以先评估递归调用(根据文档),但仍会累积重击.看来我需要使用pseq来确保在递归调用之前始终评估acc'.但是使用pseq时,与seq版本相比,性能会下降.我机器上的数字(用于计算sum [1 .. 10^7]:

  • 天真:2.6s
  • seq:0.2s
  • pseq:0.5s

我正在使用GHC-8.2.2,并使用stack ghc -- File.hs命令进行编译.

在尝试使用stack ghc -- -O File.hs命令进行编译后,seqpseq之间的性能差距消失了.它们现在都在0.2s中运行.

那么我的实现是否显示我想要的属性?还是GHC有一些实施上的怪癖?为什么pseq变慢?是否存在一些示例,其中seq a b根据评估顺序具有不同的结果(相同的代码,但不同的编译器标志/不同的编译器/等)?

解决方案

到目前为止,答案主要集中在seqpseq性能问题上,但是我认为您最初想知道应该使用这两种方法中的哪一种

简单的答案是:尽管实际上两者都应生成性能几乎相同的代码(至少在打开适当的优化标志时),但原始的seq而不是pseq是适合您的情况的正确选择.从性能的角度来看,使用pseq是非习惯性的,令人困惑的并且可能适得其反,并且您使用它的原因是基于对评估顺序保证的含义以及对性能的隐含理解的错误理解.虽然不能保证不同编译器标志集之间的性能(对于其他编译器而言,性能差很多),但是如果遇到上述代码的seq版本比使用"production"的pseq版本运行明显慢的情况, GHC编译器的质量"优化标志,则应将其视为GHC错误并提交错误报告.

长答案当然是更长的时间...

首先,让我们清楚地知道seqpseq在语义上都相同,在满足两个方程式的前提下:

seq _|_ b = _|_
seq a b = b -- if a is not _|_
pseq _|_ b = _|_
pseq a b = b -- if a is not _|_

这实际上是它们中唯一一个在语义上保证的事情,并且由于Haskell语言的定义(如Haskell报告中所述)只能(最多)提供语义保证,并且不涉及性能或性能.实施中,没有理由在不同编译器或编译器标志之间保证性能的情况下在两者之间进行选择.

此外,在函数sum的特定基于seq的版本中,不难发现没有使用未定义的第一个参数而是使用定义的第二个参数调用seq的情况( (假设使用标准数字类型),因此您甚至没有使用 seq的语义属性.您可以将seq重新定义为seq a b = b,并且具有完全相同的语义.当然,您知道这一点,这就是为什么您的第一个版本不使用seq的原因.相反,您将seq用作附带的性能副作用,因此我们不在语义保证的范围内,而是在特定的GHC编译器实现和性能特征的范围内(实际上并没有任何保证可以说).

第二,这使我们达到了seq预期目的.它很少用于其语义属性,因为这些属性不是很有用.谁希望计算seq a b返回b 除了,如果某些不相关的表达式a无法终止,则计算终止会失败? (异常-并非双关语-类似于处理异常,您可以使用基于seqseqdeepSeq来强制对不受控制或受控的非终止表达式进行求值方式,然后再开始评估另一个表达式.)

相反,seq a b旨在在返回b的结果之前将a的评估强制为弱头法线形式,以防止积木.这个想法是,如果您有一个表达式b可以构建一个可能会堆积在以a表示的另一个未经评估的庞克之上的庞克,则可以使用seq a b来防止该积聚. 保证"是一个较弱的保证:GHC保证它了解您不希望在需要seq a b的值时,a保持未经评估的重击.从技术上讲,它并不保证a将在b之前被求值",无论这是什么意思,但是您不需要该保证.当您担心没有此保证时, GHC可能首先评估递归调用并仍然积累重击,这就像担心pseq a b可能评估其第一个参数,然后等待15分钟(只是确保绝对确定第一个参数已被评估!)一样可笑.第二.

在这种情况下,您应该信任GHC做正确的事情.在您看来,实现seq a b的性能优势的唯一方法是在开始评估b之前先将a评估为WHNF,但是可以想到,在这种情况下或其他情况下,存在一些优化措施从技术上讲,开始评估b(或什至完全评估WHNF的b),而在短时间内不对a进行评估,以提高性能,同时仍保留seq a b的语义.通过使用pseq可以防止GHC进行此类优化. (在您的sum程序情况下,毫无疑问没有这种优化,但是在更复杂的seq使用中,可能会有这种优化.)

第三,了解pseq实际上是 的重要部分.它是在 Marlow 2009 中在并发编程的背景下首次描述的.假设我们要并行化两个昂贵的计算foobar,然后合并(比如说相加)它们的结果:

foo `par` (bar `seq` foo+bar)  -- parens redundant but included for clarity

此处的意图是-当需要此表达式的值时-会产生火花并行计算foo,然后通过seq表达式开始对WHNF评估bar(即,数值),最后评估foo+bar之前,将等待foo火花,然后再添加并返回结果.

在这里,可以想象的是,GHC会识别出对于特定的数字类型,(1)如果bar这样做,foo+bar会自动失败,从而满足seq的形式语义保证; (2)对WHNF评估foo+bar将自动强制对WHNF评估bar,从而防止任何重击堆积,从而满足seq的非正式实施保证.在这种情况下,GHC可能会随意优化seq来产生:

foo `par` foo+bar

特别是如果认为在完成对WHNF的bar评估之前开始对foo+bar进行评估会更有效.

GHC还不够聪明,无法意识到-如果在foo火花被安排之前就开始对foo+bar中的foo进行评估,则火花将动摇,并且不会发生并行执行.

实际上仅在这种情况下,您需要显式延迟要求激发表达式的值,以允许在主线程追赶"之前有机会对其进行调度,从而需要pseq的额外保证.并愿意放弃seq:

的较弱保证,而允许GHC放弃其他优化机会.

foo `par` (bar `pseq` foo+bar)

在这里,pseq将阻止GHC引入任何可能使foo+bar在WHNF中进入评估之前(可能希望使foo火花变模糊)的优化(我们希望,它有足够的时间用于计划中的火花).

结果是,如果您将pseq用于并行编程之外的其他任何事情,则您使用的是错误的. (好吧,也许有些奇怪的情况,但是...)如果要做的就是使用seq(或$!seq或Haskell严格数据类型定义,用$!定义)是正确的方法.

(或者,如果可以相信@Kindaro的基准测试,也许对带有特定编译器版本和标志的无情基准测试是正确的方法.)

The documentation of seq function says the following:

A note on evaluation order: the expression seq a b does not guarantee that a will be evaluated before b. The only guarantee given by seq is that the both a and b will be evaluated before seq returns a value. In particular, this means that b may be evaluated before a. If you need to guarantee a specific order of evaluation, you must use the function pseq from the "parallel" package.

So I have a lazy version of sum function with accumulator:

sum :: Num a => [a] -> a
sum = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (x + acc) xs

Obviously, this is extremely slow on big lists. Now I'm rewriting this function using seq:

sum :: Num a => [a] -> a
sum = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = x + acc
                    in acc' `seq` go acc' xs

And I see huge performance increase! But I wonder how reliable it is? Did I get it by luck? Because GHC can evaluate recursive call first (according to documentation) and still accumulate thunks. It looks like I need to use pseq to ensure that acc' is always evaluated before recursive call. But with pseq I see performance decrease in compare to seq version. Numbers on my machine (for calculating sum [1 .. 10^7]:

  • naive: 2.6s
  • seq: 0.2s
  • pseq: 0.5s

I'm using GHC-8.2.2 and I compile with stack ghc -- File.hs command.

After I tried to compile with stack ghc -- -O File.hs command performance gap between seq and pseq is gone. They now both run in 0.2s.

So does my implementation exhibit the properties I want? Or does GHC has some implementation quirk? Why is pseq slower? Does there exist some example where seq a b has different results depending on evaluation order (same code but different compiler flags/different compilers/etc.)?

解决方案

The answers so far have focused on the seq versus pseq performance issues, but I think you originally wanted to know which of the two you should use.

The short answer is: while both should generate nearly identically performing code in practice (at least when proper optimization flags are turned on), the primitive seq, and not pseq, is the correct choice for your situation. Using pseq is non-idiomatic, confusing, and potentially counterproductive from a performance standpoint, and your reason for using it is based on a flawed understand of what its order-of-evaluation guarantee means and what it implies with respect to performance. While there are no guarantees about performance across different sets of compiler flags (much less across other compilers), if you ever run into a situation where the seq version of the above code runs significantly slower than the pseq version using "production quality" optimization flags with the GHC compiler, you should consider it a GHC bug and file a bug report.

The long answer is, of course, longer...

First, let's be clear that seq and pseq are semantically identical in the sense that they both satisfy the equations:

seq _|_ b = _|_
seq a b = b -- if a is not _|_
pseq _|_ b = _|_
pseq a b = b -- if a is not _|_

This is really the only thing that either of them guarantees semantically, and since the definition of the Haskell language (as given in the Haskell report say) only makes -- at best -- semantic guarantees and does not deal with performance or implementation, there's no reason to choose between one or the other for reasons of guaranteed performance across different compilers or compiler flags.

Furthermore, in your particular seq-based version of the function sum, it's not too difficult to see that there is no situation in which seq is called with an undefined first argument but a defined second argument (assuming the use of a standard numeric type), so you aren't even using the semantic properties of seq. You could re-define seq as seq a b = b and have exactly the same semantics. Of course, you know this -- that's why your first version didn't use seq. Instead, you're using seq for an incidental performance side-effect, so we're out of the realm of semantic guarantees and back in the realm of specific GHC compiler implementation and performance characteristics (where there aren't really any guarantees to speak of).

Second, that brings us to the intended purpose of seq. It is rarely used for its semantic properties because those properties aren't very useful. Who would want a computation seq a b to return b except that it should fail to terminate if some unrelated expression a fails to terminate? (The exceptions -- no pun intended -- would be things like handling exceptions, where you might use seq or deepSeq which is based on seq to force evaluation of a non-terminating expression in either an uncontrolled or controlled way, before starting evaluation of another expression.)

Instead, seq a b is intended to force evaluation of a to weak head normal form before returning the result of b to prevent accumulation of thunks. The idea is, if you have an expression b which builds a thunk that could potentially accumulate on top of another unevaluated thunk represented by a, you can prevent that accumulation by using seq a b. The "guarantee" is a weak one: GHC guarantees that it understands you don't want a to remain an unevaluated thunk when seq a b's value is demanded. Technically, it doesn't guarantee that a will be "evaluated before" b, whatever that means, but you don't need that guarantee. When you worry that, without this guarantee, GHC might evaluate the recursive call first and still accumulate thunks, this is as ridiculous as worrying that pseq a b might evaluate its first argument, then wait 15 minutes (just to make absolutely sure the first argument has been evaluated!), before evaluating its second.

This is a situation where you should trust GHC to do the right thing. It may seem to you that the only way to realize the performance benefit of seq a b is for a to be evaluated to WHNF before evaluation of b starts, but it is conceivable that there are optimizations in this or other situations that technically start evaluating b (or even fully evaluate b to WHNF) while leaving a unevaluated for a short time to improve performance while still preserving the semantics of seq a b. By using pseq instead, you may prevent GHC from making such optimizations. (In your sum program situation, there undoubtedly is no such optimization, but in a more complex use of seq, there might be.)

Third, it's important to understand what pseq is actually for. It was first described in Marlow 2009 in the context of concurrent programming. Suppose we want to parallelize two expensive computations foo and bar and then combine (say, add) their results:

foo `par` (bar `seq` foo+bar)  -- parens redundant but included for clarity

The intention here is that -- when this expression's value is demanded -- it creates a spark to compute foo in parallel and then, via the seq expression, starts evaluating bar to WHNF (i.e., it's numeric value, say) before finally evaluating foo+bar which will wait on the spark for foo before adding and returning the results.

Here, it's conceivable that GHC will recognize that for a specific numeric type, (1) foo+bar automatically fails to terminate if bar does, satisfying the formal semantic guarantee of seq; and (2) evaluating foo+bar to WHNF will automatically force evaluation of bar to WHNF preventing any thunk accumulation and so satisfying the informal implementation guarantee of seq. In this situation, GHC may feel free to optimize the seq away to yield:

foo `par` foo+bar

particularly if it feels that it would be more performant to start evaluation of foo+bar before finishing evaluating bar to WHNF.

What GHC isn't smart enough to realize is that -- if evaluation of foo in foo+bar starts before the foo spark is scheduled, the spark will fizzle, and no parallel execution will occur.

It's really only in this case, where you need to explicitly delay demanding the value of a sparked expression to allow an opportunity for it to be scheduled before the main thread "catches up" that you need the extra guarantee of pseq and are willing to have GHC forgo additional optimization opportunities permitted by the weaker guarantee of seq:

foo `par` (bar `pseq` foo+bar)

Here, pseq will prevent GHC from introducing any optimization that might allow foo+bar to start evaluating (potentially fizzling the foo spark) before bar is in WHNF (which, we hope, allows enough time for the spark to be scheduled).

The upshot is that, if you're using pseq for anything other than concurrent programming, you're using it wrong. (Well, maybe there are some weird situations, but...) If all you want to do is force strict evaluation and/or thunk evaluation to improve performance in non-concurrent code, using seq (or $! which is defined in terms of seq or Haskell strict data types which are defined in terms of $!) is the correct approach.

(Or, if @Kindaro's benchmarks are to be believed, maybe merciless benchmarking with specific compiler versions and flags is the correct approach.)

这篇关于始终保证`seq`的评估顺序(此外还有'pseq`的奇怪行为)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆