Haskell / GHC表现“任何”/“全部” [英] Haskell/GHC performance of `any`/`all`

查看:150
本文介绍了Haskell / GHC表现“任何”/“全部”的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写了量化函数 exists forall none 用于Haskell的内置 [] 列表数据类型。在很多情况下,这些似乎证明比 Prelude / Data.List s 任何全部。我天真地怀疑这个性能是由于任何所有使用Θ(n)折叠实现的。由于我对Haskell相对较新,我认为我必须弄错了,否则这种现象会有一个很好的理由。

数据.Foldable

   -  |确定结构的任何元素是否满足谓词。 
any ::可折叠t => (a - > Bool) - > t a - > Bool
任何p = getAny#。 foldMap(Any#。p)

- |确定结构的所有元素是否满足谓词。
all ::可折叠t => (a - > Bool) - > t a - > Bool
全部p = getAll#。 foldMap(All#。p)

我的实现:

  exists ::(a  - > Bool) - > [a]  - > Bool 
存在_ [] = False
存在pred(x:xs)| pred x = True
|否则=存在预测xs

  forall pred = not。存在(不是。pred)
无pred =不。存在pred = forall(not。pred)

消除布尔反转:

  forall,none ::(a  - > Bool) - > [a]  - > Bool 

forall _ [] = True
forall pred(x:xs)| pred x = forall pred xs
|否则= False

无_ [] = True
无pred(x:xs)| pred x = False
|否则=无预测xs

全部

 时间327.8μs(322.4μs。333.0μs)
0.997R²(0.996R².. 0.998R²)
平均值328.7μs(324.1μs.. 334.2μs)
std dev 16.95μs(14.63μs.. 22.02μs)
$ / code>

forall

 时间113.2 μs(111.2μs.. 115.0μs)
0.997R²(0.996R².. 0.998R²)
平均值112.0μs(110.0μs.. 113.9μs)
std dev 6.333μs(5.127μs。 。7.896μs)

使用标准 nf




正如预期的那样,我没有重新推翻,但低估了编译器标志,天真地没有预料到 -O2 与默认操作相比能够产生如此激烈的整体差异定制化水平绩效,以及个体定制写作与图书馆配方之间的优化效果差距。许多高效的专业标准函数优化显然只有在明确启用时才会启动。



Haskell标签信息的性能部分强调了在测试代码时优化级别编译器标志的重要性效率。通常建议相信库函数实现的复杂性,而不是重新编写 RULES 编译指示或重新构建基本表单,以尝试利用已经培养的优化潜力。

解决方案

我发现以各种方式重新实现 any p>

  import前导隐藏(任何)
import Criterion.Main
导入Data.Foldable(foldMap)
import Data.Monoid

存在

  exists ::(a  - > Bool) - > [a]  - > Bool 
exists _ [] = False
exists pred(x:xs)
= if pred x
then True
else exists pred xs

使用(||)的版本:

  existOr ::(a  - > Bool) - > [a]  - > Bool 
existsOr _ [] = False
existsOr pred(x:xs)= pred x || existsOr pred xs

使用 foldr



  any ::(a  - > Bool) - > [a]  - > Bool 
any pred = foldr((|| ||。pred)False

使用 foldr 任何

  anyF ::(a  - > Bool) - > [a]  - > Bool 
anyF pred = getAny。使用 foldMap 使用 foldMap(mappend。(Any。pred))mempty

code>和任何

  anyFM ::(a - > Bool) - > [a]  - > Bool 
anyFM pred = getAny。 foldmap(任何。pred)

使用 ghc -O0 $ b $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ R²(0.983R².. 0.993R²)
平均值1.482μs(1.427μs.. 1.545μs)
std dev 196.1 ns(168.8 ns .. 229.2 ns)
异常值引起的变化:93 %(严重膨胀)

基准存在或
时间2.699μs(2.616μs.. 2.768μs)
0.992R²(0.988R².. 0.995R²)
平均值2.629 μs(2.554μs.. 2.704μs)
std dev 277.8 ns(235.8 ns。351.1 ns)
由异常值引起的变化:89%(严重膨胀)

基准
时间5.551μs(5.354μs.. 5.777μs)
0.990R²(0.986R².. 0.995R²)
平均值5.553μs(5.395μs.. 5.750微秒)
标准差584.2纳秒(447.5纳秒835.5纳秒)
异常值引起的差异:88%(严重膨胀)

基准任何F
时间7.330 μs(7.081μs.. 7.612μs)
0.988R²(0.982R².. 0.994R²)
平均值7.502μs(7.272μs。7.762μs)
std dev 848.2 ns(712.6 ns。 1.022μs)异常值引入的
差异:89%(严重膨胀)

基准任何FM
时间5.668μs(5.451μs.. 6.008μs)
0.987R² (0.975R².. 0.996R²)
平均值5.807μs(5.659μs.. 5.975μs)
std dev 542.5 ns(446.4 ns .. 721.8 ns)由异常值引入的
差异:86% (严重夸大)

您的版本(存在 )确实是最快的,并且 foldr 版本相当缓慢。

使用 ghc -02 ,你的版本( exists )是最慢的,所有其他函数的速度几乎相同:

 基准存在
时间753.5 ns(725.4 ns .. 779.9 ns)
0.990R²(0.986R².. 0.995R²)
平均值762.4 ns(737.0 ns .. 787.0 ns)
std dev 82.47 ns(66.79 ns .. 105.1 ns)
异常值导致的差异:91%(严重膨胀)

存在基准或者
时间491.5 ns(478.2 ns .. 503.2 ns)
0.994R²(0.992R².. 0.996R²)
表示494.5 ns(481.1 ns .. 512.9 ns)
std dev 54.97 ns(42.54 ns .. 80.34 ns)由异常值引入的
差异:92%(严重膨胀)

基准任何
时间461.2 ns(442.0 ns .. 479.7 ns)
0.989R²(0.985R².. 0.993R²)
平均值456.0 ns(4 39.3 ns .. 476.3 ns)
std dev 60.04 ns(47.27 ns .. 89.47 ns)
异常引入的差异:94%(严重膨胀)

基准任何F
time 436.9 ns(415.8 ns .. 461.0 ns)
0.978R²(0.967R².. 0.988R²)
表示450.8 ns(430.1 ns .. 472.6 ns)
std dev 70.64 ns (57.04纳秒.. 85.92纳秒)异常值导致的
差异:96%(严重膨胀)

基准任何FM
时间438.9 ns(426.9 ns .. 449.5 ns)
0.993R²(0.989R².. 0.996R²)
均值435.8 ns(421.4 ns .. 447.6 ns)
std dev 45.32 ns(36.73 ns .. 58.74 ns)
方差异常值:90%(严重膨胀)

如果查看简化的核心代码( ghc -O2 -ddump-simpl ),你会发现没有 foldr s了( -O0 ,所有东西都还在那里,包括 fold s)。



我想冒险说你的代码比库代码更快(在未优化的版本中, -O0 ),因为它更简单(对于不太普遍的潜在代价)。优化后的库代码比你的版本更快,因为它的编写方式是它的优化潜力被编译器识别。 (诚​​然,这是一个猜测工作)


I wrote quantification functions exists, forall, and none for Haskell's build-in [] list data type. On multiple occasions, these seemed to prove much more efficient than Prelude/Data.Lists any and all. I naively suspect that this performance is due to any and all being implemented using Θ(n) folds. Since I am relatively new to Haskell, I think I must be mistaken, or that there would be a good reason for this phenomenon.

From Data.Foldable:

-- | Determines whether any element of the structure satisfies the predicate.
any :: Foldable t => (a -> Bool) -> t a -> Bool
any p = getAny #. foldMap (Any #. p)

-- | Determines whether all elements of the structure satisfy the predicate.
all :: Foldable t => (a -> Bool) -> t a -> Bool
all p = getAll #. foldMap (All #. p)

My implementations:

exists :: (a -> Bool) -> [a] -> Bool
exists _    []                   = False
exists pred (x : xs) | pred x    = True
                     | otherwise = exists pred xs

and

forall pred  =  not . exists (not . pred)
none pred  =  not . exists pred  =  forall (not . pred)

Eliminating boolean inversions:

forall, none :: (a -> Bool) -> [a] -> Bool

forall _    []                   = True
forall pred (x : xs) | pred x    = forall pred xs
                     | otherwise = False

none _    []                   = True
none pred (x : xs) | pred x    = False
                   | otherwise = none pred xs

all:

time                 327.8 μs   (322.4 μs .. 333.0 μs)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 328.7 μs   (324.1 μs .. 334.2 μs)
std dev              16.95 μs   (14.63 μs .. 22.02 μs)

and forall:

time                 113.2 μs   (111.2 μs .. 115.0 μs)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 112.0 μs   (110.0 μs .. 113.9 μs)
std dev              6.333 μs   (5.127 μs .. 7.896 μs)

Performance measured using criterion's nf.


As anticipated, I did not reinvent the fold, but had underestimated compiler flags, and naively did not expect -O2 to make such drastic overall difference compared to default optimization level performance, nor the optimization effectiveness disparity between the individual custom-wrote and library formulations. Many highly effective specialized standard function optimizations evidently kick in only when explicitly enabled.

The "performance" section of the Haskell tag info emphasizes importance of optimization level compiler flags when testing code efficiency. It is generally advised to trust the sophistication of library function implementations, and, rather than rewire RULES pragmas or reformulate basic forms, to try exploiting already cultivated optimization potential.

解决方案

I find it instructive to re-implement any in the various ways:

import Prelude hiding (any)
import Criterion.Main
import Data.Foldable (foldMap)
import Data.Monoid

Your exists:

exists :: (a -> Bool) -> [a] -> Bool
exists _ [] = False
exists pred (x : xs)
    = if pred x
      then True
      else exists pred xs

A version using (||):

existsOr :: (a -> Bool) -> [a] -> Bool
existsOr _ [] = False
existsOr pred (x : xs) = pred x || existsOr pred xs

Using foldr:

any :: (a -> Bool) -> [a] -> Bool
any pred = foldr ((||) . pred) False

Using foldr and Any:

anyF :: (a -> Bool) -> [a] -> Bool
anyF pred = getAny . foldr (mappend . (Any . pred)) mempty

Using foldMap and Any:

anyFM :: (a -> Bool) -> [a] -> Bool
anyFM pred = getAny . foldMap (Any . pred)

Benchmarks with ghc -O0:

benchmarking exists
time                 1.552 μs   (1.504 μs .. 1.593 μs)
                     0.989 R²   (0.983 R² .. 0.993 R²)
mean                 1.482 μs   (1.427 μs .. 1.545 μs)
std dev              196.1 ns   (168.8 ns .. 229.2 ns)
variance introduced by outliers: 93% (severely inflated)

benchmarking existsOr
time                 2.699 μs   (2.616 μs .. 2.768 μs)
                     0.992 R²   (0.988 R² .. 0.995 R²)
mean                 2.629 μs   (2.554 μs .. 2.704 μs)
std dev              277.8 ns   (235.8 ns .. 351.1 ns)
variance introduced by outliers: 89% (severely inflated)

benchmarking any
time                 5.551 μs   (5.354 μs .. 5.777 μs)
                     0.990 R²   (0.986 R² .. 0.995 R²)
mean                 5.553 μs   (5.395 μs .. 5.750 μs)
std dev              584.2 ns   (447.5 ns .. 835.5 ns)
variance introduced by outliers: 88% (severely inflated)

benchmarking anyF
time                 7.330 μs   (7.081 μs .. 7.612 μs)
                     0.988 R²   (0.982 R² .. 0.994 R²)
mean                 7.502 μs   (7.272 μs .. 7.762 μs)
std dev              848.2 ns   (712.6 ns .. 1.022 μs)
variance introduced by outliers: 89% (severely inflated)

benchmarking anyFM
time                 5.668 μs   (5.451 μs .. 6.008 μs)
                     0.987 R²   (0.975 R² .. 0.996 R²)
mean                 5.807 μs   (5.659 μs .. 5.975 μs)
std dev              542.5 ns   (446.4 ns .. 721.8 ns)
variance introduced by outliers: 86% (severely inflated)

Your version (exists) is indeed the fastest, and the foldr versions are rather slow.

With ghc -O2, your version (exists) is the slowest, and all other functions are nearly equally fast to each other:

benchmarking exists
time                 753.5 ns   (725.4 ns .. 779.9 ns)
                     0.990 R²   (0.986 R² .. 0.995 R²)
mean                 762.4 ns   (737.0 ns .. 787.0 ns)
std dev              82.47 ns   (66.79 ns .. 105.1 ns)
variance introduced by outliers: 91% (severely inflated)

benchmarking existsOr
time                 491.5 ns   (478.2 ns .. 503.2 ns)
                     0.994 R²   (0.992 R² .. 0.996 R²)
mean                 494.5 ns   (481.1 ns .. 512.9 ns)
std dev              54.97 ns   (42.54 ns .. 80.34 ns)
variance introduced by outliers: 92% (severely inflated)

benchmarking any
time                 461.2 ns   (442.0 ns .. 479.7 ns)
                     0.989 R²   (0.985 R² .. 0.993 R²)
mean                 456.0 ns   (439.3 ns .. 476.3 ns)
std dev              60.04 ns   (47.27 ns .. 89.47 ns)
variance introduced by outliers: 94% (severely inflated)

benchmarking anyF
time                 436.9 ns   (415.8 ns .. 461.0 ns)
                     0.978 R²   (0.967 R² .. 0.988 R²)
mean                 450.8 ns   (430.1 ns .. 472.6 ns)
std dev              70.64 ns   (57.04 ns .. 85.92 ns)
variance introduced by outliers: 96% (severely inflated)

benchmarking anyFM
time                 438.9 ns   (426.9 ns .. 449.5 ns)
                     0.993 R²   (0.989 R² .. 0.996 R²)
mean                 435.8 ns   (421.4 ns .. 447.6 ns)
std dev              45.32 ns   (36.73 ns .. 58.74 ns)
variance introduced by outliers: 90% (severely inflated)

If one looks into the simplified Core code (ghc -O2 -ddump-simpl), one sees that there are no foldrs anymore (with -O0, everything is still in there, folds included).

I would thus venture to say that your code is faster (in the un-optimized version, -O0) than the library code, because it is simpler (for the potential price of being less general). The optimized library code is faster than your version, since it is written in a way that its optimization potential is recognized by the compiler. (admittedly, this is a bit of guess work)

这篇关于Haskell / GHC表现“任何”/“全部”的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆