`any`/`all` 的 Haskell/GHC 性能 [英] Haskell/GHC performance of `any`/`all`

查看:24
本文介绍了`any`/`all` 的 Haskell/GHC 性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我为 Haskell 的内置 [] 列表数据编写了量化函数 existsforallnone类型.在很多情况下,这些似乎比 Prelude/Data.Lists anyall 更有效.我天真地怀疑这种性能是由于 anyall 使用 Θ(n) 折叠实现的.由于我对 Haskell 比较陌生,我想我一定是搞错了,或者说这种现象是有原因的.

I wrote quantification functions exists, forall, and none for Haskell's build-in [] list data type. On multiple occasions, these seemed to prove much more efficient than Prelude/Data.Lists any and all. I naively suspect that this performance is due to any and all being implemented using Θ(n) folds. Since I am relatively new to Haskell, I think I must be mistaken, or that there would be a good reason for this phenomenon.

来自Data.Foldable:

-- | Determines whether any element of the structure satisfies the predicate.
any :: Foldable t => (a -> Bool) -> t a -> Bool
any p = getAny #. foldMap (Any #. p)

-- | Determines whether all elements of the structure satisfy the predicate.
all :: Foldable t => (a -> Bool) -> t a -> Bool
all p = getAll #. foldMap (All #. p)

我的实现:

exists :: (a -> Bool) -> [a] -> Bool
exists _    []                   = False
exists pred (x : xs) | pred x    = True
                     | otherwise = exists pred xs

forall pred  =  not . exists (not . pred)
none pred  =  not . exists pred  =  forall (not . pred)

消除布尔反转:

forall, none :: (a -> Bool) -> [a] -> Bool

forall _    []                   = True
forall pred (x : xs) | pred x    = forall pred xs
                     | otherwise = False

none _    []                   = True
none pred (x : xs) | pred x    = False
                   | otherwise = none pred xs

全部:

time                 327.8 μs   (322.4 μs .. 333.0 μs)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 328.7 μs   (324.1 μs .. 334.2 μs)
std dev              16.95 μs   (14.63 μs .. 22.02 μs)

forall:

time                 113.2 μs   (111.2 μs .. 115.0 μs)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 112.0 μs   (110.0 μs .. 113.9 μs)
std dev              6.333 μs   (5.127 μs .. 7.896 μs)

使用标准的 nf 衡量性能.

Performance measured using criterion's nf.

正如预期的那样,我没有重新发明折叠,但低估了编译器标志,并且天真地没有想到 -O2 与默认优化级别性能相比会产生如此巨大的整体差异,也没有优化效率个别定制编写和库公式之间的差异.许多高效的专业标准功能优化显然只有在明确启用时才会发挥作用.

As anticipated, I did not reinvent the fold, but had underestimated compiler flags, and naively did not expect -O2 to make such drastic overall difference compared to default optimization level performance, nor the optimization effectiveness disparity between the individual custom-wrote and library formulations. Many highly effective specialized standard function optimizations evidently kick in only when explicitly enabled.

Haskell 标签信息的性能"部分强调了在测试代码效率时优化级别编译器标志的重要性.通常建议相信库函数实现的复杂性,而不是重新连接 RULES 编译指示或重新制定基本形式,而是尝试利用已经培养的优化潜力.

The "performance" section of the Haskell tag info emphasizes importance of optimization level compiler flags when testing code efficiency. It is generally advised to trust the sophistication of library function implementations, and, rather than rewire RULES pragmas or reformulate basic forms, to try exploiting already cultivated optimization potential.

推荐答案

我发现以各种方式重新实现 any 很有启发性:

I find it instructive to re-implement any in the various ways:

import Prelude hiding (any)
import Criterion.Main
import Data.Foldable (foldMap)
import Data.Monoid

您的存在:

exists :: (a -> Bool) -> [a] -> Bool
exists _ [] = False
exists pred (x : xs)
    = if pred x
      then True
      else exists pred xs

使用(||)的版本:

existsOr :: (a -> Bool) -> [a] -> Bool
existsOr _ [] = False
existsOr pred (x : xs) = pred x || existsOr pred xs

使用foldr:

any :: (a -> Bool) -> [a] -> Bool
any pred = foldr ((||) . pred) False

使用 foldrAny:

anyF :: (a -> Bool) -> [a] -> Bool
anyF pred = getAny . foldr (mappend . (Any . pred)) mempty

使用 foldMapAny:

anyFM :: (a -> Bool) -> [a] -> Bool
anyFM pred = getAny . foldMap (Any . pred)

使用 ghc -O0 进行基准测试:

Benchmarks with ghc -O0:

benchmarking exists
time                 1.552 μs   (1.504 μs .. 1.593 μs)
                     0.989 R²   (0.983 R² .. 0.993 R²)
mean                 1.482 μs   (1.427 μs .. 1.545 μs)
std dev              196.1 ns   (168.8 ns .. 229.2 ns)
variance introduced by outliers: 93% (severely inflated)

benchmarking existsOr
time                 2.699 μs   (2.616 μs .. 2.768 μs)
                     0.992 R²   (0.988 R² .. 0.995 R²)
mean                 2.629 μs   (2.554 μs .. 2.704 μs)
std dev              277.8 ns   (235.8 ns .. 351.1 ns)
variance introduced by outliers: 89% (severely inflated)

benchmarking any
time                 5.551 μs   (5.354 μs .. 5.777 μs)
                     0.990 R²   (0.986 R² .. 0.995 R²)
mean                 5.553 μs   (5.395 μs .. 5.750 μs)
std dev              584.2 ns   (447.5 ns .. 835.5 ns)
variance introduced by outliers: 88% (severely inflated)

benchmarking anyF
time                 7.330 μs   (7.081 μs .. 7.612 μs)
                     0.988 R²   (0.982 R² .. 0.994 R²)
mean                 7.502 μs   (7.272 μs .. 7.762 μs)
std dev              848.2 ns   (712.6 ns .. 1.022 μs)
variance introduced by outliers: 89% (severely inflated)

benchmarking anyFM
time                 5.668 μs   (5.451 μs .. 6.008 μs)
                     0.987 R²   (0.975 R² .. 0.996 R²)
mean                 5.807 μs   (5.659 μs .. 5.975 μs)
std dev              542.5 ns   (446.4 ns .. 721.8 ns)
variance introduced by outliers: 86% (severely inflated)

你的版本(exists)确实是最快的,而foldr的版本比较慢.

Your version (exists) is indeed the fastest, and the foldr versions are rather slow.

使用 ghc -O2,你的版本 (exists) 是最慢的,而所有其他函数的速度几乎相同:

With ghc -O2, your version (exists) is the slowest, and all other functions are nearly equally fast to each other:

benchmarking exists
time                 753.5 ns   (725.4 ns .. 779.9 ns)
                     0.990 R²   (0.986 R² .. 0.995 R²)
mean                 762.4 ns   (737.0 ns .. 787.0 ns)
std dev              82.47 ns   (66.79 ns .. 105.1 ns)
variance introduced by outliers: 91% (severely inflated)

benchmarking existsOr
time                 491.5 ns   (478.2 ns .. 503.2 ns)
                     0.994 R²   (0.992 R² .. 0.996 R²)
mean                 494.5 ns   (481.1 ns .. 512.9 ns)
std dev              54.97 ns   (42.54 ns .. 80.34 ns)
variance introduced by outliers: 92% (severely inflated)

benchmarking any
time                 461.2 ns   (442.0 ns .. 479.7 ns)
                     0.989 R²   (0.985 R² .. 0.993 R²)
mean                 456.0 ns   (439.3 ns .. 476.3 ns)
std dev              60.04 ns   (47.27 ns .. 89.47 ns)
variance introduced by outliers: 94% (severely inflated)

benchmarking anyF
time                 436.9 ns   (415.8 ns .. 461.0 ns)
                     0.978 R²   (0.967 R² .. 0.988 R²)
mean                 450.8 ns   (430.1 ns .. 472.6 ns)
std dev              70.64 ns   (57.04 ns .. 85.92 ns)
variance introduced by outliers: 96% (severely inflated)

benchmarking anyFM
time                 438.9 ns   (426.9 ns .. 449.5 ns)
                     0.993 R²   (0.989 R² .. 0.996 R²)
mean                 435.8 ns   (421.4 ns .. 447.6 ns)
std dev              45.32 ns   (36.73 ns .. 58.74 ns)
variance introduced by outliers: 90% (severely inflated)

如果查看简化的核心代码 (ghc -O2 -ddump-simpl),您会发现不再有 foldrs(使用 -O0,一切都还在,folds 包括在内).

If one looks into the simplified Core code (ghc -O2 -ddump-simpl), one sees that there are no foldrs anymore (with -O0, everything is still in there, folds included).

因此我敢说你的代码比库代码更快(在未优化的版本中,-O0),因为它更简单(因为它的潜在代价是不那么通用).优化的库代码比您的版本更快,因为它是以编译器识别其优化潜力的方式编写的.(诚​​然,这有点猜测)

I would thus venture to say that your code is faster (in the un-optimized version, -O0) than the library code, because it is simpler (for the potential price of being less general). The optimized library code is faster than your version, since it is written in a way that its optimization potential is recognized by the compiler. (admittedly, this is a bit of guess work)

这篇关于`any`/`all` 的 Haskell/GHC 性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆