当找到列表的倒数第二个元素时,为什么在这些当中使用`last`最快? [英] When finding last but second element of a list, why is using `last` the fastest among these?

查看:81
本文介绍了当找到列表的倒数第二个元素时,为什么在这些当中使用`last`最快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

下面提供了3个函数,这些函数可以找到列表中的最后一个但第二个元素.使用last . init的人似乎比其他人快得多.我似乎不知道为什么.

There are 3 function given below which find the last but second element in a list. The one using last . init seems much faster than the rest. I can't seem to figure out why.

为了进行测试,我使用了[1..100000000]的输入列表(亿).最后一个几乎立即运行,而其他一个则需要几秒钟.

For testing, I used an input list of [1..100000000] (100 million). The last one runs almost instantly whereas the others take several seconds.

-- slow
myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"

-- decent
myButLast' :: [a] -> a
myButLast' = (!! 1) . reverse

-- fast
myButLast'' :: [a] -> a
myButLast'' = last . init

推荐答案

研究速度和优化时,很容易获得非常错误的结果.在 特别是,您不能不说一个变体比另一个变体快得多 编译器版本和基准设置的优化模式.即使那样,现代 处理器是如此复杂,以至于具有基于神经网络的分支预测器,而不是 提到各种缓存,因此,即使进行了仔细的设置,基准测试结果也将是模糊的.

When studying speed and optimization, it is very easy to obtain wildly wrong results. In particular, you cannot really say that one variant is faster than another without mentioning the compiler version and the optimization mode of your benchmarking setup. Even then, modern processors are so sophisticated as to feature neural network based branch predictors, not to mention all kinds of caches, so, even with careful set-up, benchmarking results will be blurry.

话虽如此...

criterion 是一个提供高级基准测试工具的软件包.我迅速起草了一份 像这样的基准:

criterion is a package that provides advanced benchmarking tools. I quickly drafted a benchmark like this:

module Main where

import Criterion
import Criterion.Main

-- slow
myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"

-- decent
myButLast' :: [a] -> a
myButLast' = (!! 1) . reverse

-- fast
myButLast'' :: [a] -> a
myButLast'' = last . init

butLast2 :: [a] -> a
butLast2 (x :     _ : [ ] ) = x
butLast2 (_ : xs@(_ : _ ) ) = butLast2 xs
butLast2 _ = error "List too short"

setupEnv = do
  let xs = [1 .. 10^7] :: [Int]
  return xs

benches xs =
  [ bench "slow?"   $ nf myButLast   xs
  , bench "decent?" $ nf myButLast'  xs
  , bench "fast?"   $ nf myButLast'' xs
  , bench "match2"  $ nf butLast2    xs
  ]

main = defaultMain
    [ env setupEnv $ \ xs -> bgroup "main" $ let bs = benches xs in bs ++ reverse bs ]

如您所见,我添加了一次在两个元素上显式匹配的变体,但否则 逐字是相同的代码.我还反向运行基准测试,以了解应有的偏见 进行缓存.所以,让我们跑步看看!

As you see, I added the variant that explicitly matches on two elements at once, but otherwise it is the same code verbatim. I also run the benchmarks in reverse, so as to be aware of the bias due to caching. So, let us run and see!

% ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.6.5


% ghc -O2 -package criterion A.hs && ./A
benchmarking main/slow?
time                 54.83 ms   (54.75 ms .. 54.90 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 54.86 ms   (54.82 ms .. 54.93 ms)
std dev              94.77 μs   (54.95 μs .. 146.6 μs)

benchmarking main/decent?
time                 794.3 ms   (32.56 ms .. 1.293 s)
                     0.907 R²   (0.689 R² .. 1.000 R²)
mean                 617.2 ms   (422.7 ms .. 744.8 ms)
std dev              201.3 ms   (105.5 ms .. 283.3 ms)
variance introduced by outliers: 73% (severely inflated)

benchmarking main/fast?
time                 84.60 ms   (84.37 ms .. 84.95 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 84.46 ms   (84.25 ms .. 84.77 ms)
std dev              435.1 μs   (239.0 μs .. 681.4 μs)

benchmarking main/match2
time                 54.87 ms   (54.81 ms .. 54.95 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 54.85 ms   (54.81 ms .. 54.92 ms)
std dev              104.9 μs   (57.03 μs .. 178.7 μs)

benchmarking main/match2
time                 50.60 ms   (47.17 ms .. 53.01 ms)
                     0.993 R²   (0.981 R² .. 0.999 R²)
mean                 60.74 ms   (56.57 ms .. 67.03 ms)
std dev              9.362 ms   (6.074 ms .. 10.95 ms)
variance introduced by outliers: 56% (severely inflated)

benchmarking main/fast?
time                 69.38 ms   (56.64 ms .. 78.73 ms)
                     0.948 R²   (0.835 R² .. 0.994 R²)
mean                 108.2 ms   (92.40 ms .. 129.5 ms)
std dev              30.75 ms   (19.08 ms .. 37.64 ms)
variance introduced by outliers: 76% (severely inflated)

benchmarking main/decent?
time                 770.8 ms   (345.9 ms .. 1.004 s)
                     0.967 R²   (0.894 R² .. 1.000 R²)
mean                 593.4 ms   (422.8 ms .. 691.4 ms)
std dev              167.0 ms   (50.32 ms .. 226.1 ms)
variance introduced by outliers: 72% (severely inflated)

benchmarking main/slow?
time                 54.87 ms   (54.77 ms .. 55.00 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 54.95 ms   (54.88 ms .. 55.10 ms)
std dev              185.3 μs   (54.54 μs .. 251.8 μs)

看起来我们的"slow" 版本一点都不慢!而且模式匹配的复杂性并没有 添加任何东西. (我们发现连续两次运行match2之间有轻微的加速,我将其归因于 缓存的效果.)

Looks like our "slow" version is not slow at all! And the intricacies of pattern matching do not add anything. (A slight speed up we see between two consecutive runs of match2 I ascribe to the effects of caching.)

有一种方法可以获取更多的科学"数据:我们可以

There is a way to get more "scientific" data: we can -ddump-simpl and take a look at the way the compiler sees our code.

核心" 是GHC的内部语言.每个Haskell源文件之前都已简化为Core 被转换为最终功能图以供运行时系统执行.如果我们看 在这个中间阶段,它将告诉我们myButLastbutLast2是等效的.它 确实需要查找,因为在重命名阶段,我们所有不错的标识符都是随机篡改的.

"Core" is an internal language of GHC. Every Haskell source file is simplified to Core before being transformed into the final functional graph for the run time system to execute. If we look at this intermediate stage, it will tell us that myButLast and butLast2 are equivalent. It does take looking, since, at the renaming stage, all our nice identifiers are randomly mangled.

% for i in `seq 1 4`; do echo; cat A$i.hs; ghc -O2 -ddump-simpl A$i.hs > A$i.simpl; done

module A1 where

-- slow
myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"

module A2 where

-- decent
myButLast' :: [a] -> a
myButLast' = (!! 1) . reverse

module A3 where

-- fast
myButLast'' :: [a] -> a
myButLast'' = last . init

module A4 where

butLast2 :: [a] -> a
butLast2 (x :     _ : [ ] ) = x
butLast2 (_ : xs@(_ : _ ) ) = butLast2 xs
butLast2 _ = error "List too short"

% ./EditDistance.hs *.simpl
(("A1.simpl","A2.simpl"),3866)
(("A1.simpl","A3.simpl"),3794)
(("A2.simpl","A3.simpl"),663)
(("A1.simpl","A4.simpl"),607)
(("A2.simpl","A4.simpl"),4188)
(("A3.simpl","A4.simpl"),4113)

似乎A1A4最相似.彻底的检查将表明 A1A4中的代码结构相同. A2A3相似也是合理的 因为两者都被定义为两个功能的组合.

It seems that A1 and A4 are the most similar. Thorough inspection will show that indeed the code structures in A1 and A4 are identical. That A2 and A3 are alike is also reasonable since both are defined as a composition of two functions.

如果您要广泛检查core输出,也可以使用供应标志,例如-dsuppress-module-prefixes-dsuppress-uniques.它们使阅读变得如此容易.

If you are going to examine the core output extensively, it makes sense to also supply flags such as -dsuppress-module-prefixes and -dsuppress-uniques. They make it so much easier to read.

那么,基准测试和优化有什么问题呢?

So, what can go wrong with benchmarking and optimization?

  • ghci被设计用于交互式播放和快速迭代,可将Haskell源代码编译为某种形式的字节码,而不是最终的可执行文件,并避免进行昂贵的优化,以便更快地重新加载.
  • 分析似乎是一个不错的工具,可以研究复杂程序中各个位和部分的性能,但它可能会严重破坏编译器的优化,其结果将比基准低几个数量级.
    • 您的保护措施是使用自己的基准运行程序将每一小段代码作为独立的可执行文件进行分析.
    • ghci, being designed for interactive play and quick iteration, compiles Haskell source to a certain flavour of byte code, rather than final executable, and eschews expensive optimizations in favour of quicker reload.
    • Profiling seems like a nice tool to look into performance of individual bits and pieces of a complex program, but it can wreck compiler optimizations so badly, the results will be orders of magnitude off base.
      • Your safeguard is to profile every small bit of code as a separate executable, with its own benchmark runner.

      这可能看起来很难过.但是,大多数时候,与Haskell程序员相关的事情并不是真正的事情.真实故事:我有一个朋友最近刚开始学习Haskell.他们编写了一个用于数值积分的程序,而且速度很慢.因此,我们坐在一起,编写了类别描述,并附有图表和相关内容.当他们重新编写代码以使其与抽象描述保持一致时,它神奇地变得像猎豹一样快速,并且内存也很薄.我们很快就计算出π.这个故事所讲的道德?完美的抽象结构,您的代码将优化自身.

      This may look sad. But it is really not the thing that should concern a Haskell programmer, most of the time. Real story: I have a friend who just recently started learning Haskell. They had written a program for numerical integration, and it was turtle slow. So we sat down together and wrote a categorial description of the algorithm, with diagrams and stuff. When they re-wrote the code to align with the abstract description, it magically became, like, cheetah fast, and slim on memory too. We calculated π in no time. Moral of the story? Perfect abstract structure, and your code will optimize itself.

      这篇关于当找到列表的倒数第二个元素时,为什么在这些当中使用`last`最快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆