What makes this function run much slower?

Problem description

I've been trying to run an experiment to see whether the local variables in functions are stored on a stack.

So I wrote a little performance test:

function test(fn, times){
    var i = times;
    var t = Date.now()
    while(i--){
        fn()
    }
    return Date.now() - t;
}

function straight(){
    var a = 1
    var b = 2
    var c = 3
    var d = 4
    var e = 5
    a = a * 5
    b = Math.pow(b, 10)
    c = Math.pow(c, 11)
    d = Math.pow(d, 12)
    e = Math.pow(e, 25)
}
function inversed(){
    var a = 1
    var b = 2
    var c = 3
    var d = 4
    var e = 5
    e = Math.pow(e, 25)
    d = Math.pow(d, 12)
    c = Math.pow(c, 11)
    b = Math.pow(b, 10)
    a = a * 5
}

I expected the inversed function to run much faster. Instead, I got a surprising result.

Whichever function I test first runs about 10 times faster until I test the other one; after that, both are slow.

Example:

> test(straight, 10000000)
30
> test(straight, 10000000)
32
> test(inversed, 10000000)
390
> test(straight, 10000000)
392
> test(inversed, 10000000)
390

The behaviour is the same when the functions are tested in the opposite order:

> test(inversed, 10000000)
25
> test(straight, 10000000)
392
> test(inversed, 10000000)
394

I've tested it in both the Chrome browser and Node.js, and I have absolutely no clue why this happens. The effect lasts until I refresh the current page or restart the Node REPL.

What could cause such a significant (~12 times) slowdown?

PS. Since it seems to occur only in some environments, please state the environment you're using to test it.

Mine were:

OS: Ubuntu 14.04
Node v0.10.37
Chrome 43.0.2357.134 (Official Build) (64-bit)

/Edit
On Firefox 39 it takes ~5500 ms for each test regardless of the order. It seems to occur only on specific engines.

/Edit2
Manually inlining the function body into the test function makes it always run in the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?

Solution

Once you call test with two different functions, the fn() callsite inside it becomes megamorphic and V8 is unable to inline at it.

Function calls (as opposed to method calls o.m(...)) in V8 are accompanied by a one-element inline cache instead of a true polymorphic inline cache.
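As a rough mental model, the one-element cache described above can be pictured as a small state machine attached to the callsite. The sketch below is a toy simulation only, not V8's actual data structure: `createCallSite` and the state names are made up for illustration, and `straight` and `inversed` here are trivial stand-ins for the functions in the question.

```javascript
// Toy model of a one-element inline cache for a call site.
// This is NOT V8's implementation; all names here are hypothetical.
function createCallSite() {
  var state = 'uninitialized';
  var cachedTarget = null;
  return {
    call: function (fn) {
      if (state === 'uninitialized') {
        // First call: remember the single target seen so far.
        state = 'monomorphic';
        cachedTarget = fn;
      } else if (state === 'monomorphic' && fn !== cachedTarget) {
        // A second, different target: there is no room for it in a
        // one-element cache, so the site gives up and never goes back.
        state = 'megamorphic';
        cachedTarget = null;
      }
      return fn();
    },
    state: function () {
      return state;
    }
  };
}

// Trivial stand-ins for the functions in the question.
function straight() { return 1; }
function inversed() { return 2; }

var site = createCallSite();
site.call(straight);
site.call(straight);
var afterSameTarget = site.state();   // still one target

site.call(inversed);
var afterSecondTarget = site.state(); // a second target was seen

site.call(straight);
var afterGoingBack = site.state();    // the site does not recover
```

In this model, only a site that is still in the monomorphic state would be a candidate for inlining, which matches the timings in the question: fast until the second function is passed in, slow from then on.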

Because V8 is unable to inline at the fn() callsite, it is unable to apply a variety of optimizations to your code. If you look at your code in IRHydra (I uploaded the compilation artifacts to a gist for your convenience) you will notice that the first optimized version of test (when it was specialized for fn = straight) has a completely empty main loop.

V8 just inlined straight and removed all the code you hoped to benchmark with the Dead Code Elimination (DCE) optimization. On an older version of V8, instead of DCE, V8 would just hoist the code out of the loop via LICM, because the code is completely loop invariant.
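To make this concrete, here is a hand-written approximation of what `test(straight, times)` is roughly equivalent to at each stage. These are illustrations written by hand, not real V8 output, and the three function names are hypothetical.

```javascript
// Hand-written approximations of the optimizer's view, not actual V8 output.
// All three function names are hypothetical.

// 1. test(straight, times) after straight has been inlined into the loop.
function testAfterInlining(times) {
  var i = times;
  var t = Date.now();
  while (i--) {
    var a = 1, b = 2, c = 3, d = 4, e = 5;
    a = a * 5;
    b = Math.pow(b, 10);
    c = Math.pow(c, 11);
    d = Math.pow(d, 12);
    e = Math.pow(e, 25);
  }
  return Date.now() - t;
}

// 2. Dead code elimination: a..e are never read, so the whole body can be
//    dropped and only the empty loop remains.
function testAfterDCE(times) {
  var i = times;
  var t = Date.now();
  while (i--) {
    // nothing left to execute
  }
  return Date.now() - t;
}

// 3. Loop-invariant code motion: the body does not depend on the loop,
//    so it is computed once before the loop instead of on every iteration.
function testAfterLICM(times) {
  var i = times;
  var t = Date.now();
  var a = 1 * 5;
  var b = Math.pow(2, 10);
  var c = Math.pow(3, 11);
  var d = Math.pow(4, 12);
  var e = Math.pow(5, 25);
  while (i--) {
    // nothing left inside the loop
  }
  return Date.now() - t;
}
```

In both the DCE and the LICM version, the loop itself does no work, which is why the monomorphic case looks so fast.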

When straight is not inlined, V8 can't apply these optimizations, hence the performance difference. A newer version of V8 would still apply DCE to straight and inversed themselves, turning them into empty functions, so the performance difference is not that big (around 2-3x). Older V8 was not aggressive enough with DCE, and that would manifest in a bigger difference between the inlined and non-inlined cases, because the peak performance of the inlined case was solely the result of aggressive loop-invariant code motion (LICM).

On a related note, this shows why benchmarks should never be written like this: you end up measuring an empty loop, so the results are of no use.
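One common way to reduce that risk is to make each iteration depend on the loop counter and to consume the result, so the work is neither loop invariant nor dead. The sketch below assumes a hypothetical `work` function and a hypothetical `testWithSink` harness; neither comes from the question. An optimizing compiler may still transform such a loop, so treat this as a mitigation rather than a guarantee.

```javascript
// Hypothetical workload: the result depends on the input, so it cannot be
// hoisted out of the loop as a loop-invariant value.
function work(x) {
  return Math.pow(x % 7, 3) + x * 5;
}

// Hypothetical harness: every result is accumulated into `sink`, and `sink`
// is returned, so the calls are not dead code from the caller's view.
function testWithSink(fn, times) {
  var sink = 0;
  var start = Date.now();
  for (var i = 0; i < times; i++) {
    sink += fn(i);
  }
  return { elapsed: Date.now() - start, sink: sink };
}

var result = testWithSink(work, 1000000);
console.log('elapsed ms:', result.elapsed, 'sink:', result.sink);
```

Returning `sink` also gives a cheap sanity check: if two runs of the same workload report different sums, the benchmark itself is broken.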

If you are interested in polymorphism and its implications in V8, check out my post "What's up with monomorphism" (the section "Not all caches are the same" talks about the caches associated with function calls). I also recommend reading through one of my talks about the dangers of microbenchmarking, e.g. the most recent "Benchmarking JS" talk from GOTO Chicago 2015 (video); it might help you avoid common pitfalls.
