为什么添加立即调用的 lambda 会使我的 JavaScript 代码速度提高 2 倍? [英] Why does adding in an immediately invoked lambda make my JavaScript code 2x faster?

查看:27
本文介绍了为什么添加立即调用的 lambda 会使我的 JavaScript 代码速度提高 2 倍?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在将一种语言的编译器优化为 JavaScript,并发现了一个非常有趣的案例:

I'm optimizing the compiler of a language to JavaScript, and found a very interesting, if not frustrating, case:

function add(n,m) {
  return n === 0 ? m : add(n - 1, m) + 1;
};
var s = 0;
for (var i = 0; i < 100000; ++i) {
  s += add(4000, 4000);
}
console.log(s);

在我的机器上完成需要 2.3s[1].但是如果我做一个很小的改变:

It takes 2.3s to complete on my machine[1]. But if I make a very small change:

function add(n,m) {
  return (() => n === 0 ? m : add(n - 1, m) + 1)();
};
var s = 0;
for (var i = 0; i < 100000; ++i) {
  s += add(4000, 4000);
}
console.log(s);

它在1.1s内完成.请注意,唯一的区别是在 add 的返回周围添加了一个立即调用的 lambda,(() => ...)().为什么这个添加的调用使我的程序速度提高了两倍?

It completes in 1.1s. Notice the only difference is the addition of an immediately invoked lambda, (() => ...)(), around the return of add. Why does this added call make my program two times faster?

[1] MacBook Pro 13"2020 年,2.3 GHz 四核 Intel Core i7,Node.js v15.3.0

推荐答案

有趣!从代码来看,很明显 IIFE 包装的版本应该更慢,而不是更快:在每次循环迭代中,它都会创建一个新的函数对象并调用它(优化编译器最终会避免这种情况,但这不会)不会马上开始),所以通常只会做更多的工作,而这应该需要更多的时间.

Interesting! From looking at the code, it seems fairly obvious that the IIFE-wrapped version should be slower, not faster: in every loop iteration, it creates a new function object and calls it (which the optimizing compiler will eventually avoid, but that doesn't kick in right away), so generally just does more work, which should be taking more time.

这种情况下的解释是内联.

The explanation in this case is inlining.

一点背景知识:将一个函数内联到另一个函数中(而不是调用它)是优化编译器以实现更好性能的标准技巧之一.不过,这是一把双刃剑:从好的方面来说,它避免了调用开销,并且通常可以实现进一步的优化,例如常量传播或消除重复计算(请参见下面的示例).不利的一面是,它会导致编译需要更长的时间(因为编译器会做更多的工作),并且会导致生成更多的代码并将其存储在内存中(因为内联函数有效地复制了它),并且在像 JavaScript 这样的动态语言中优化的代码通常依赖于受保护的假设,这会增加这些假设之一被证明是错误的风险,并且因此不得不丢弃大量优化的代码.

A bit of background: inlining one function into another (instead of calling it) is one of the standard tricks that optimizing compilers perform in order to achieve better performance. It's a double-edged sword though: on the plus side, it avoids calling overhead, and can often enable further optimizations, such as constant propagation, or elimination of duplicate computation (see below for an example). On the negative side, it causes compilation to take longer (because the compiler does more work), and it causes more code to be generated and stored in memory (because inlining a function effectively duplicates it), and in a dynamic language like JavaScript where optimized code typically relies on guarded assumptions, it increases the risk of one of these assumptions turning out to be wrong and a large amount of optimized code having to be thrown away as a result.

一般来说,做出完美的内联决策(不要太多,也不要太少)需要预测未来:提前知道代码执行的频率和参数.这当然是不可能的,因此优化编译器使用各种规则/启发式"算法.猜测什么是合理的好决定.

Generally speaking, making perfect inlining decisions (not too much, not too little) requires predicting the future: knowing in advance how often and with which parameters the code will be executed. That is, of course, impossible, so optimizing compilers use various rules/"heuristics" to make guesses about what might be a reasonably good decision.

V8 当前的一个规则是:不要内联递归调用.

One rule that V8 currently has is: don't inline recursive calls.

这就是为什么在您的代码的更简单版本中,add 不会被内联到自身中.IIFE 版本本质上有两个相互调用的函数,称为相互递归";-- 事实证明,这个简单的技巧足以欺骗 V8 的优化编译器并使其避开不内联递归调用"规则.相反,它愉快地将未命名的 lambda 内联到 add 中,并将 add 内联到未命名的 lambda 中,依此类推,直到它的内联预算在大约 30 轮后用完.(旁注:有多少被内联"是一种有点复杂的启发式方法,特别是将函数大小考虑在内,所以我们在这里看到的任何特定行为确实是针对这种情况的.)

That's why in the simpler version of your code, add will not get inlined into itself. The IIFE version essentially has two functions calling each other, which is called "mutual recursion" -- and as it turns out, this simple trick is enough to fool V8's optimizing compiler and make it sidestep its "don't inline recursive calls" rule. Instead, it happily inlines the unnamed lambda into add, and add into the unnamed lambda, and so on, until its inlining budget runs out after ~30 rounds. (Side note: "how much gets inlined" is one of the somewhat-complex heuristics and in particular takes function size into account, so whatever specific behavior we see here is indeed specific to this situation.)

在这种特殊情况下,所涉及的函数非常小,内联有很大帮助,因为它避免了调用开销.因此,在这种情况下,内联提供了更好的性能,即使它是递归内联的(伪装)情况,通常通常对性能不利.它确实有代价:在简单版本中,优化编译器仅花费 3 毫秒编译 add,为其生成 562 字节的优化代码.在 IIFE 版本中,编译器花费 30 毫秒并为 add 生成 4318 字节的优化代码.这就是为什么它不像得出V8 应该始终内联更多"那么简单的原因之一:编译的时间和电池消耗很重要,内存消耗也很重要,以及在简单的 10-在一个 100,000 行的应用中,line demo 的成本很可能无法接受(甚至可能会降低整体性能).

In this particular scenario, where the involved functions are very small, inlining helps quite a bit because it avoids call overhead. So in this case, inlining gives better performance, even though it is a (disguised) case of recursive inlining, which in general often is bad for performance. And it does come at a cost: in the simple version, the optimizing compiler spends only 3 milliseconds compiling add, producing 562 bytes of optimized code for it. In the IIFE version, the compiler spends 30 milliseconds and produces 4318 bytes of optimized code for add. That's one reason why it's not as simple as concluding "V8 should always inline more": time and battery consumption for compiling matters, and memory consumption matters too, and what might be acceptable cost (and improve performance significantly) in a simple 10-line demo may well have unacceptable cost (and potentially even cost overall performance) in a 100,000-line app.

现在,在了解发生了什么之后,我们可以回到IIFE 有开销"的问题上来.直觉,并制作一个更快的版本:

Now, having understood what's going on, we can get back to the "IIFEs have overhead" intuition, and craft an even faster version:

function add(n,m) {
  return add_inner(n, m);
};
function add_inner(n, m) {
  return n === 0 ? m : add(n - 1, m) + 1;
}

在我的机器上,我看到:

On my machine, I'm seeing:

  • 简单版:1650 毫秒
  • IIFE 版本:720 毫秒
  • add_inner 版本:460 毫秒

当然,如果您将 add(n, m) 简单地实现为 return n + m,那么它会在 2 毫秒内终止——算法优化胜过任何优化编译器可能会完成:-)

Of course, if you implement add(n, m) simply as return n + m, then it terminates in 2 ms -- algorithmic optimization beats anything an optimizing compiler could possibly accomplish :-)

附录:优化优势示例.考虑这两个函数:

Appendix: Example for benefits of optimization. Consider these two functions:

function Process(x) {
  return (x ** 2) + InternalDetail(x, 0, 2);
}

function InternalDetail(x, offset, power) {
  return (x + offset) ** power;
}

(显然,这是愚蠢的代码;但让我们假设它是在实践中有意义的东西的简化版本.)
天真地执行时,会发生以下步骤:

(Obviously, this is silly code; but let's assume it's a simplified version of something that makes sense in practice.)
When executed naively, the following steps happen:

  1. 评估 temp1 = (x ** 2)
  2. 使用参数x02
  3. 调用InternalDetail
  4. 评估 temp2 = (x + 0)
  5. 求值 temp3 = temp2 ** 2
  6. 返回temp3给调用者
  7. 求值 temp4 = temp1 + temp3
  8. 返回temp4.
  1. evaluate temp1 = (x ** 2)
  2. call InternalDetail with parameters x, 0, 2
  3. evaluate temp2 = (x + 0)
  4. evaluate temp3 = temp2 ** 2
  5. return temp3 to the caller
  6. evaluate temp4 = temp1 + temp3
  7. return temp4.

如果优化编译器执行内联,那么作为第一步,它将得到:

If an optimizing compiler performs inlining, then as a first step it will get:

function Process_after_inlining(x) {
  return (x ** 2) + ( (x + 0) ** 2 );
}

允许两种简化:x + 0 可以折叠为 x,然后 x ** 2 计算发生两次,因此可以通过重用第一次的结果来替换第二次出现:

which allows two simplifications: x + 0 can be folded to just x, and then the x ** 2 computation occurs twice, so the second occurrence can be replaced by reusing the result from the first:

function Process_with_optimizations(x) {
  let temp1 = x ** 2;
  return temp1 + temp1;
}

因此与 naive 执行相比,我们从 7 步减少到了 3 步:

So comparing with the naive execution, we're down to 3 steps from 7:

  1. 评估 temp1 = (x ** 2)
  2. 求值 temp2 = temp1 + temp1
  3. 返回 temp2

我没有预测现实世界的性能会从 7 个时间单位变为 3 个时间单位;这只是为了直观地说明为什么内联可以帮助减少一定数量的计算负载.
脚注:为了说明所有这些东西有多么棘手,请考虑将 x + 0 替换为 x 在 JavaScript 中并不总是可行的,即使编译器知道 x 始终是一个数字:如果 x 碰巧是 -0,则将 0 添加到它会更改为 +0,这很可能是可观察到的程序行为 ;-)

I'm not predicting that real-world performance will go from 7 time units to 3 time units; this is just meant to give an intuitive idea of why inlining can help reduce computational load by some amount.
Footnote: to illustrate how tricky all this stuff is, consider that replacing x + 0 with just x is not always possible in JavaScript, even when the compiler knows that x is always a number: if x happens to be -0, then adding 0 to it changes it to +0, which may well be observable program behavior ;-)

这篇关于为什么添加立即调用的 lambda 会使我的 JavaScript 代码速度提高 2 倍?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆