Why is a WebAssembly function almost 300 times slower than the same JS function?


Question

I have first read "Why is my WebAssembly function slower than the JavaScript equivalent?"

But it has shed little light on the problem, and I have invested a lot of time that may well be that yellow stuff against the wall.

I do not use globals, and I do not use any memory. I have two simple functions that find the length of a line segment, and I compare them to the same thing in plain old JavaScript. Each function has 4 params and 3 locals, and returns a float or a double.

On Chrome the JavaScript is 40 times faster than the WebAssembly, and on Firefox the wasm is almost 300 times slower than the JavaScript.

I have added a test case to jsPerf: WebAssembly V Javascript math

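Either

  1. I have missed something obvious, a bad practice, or I am suffering from coder stupidity.

  2. WebAssembly is not for a 32-bit OS (Win 10 laptop, i7 CPU).

  3. WebAssembly is far from a ready technology.

Please let it be option 1.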

I have read the WebAssembly use cases:

Re-use existing code by targeting WebAssembly, embedded in a larger JavaScript / HTML application. This could be anything from simple helper libraries, to compute-oriented task offload.

I was hoping I could replace some geometry libs with WebAssembly to get some extra performance. I was hoping that it would be awesome, like 10 or more times faster. BUT 300 times slower, WTF.

This is not a JS optimisation issue.

To ensure that optimisation has as little effect as possible, I have tested using the following methods to reduce or eliminate any optimisation bias:


  • counter c += length(... to ensure all code is executed.
  • bigCount += c to ensure the whole function is executed. Not needed.
  • 4 lines for each function to reduce an inlining skew. Not needed.
  • all values are randomly generated doubles.
  • each function call returns a different result.
  • added a slower length calculation in JS using Math.hypot to prove the code is being run.
  • added an empty JS call that returns its first param, to see the call overhead.

// setup and associated functions
    const setOf = (count, callback) => {var a = [],i = 0; while (i < count) { a.push(callback(i ++)) } return a };
    const rand  = (min = 1, max = min + (min = 0)) => Math.random() * (max - min) + min;
    const a = setOf(100009,i=>rand(-100000,100000));
    var bigCount = 0;




    function len(x,y,x1,y1){
        var nx = x1 - x;
        var ny = y1 - y;
        return Math.sqrt(nx * nx + ny * ny);
    }
    function lenSlow(x,y,x1,y1){
        var nx = x1 - x;
        var ny = y1 - y;
        return Math.hypot(nx,ny);
    }
    function lenEmpty(x,y,x1,y1){
        return x;
    }


// Test functions in same scope as above. None is in global scope
// Each function is copied 4 time and tests are performed randomly.
// c += length(...  to ensure all code is executed. 
// bigCount += c to ensure whole function is executed.
// 4 lines for each function to reduce a inlining skew
// all values are randomly generated doubles 
// each function call returns a different result.

tests : [{
        func : function (){
            var i,c=0,a1,a2,a3,a4;
            for (i = 0; i < 10000; i += 1) {
                a1 = a[i];
                a2 = a[i+1];
                a3 = a[i+2];
                a4 = a[i+3];
                c += length(a1,a2,a3,a4);
                c += length(a2,a3,a4,a1);
                c += length(a3,a4,a1,a2);
                c += length(a4,a1,a2,a3);
            }
            bigCount = (bigCount + c) % 1000;
        },
        name : "length64",
    },{
        func : function (){
            var i,c=0,a1,a2,a3,a4;
            for (i = 0; i < 10000; i += 1) {
                a1 = a[i];
                a2 = a[i+1];
                a3 = a[i+2];
                a4 = a[i+3];
                c += lengthF(a1,a2,a3,a4);
                c += lengthF(a2,a3,a4,a1);
                c += lengthF(a3,a4,a1,a2);
                c += lengthF(a4,a1,a2,a3);
            }
            bigCount = (bigCount + c) % 1000;
        },
        name : "length32",
    },{
        func : function (){
            var i,c=0,a1,a2,a3,a4;
            for (i = 0; i < 10000; i += 1) {
                a1 = a[i];
                a2 = a[i+1];
                a3 = a[i+2];
                a4 = a[i+3];                    
                c += len(a1,a2,a3,a4);
                c += len(a2,a3,a4,a1);
                c += len(a3,a4,a1,a2);
                c += len(a4,a1,a2,a3);
            }
            bigCount = (bigCount + c) % 1000;
        },
        name : "length JS",
    },{
        func : function (){
            var i,c=0,a1,a2,a3,a4;
            for (i = 0; i < 10000; i += 1) {
                a1 = a[i];
                a2 = a[i+1];
                a3 = a[i+2];
                a4 = a[i+3];                    
                c += lenSlow(a1,a2,a3,a4);
                c += lenSlow(a2,a3,a4,a1);
                c += lenSlow(a3,a4,a1,a2);
                c += lenSlow(a4,a1,a2,a3);
            }
            bigCount = (bigCount + c) % 1000;
        },
        name : "Length JS Slow",
    },{
        func : function (){
            var i,c=0,a1,a2,a3,a4;
            for (i = 0; i < 10000; i += 1) {
                a1 = a[i];
                a2 = a[i+1];
                a3 = a[i+2];
                a4 = a[i+3];                    
                c += lenEmpty(a1,a2,a3,a4);
                c += lenEmpty(a2,a3,a4,a1);
                c += lenEmpty(a3,a4,a1,a2);
                c += lenEmpty(a4,a1,a2,a3);
            }
            bigCount = (bigCount + c) % 1000;
        },
        name : "Empty",
    }
],

Because there is a lot more overhead in the test the results are closer, but the JS code is still more than an order of magnitude faster (roughly 17 to 18 times, going by the means listed below).

Note how slow the function Math.hypot is. If optimisation were in effect, that function would be near the faster len function.


  • WebAssembly: 13389µs
  • Javascript: 728µs

/*
=======================================
Performance test. : WebAssm V Javascript
Use strict....... : true
Data view........ : false
Duplicates....... : 4
Cycles........... : 147
Samples per cycle : 100
Tests per Sample. : undefined
---------------------------------------------
Test : 'length64'
Mean : 12736µs ±69µs (*) 3013 samples
---------------------------------------------
Test : 'length32'
Mean : 13389µs ±94µs (*) 2914 samples
---------------------------------------------
Test : 'length JS'
Mean : 728µs ±6µs (*) 2906 samples
---------------------------------------------
Test : 'Length JS Slow'
Mean : 23374µs ±191µs (*) 2939 samples   << This function use Math.hypot 
                                            rather than Math.sqrt
---------------------------------------------
Test : 'Empty'
Mean : 79µs ±2µs (*) 2928 samples
-All ----------------------------------------
Mean : 10.097ms Totals time : 148431.200ms 14700 samples
(*) Error rate approximation does not represent the variance.

*/

What's the point of WebAssembly if it does not optimise?

End of update

Find the length of a line.

Original source, in a custom language:

   
// declare func the < indicates export name, the param with types and return type
func <lengthF(float x, float y, float x1, float y1) float {
    float nx, ny, dist;  // declare locals float is f32
    nx = x1 - x;
    ny = y1 - y;
    dist = sqrt(ny * ny + nx * nx);
    return dist;
}
// and as double
func <length(double x, double y, double x1, double y1) double {
    double nx, ny, dist;
    nx = x1 - x;
    ny = y1 - y;
    dist = sqrt(ny * ny + nx * nx);
    return dist;
}

The code compiles to Wat for proofreading:

(module
(func 
    (export "lengthF")
    (param f32 f32 f32 f32)
    (result f32)
    (local f32 f32 f32)
    get_local 2
    get_local 0
    f32.sub
    set_local 4
    get_local 3
    get_local 1
    f32.sub
    tee_local 5
    get_local 5
    f32.mul
    get_local 4
    get_local 4
    f32.mul
    f32.add
    f32.sqrt
)
(func 
    (export "length")
    (param f64 f64 f64 f64)
    (result f64)
    (local f64 f64 f64)
    get_local 2
    get_local 0
    f64.sub
    set_local 4
    get_local 3
    get_local 1
    f64.sub
    tee_local 5
    get_local 5
    f64.mul
    get_local 4
    get_local 4
    f64.mul
    f64.add
    f64.sqrt
)
)
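One thing the Wat makes visible is that lengthF works in f32 throughout, so doubles passed in from JavaScript are rounded to single precision when they cross the boundary. A rough JavaScript emulation of that behaviour is sketched below. It assumes Math.fround is an adequate stand-in for f32 arithmetic, and the names lenF32 and len64 are made up for this sketch rather than taken from the original test.

```javascript
// Hypothetical JS emulation of the f32 export; not part of the original test.
function lenF32(x, y, x1, y1) {
    // Arguments are converted to f32 when they are passed to an f32 param.
    const nx = Math.fround(Math.fround(x1) - Math.fround(x));
    const ny = Math.fround(Math.fround(y1) - Math.fround(y));
    // Each intermediate result is rounded back to f32, like f32.mul and f32.add.
    const sum = Math.fround(Math.fround(ny * ny) + Math.fround(nx * nx));
    return Math.fround(Math.sqrt(sum));
}

// Plain double-precision version, matching what the f64 export computes.
function len64(x, y, x1, y1) {
    const nx = x1 - x;
    const ny = y1 - y;
    return Math.sqrt(ny * ny + nx * nx);
}

// Small integers are exact in both precisions.
console.log(lenF32(0, 0, 3, 4), len64(0, 0, 3, 4));
// Values such as 0.1 are not exactly representable, so the two may differ slightly.
console.log(lenF32(0.1, 0.2, 0.3, 0.4), len64(0.1, 0.2, 0.3, 0.4));
```

So "length32" and "length64" are not expected to return bit-identical values for random doubles, which is worth keeping in mind if the checksums from the two tests are ever compared.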

The compiled wasm as a hex string (note: it does not include the name section), loaded using WebAssembly.compile. The exported functions are then run against the JavaScript function len (in the snippet below).

    // hex of above without the name section
    const asm = `0061736d0100000001110260047d7d7d7d017d60047c7c7c7c017c0303020001071402076c656e677468460000066c656e67746800010a3b021c01037d2002200093210420032001932205200594200420049492910b1c01037c20022000a1210420032001a122052005a220042004a2a09f0b`
    const bin = new Uint8Array(asm.length >> 1);
    for(var i = 0; i < asm.length; i+= 2){ bin[i>>1] = parseInt(asm.substr(i,2),16) }
    var length,lengthF;

    WebAssembly.compile(bin).then(module => {
        const wasmInstance = new WebAssembly.Instance(module, {});
        lengthF = wasmInstance.exports.lengthF;
        length = wasmInstance.exports.length;
    });
    // test values are const (same result if from array or literals)
    const a1 = rand(-100000,100000);
    const a2 = rand(-100000,100000);
    const a3 = rand(-100000,100000);
    const a4 = rand(-100000,100000);

    // javascript version of function
    function len(x,y,x1,y1){
        var nx = x1 - x;
        var ny = y1 - y;
        return Math.sqrt(nx * nx + ny * ny);
    }
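Since WebAssembly.compile is asynchronous, length and lengthF stay undefined until the promise resolves, so it may be worth checking the exports before timing anything. A minimal sketch of such a check follows, using the same hex string and a synchronous compile. The names wasmHex, hexToBytes, wasmBytes and wasmExports are introduced here for illustration only, and the synchronous constructors are assumed to be acceptable because the module is tiny (some browsers restrict them for large modules on the main thread).

```javascript
// Same hex string as above (no name section).
const wasmHex = "0061736d0100000001110260047d7d7d7d017d60047c7c7c7c017c0303020001071402076c656e677468460000066c656e67746800010a3b021c01037d2002200093210420032001932205200594200420049492910b1c01037c20022000a1210420032001a122052005a220042004a2a09f0b";

// Decode two hex digits per byte.
function hexToBytes(hex) {
    const bytes = new Uint8Array(hex.length >> 1);
    for (let i = 0; i < hex.length; i += 2) {
        bytes[i >> 1] = parseInt(hex.slice(i, i + 2), 16);
    }
    return bytes;
}

const wasmBytes = hexToBytes(wasmHex);

// Check the binary is well formed before instantiating it.
const isValid = WebAssembly.validate(wasmBytes);

// Synchronous compile and instantiate, so the exports are usable immediately.
const wasmExports = new WebAssembly.Instance(new WebAssembly.Module(wasmBytes), {}).exports;

// 3-4-5 triangle: both exports should return 5.
console.log(isValid, wasmExports.length(0, 0, 3, 4), wasmExports.lengthF(0, 0, 3, 4));
```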

The test code is the same for all 3 functions and is run in strict mode.

 tests : [{
        func : function (){
            var i;
            for (i = 0; i < 100000; i += 1) {
               length(a1,a2,a3,a4);

            }
        },
        name : "length64",
    },{
        func : function (){
            var i;
            for (i = 0; i < 100000; i += 1) {
                lengthF(a1,a2,a3,a4);
             
            }
        },
        name : "length32",
    },{
        func : function (){
            var i;
            for (i = 0; i < 100000; i += 1) {
                len(a1,a2,a3,a4);
             
            }
        },
        name : "lengthNative",
    }
]

The test results on Firefox are:

 /*
=======================================
Performance test. : WebAssm V Javascript
Use strict....... : true
Data view........ : false
Duplicates....... : 4
Cycles........... : 34
Samples per cycle : 100
Tests per Sample. : undefined
---------------------------------------------
Test : 'length64'
Mean : 26359µs ±128µs (*) 1128 samples
---------------------------------------------
Test : 'length32'
Mean : 27456µs ±109µs (*) 1144 samples
---------------------------------------------
Test : 'lengthNative'
Mean : 106µs ±2µs (*) 1128 samples
-All ----------------------------------------
Mean : 18.018ms Totals time : 61262.240ms 3400 samples
(*) Error rate approximation does not represent the variance.
*/
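For reference, the slowdown factors implied by the two result listings can be worked out directly from the reported means. The snippet below is only arithmetic on those numbers. The object and function names are made up for this sketch.

```javascript
// Mean times in microseconds, copied from the two result listings above.
const firefoxRun = { length64: 26359, length32: 27456, lengthNative: 106 };
const updatedRun = { length64: 12736, length32: 13389, lengthJS: 728 };

// How many times slower the wasm call loop is than the JS baseline.
const slowdown = (wasmMean, jsMean) => wasmMean / jsMean;

console.log(slowdown(firefoxRun.length64, firefoxRun.lengthNative).toFixed(1)); // "248.7"
console.log(slowdown(firefoxRun.length32, firefoxRun.lengthNative).toFixed(1)); // "259.0"
console.log(slowdown(updatedRun.length64, updatedRun.lengthJS).toFixed(1));     // "17.5"
console.log(slowdown(updatedRun.length32, updatedRun.lengthJS).toFixed(1));     // "18.4"
```

So the original test shows a factor of roughly 250 to 260, and the updated test, where each call's result is accumulated, shows roughly 17 to 18.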

Recommended answer

Andreas describes a number of good reasons why the JavaScript implementation was initially observed to be x300 faster. However, there are a number of other issues with your code.


  1. This is a classic 'micro benchmark', i.e. the code that you are testing is so small that the other overheads within your test loop are a significant factor. For example, there is an overhead in calling WebAssembly from JavaScript, which will factor into your results. What are you trying to measure? Raw processing speed? Or the overhead of the language boundary?
  2. Your results vary wildly, from x300 to x2, due to small changes in your test code. Again, this is a micro benchmark issue. Others have seen the same when using this approach to measure performance; for example, this post claims wasm is x84 faster, which is clearly wrong!
  3. The current WebAssembly VM is very new, and an MVP. It will get faster. Your JavaScript VM has had 20 years to reach its current speed. The performance of the JS <=> wasm boundary is being worked on and optimised right now.
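One way to see how much of a micro benchmark is call overhead rather than work is to time an empty function with the same signature alongside the real one. The sketch below does this for plain JavaScript functions. The helper name timeCalls and the iteration count are assumptions made for illustration, and a wasm export could be passed as fn in the same way to look at the JS <=> wasm boundary.

```javascript
// Fall back to Date.now() where performance.now() is unavailable.
const now = typeof performance !== "undefined" ? () => performance.now() : () => Date.now();

function len(x, y, x1, y1) {
    const nx = x1 - x;
    const ny = y1 - y;
    return Math.sqrt(nx * nx + ny * ny);
}

// Same signature, no work: an approximation of the pure call cost.
function lenEmpty(x, y, x1, y1) {
    return x;
}

// Hypothetical helper: time `iterations` calls and keep a checksum
// so the engine cannot discard the calls as dead code.
function timeCalls(fn, iterations) {
    let checksum = 0;
    const start = now();
    for (let i = 0; i < iterations; i += 1) {
        checksum += fn(i, 0, i + 3, 4);
    }
    return { ms: now() - start, checksum };
}

const work = timeCalls(len, 100000);
const baseline = timeCalls(lenEmpty, 100000);
console.log("with work (ms):", work.ms);
console.log("call only (ms):", baseline.ms);
console.log("difference (ms):", work.ms - baseline.ms);
```

The absolute numbers will vary from run to run and between engines, so a single run of a harness like this should be read as indicative only.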

For a more definitive answer, see the joint paper from the WebAssembly team, which outlines an expected runtime performance gain of around 30%.

Finally, to answer your point:

What's the point of WebAssembly if it does not optimise

I think you have misconceptions about what WebAssembly will do for you. Based on the paper above, the runtime performance optimisations are quite modest. However, there are still a number of performance advantages:


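  1. Its compact binary format and low-level nature mean the browser can load, parse and compile the code much faster than JavaScript. It is anticipated that WebAssembly can be compiled faster than your browser can download it.
  2. WebAssembly has predictable runtime performance. With JavaScript, performance generally increases with each iteration as the code is further optimised. It can also decrease due to de-optimisation.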
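The warm-up effect on the JavaScript side can be observed by timing the same function in consecutive batches and comparing the batch times. The sketch below is one way to do that. The helper name timeBatches and the batch sizes are assumptions, and whether the early batches are visibly slower depends on the engine.

```javascript
// Fall back to Date.now() where performance.now() is unavailable.
const now = typeof performance !== "undefined" ? () => performance.now() : () => Date.now();

function len(x, y, x1, y1) {
    const nx = x1 - x;
    const ny = y1 - y;
    return Math.sqrt(nx * nx + ny * ny);
}

// Hypothetical helper: time `batches` consecutive batches of `callsPerBatch` calls.
function timeBatches(fn, batches, callsPerBatch) {
    const times = [];
    let checksum = 0; // keeps the calls observable
    for (let b = 0; b < batches; b += 1) {
        const start = now();
        for (let i = 0; i < callsPerBatch; i += 1) {
            checksum += fn(i, 0, i + 3, 4);
        }
        times.push(now() - start);
    }
    return { times, checksum };
}

const result = timeBatches(len, 10, 50000);
// One time per batch, in milliseconds, in the order the batches ran.
console.log(result.times.map(t => t.toFixed(3)).join(" "));
```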

There are also a number of non-performance-related advantages.

For a more realistic performance measurement, take a look at:

  • Its use within Figma
  • Results from using it with PDFKit

Both are practical, production codebases.
