Why is my WebAssembly function slower than the JavaScript equivalent?

Question

Apologies for the broad question! I'm learning WASM and have created a Mandelbrot algorithm in C:

int iterateEquation(float x0, float y0, int maxiterations) {
  float a = 0, b = 0, rx = 0, ry = 0;
  int iterations = 0;
  while (iterations < maxiterations && (rx * rx + ry * ry <= 4.0)) {
    rx = a * a - b * b + x0;
    ry = 2.0 * a * b + y0;
    a = rx;
    b = ry;
    iterations++;
  }
  return iterations;
}

void mandelbrot(int *buf, float width, float height) {
  for(float x = 0.0; x < width; x++) {
    for(float y = 0.0; y < height; y++) {
      // map to mandelbrot coordinates
      float cx = (x - 150.0) / 100.0;
      float cy = (y - 75.0) / 100.0;
      int iterations = iterateEquation(cx, cy, 1000);
      int loc = ((x + y * width) * 4);
      // set the red and alpha components
      *(buf + loc) = iterations > 100 ? 255 : 0;
      *(buf + (loc+3)) = 255;
    }
  }
}

I'm compiling to WASM as follows (input/output filenames omitted for clarity):

clang -emit-llvm -O3 --target=wasm32 ...
llc -march=wasm32 -filetype=asm ...
s2wasm --initial-memory 6553600 ...
wat2wasm ...

I'm loading the module in JavaScript, compiling it, then invoking it as follows:

instance.exports.mandelbrot(0, 300, 150)

The output is being copied to a canvas, which enables me to verify that it is executed correctly. On my computer the above function takes around 120ms to execute.

However, here's a JavaScript equivalent:

const iterateEquation = (x0, y0, maxiterations) => {
  let a = 0, b = 0, rx = 0, ry = 0;
  let iterations = 0;
  while (iterations < maxiterations && (rx * rx + ry * ry <= 4)) {
    rx = a * a - b * b + x0;
    ry = 2 * a * b + y0;
    a = rx;
    b = ry;
    iterations++;
  }
  return iterations;
}

const mandelbrot = (data) => {
  for (var x = 0; x < 300; x++) {
    for (var y = 0; y < 150; y++) {
      const cx = (x - 150) / 100;
      const cy = (y - 75) / 100;
      const res = iterateEquation(cx, cy, 1000);
      const idx = (x + y * 300) * 4;
      data[idx] = res > 100 ? 255 : 0;
      data[idx+3] = 255;
    }
  }
}

This takes only ~62ms to execute.

Now I know WebAssembly is very new, and is not terribly optimised. But I can't help feeling that it should be faster than this!

Can anyone spot something obvious I might have missed?

Also, my C code writes directly to memory starting at '0' - I am wondering if this is safe? Where is the stack stored in the paged linear memory? Am I going to risk overwriting it?

Here's a fiddle to illustrate:

https://wasdk.github.io/WasmFiddle/?jvoh5

When run, it logs the timings of the two equivalent implementations (WASM, then JavaScript).

Solution

General

Compared with optimized JS, you can usually hope for a ~10% boost on heavy math. That figure is the net result of:

  • the speed gain from wasm itself
  • the overhead of copying data into and out of wasm memory, which eats into that gain.

Note that copying a Uint8Array is notably slow in Chrome (it is fine in Firefox). When you work with RGBA data, it's better to recast the underlying buffers as Uint32Array and use .set() on them.
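A minimal sketch of that recast, assuming both buffers hold tightly packed RGBA pixels and both byte offsets are multiples of 4 (a Uint32Array view requires 4-byte alignment). `copyRgba` is a hypothetical helper name; the source could be the wasm module's memory buffer and the destination an `ImageData`'s `data.buffer`.

```javascript
// Copy RGBA pixels between two ArrayBuffers through 32-bit views, so .set()
// moves one element per pixel instead of four separate bytes.
// Assumes srcOffset and dstOffset are multiples of 4 (required by Uint32Array).
function copyRgba(srcBuffer, srcOffset, dstBuffer, dstOffset, pixelCount) {
  const src = new Uint32Array(srcBuffer, srcOffset, pixelCount);
  const dst = new Uint32Array(dstBuffer, dstOffset, pixelCount);
  dst.set(src); // bulk copy, one word per pixel
}
```

Because both views cover the same bytes as the original Uint8/Uint8Clamped arrays, the pixel data is unchanged; only the element size used by the copy differs.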

Reading/writing pixels in wasm as whole words (RGBA) runs at the same speed as reading/writing individual bytes (R, G, B, A); I did not find any difference.
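To make the word-vs-byte distinction concrete, here is a small JavaScript sketch of the two ways to store one RGBA pixel. It only illustrates the memory layout, not the wasm timings described above; WebAssembly linear memory is little-endian, so a packed pixel word reads as 0xAABBGGRR. The helper names are hypothetical.

```javascript
// Write one pixel as four separate bytes: R, G, B, A in memory order.
function writePixelBytes(bytes, index, r, g, b, a) {
  const o = index * 4;
  bytes[o] = r;
  bytes[o + 1] = g;
  bytes[o + 2] = b;
  bytes[o + 3] = a;
}

// Write one pixel as a single 32-bit word. On a little-endian host the lowest
// byte of the word is stored first, so the packed value is 0xAABBGGRR.
function writePixelWord(words, index, r, g, b, a) {
  words[index] = ((a << 24) | (b << 16) | (g << 8) | r) >>> 0;
}
```

On a big-endian host a JS typed-array word write would reverse the byte order, which is one reason to keep the layout explicit.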

If you use Node.js for development (as I do), it is worth staying on 8.2.1 for JS benchmarks. The next version upgraded V8 to v6.0 and introduced serious speed regressions for this kind of math. On 8.2.1, don't use modern ES6 features such as const, => and so on; use ES5 instead. Maybe the next release, with V8 v6.2, will fix these issues.
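Following the ES5 advice, this is the question's JavaScript benchmark rewritten with `var` and function declarations only. Unlike the original, `width` and `height` are passed in as parameters (an assumption for reuse); the coordinate mapping and the 1000-iteration limit are unchanged.

```javascript
// ES5-style Mandelbrot: no const/let, no arrow functions.
function iterateEquation(x0, y0, maxiterations) {
  var a = 0, b = 0, rx = 0, ry = 0;
  var iterations = 0;
  while (iterations < maxiterations && (rx * rx + ry * ry <= 4)) {
    rx = a * a - b * b + x0;
    ry = 2 * a * b + y0;
    a = rx;
    b = ry;
    iterations++;
  }
  return iterations;
}

function mandelbrot(data, width, height) {
  for (var x = 0; x < width; x++) {
    for (var y = 0; y < height; y++) {
      // Map pixel to Mandelbrot coordinates (same mapping as the question).
      var cx = (x - 150) / 100;
      var cy = (y - 75) / 100;
      var res = iterateEquation(cx, cy, 1000);
      var idx = (x + y * width) * 4;
      data[idx] = res > 100 ? 255 : 0; // red
      data[idx + 3] = 255;             // alpha
    }
  }
}
```

Usage mirrors the question: `mandelbrot(imageData.data, 300, 150)`.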

Comments on your samples

  1. Run wasm-opt -O3; it can sometimes help even after clang -O3.
  2. Use s2wasm --import-memory instead of hardcoding a fixed memory size.
  3. In the code on the wasdk site, do NOT use global variables. When they are present, the compiler allocates a block of unknown size at the start of memory for them, and you can overwrite it by mistake.
  4. Correct code should probably copy the output out of memory from the proper location, and that copy should be included in the benchmark. Your samples are not complete, and IMHO the code from wasdk should not work correctly.
  5. Use benchmark.js; it's more precise.
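Points 2, 4 and 5 can be combined in a rough JavaScript sketch: size a `WebAssembly.Memory` from the image dimensions instead of a fixed `--initial-memory`, keep the output at an explicit offset rather than 0, copy it out, and time compute plus copy together. `reservedBytes` is a hypothetical gap for globals and the stack (the real layout depends on the toolchain), and `medianMs` is a simple stand-in, not the benchmark.js API.

```javascript
const WASM_PAGE_SIZE = 64 * 1024; // WebAssembly page size in bytes

// Number of 64 KiB pages needed to hold byteLength bytes.
function pagesFor(byteLength) {
  return Math.ceil(byteLength / WASM_PAGE_SIZE);
}

// Create a memory large enough for a reserved region plus an RGBA image.
// reservedBytes is a hypothetical area left free for globals/stack.
function createImageMemory(width, height, reservedBytes) {
  const imageBytes = width * height * 4;
  const memory = new WebAssembly.Memory({
    initial: pagesFor(reservedBytes + imageBytes),
  });
  return { memory: memory, offset: reservedBytes, byteLength: imageBytes };
}

// Copy the RGBA output out of linear memory into a standalone array
// (suitable for new ImageData(...)). A fresh view is made on every call,
// because memory.grow() detaches the previous ArrayBuffer.
function readPixels(memory, offset, byteLength) {
  return new Uint8ClampedArray(memory.buffer, offset, byteLength).slice();
}

// Median wall-clock time of fn over several runs, after warm-up calls.
function medianMs(fn, runs, warmup) {
  for (let i = 0; i < warmup; i++) fn();
  const times = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    times.push(performance.now() - t0);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}
```

With `--import-memory`, the memory would be passed in at instantiation, for example `WebAssembly.instantiate(bytes, { env: { memory: layout.memory } })`; the `env.memory` import name is an assumption that depends on the toolchain. The timed function would then call `instance.exports.mandelbrot(layout.offset, 300, 150)` followed by `readPixels(...)`, so the copy cost is part of the measurement.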


In short: before going further, it's worth cleaning things up.

You may find it useful to dig into the https://github.com/nodeca/multimath sources, or to use it in your experiments. I created it specifically for small CPU-intensive tasks, to simplify proper module initialization, memory management, JS fallbacks and so on. It contains an 'unsharp mask' implementation as an example, along with benchmarks. It should not be difficult to adapt your code to it.
