Is copying a large blob over to a worker expensive?


Question


Using the Fetch API I'm able to make a network request for a large asset of binary data (say more than 500 MB) and then convert the Response to either a Blob or an ArrayBuffer.

Afterwards, I can either do worker.postMessage and let the standard structured clone algorithm copy the Blob over to a Web Worker, or transfer the ArrayBuffer over to the worker context (making it effectively no longer available from the main thread).
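For concreteness, here is a minimal sketch of the two approaches, assuming a module script (so top-level await is available) and placeholder names for the asset URL and worker file:

```js
// main.js (module) — "/large-asset.bin" and "worker.js" are placeholders.
const worker = new Worker("worker.js");

// Approach A: fetch as a Blob and let structured cloning copy it to the worker.
const blob = await (await fetch("/large-asset.bin")).blob();
worker.postMessage({ kind: "blob", payload: blob });

// Approach B: fetch as an ArrayBuffer and transfer it to the worker.
const buffer = await (await fetch("/large-asset.bin")).arrayBuffer();
worker.postMessage({ kind: "buffer", payload: buffer }, [buffer]);
// After the transfer the buffer is detached: buffer.byteLength === 0 here.
```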

At first, it would seem much preferable to fetch the data as an ArrayBuffer, since a Blob is not transferable and thus will need to be copied over. However, blobs are immutable, so it seems the browser doesn't store them in the JS heap associated with the page, but rather in a dedicated blob storage space, and thus what ends up being copied over to the worker context is just a reference.

I've prepared a demo to try out the difference between the two approaches: https://blobvsab.vercel.app/. I'm fetching 656 MB worth of binary data using both approaches.

Something interesting I've observed in my local tests is that copying the Blob is even faster than transferring the ArrayBuffer:

Blob copy time from main thread to worker: 1.828125 ms

ArrayBuffer transfer time from main thread to worker: 3.393310546875 ms
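(The demo's exact code isn't reproduced here; numbers like these can be obtained by timing just the postMessage call itself, reusing the names from the sketch above.)

```js
// Copying the Blob (structured clone).
let start = performance.now();
worker.postMessage({ kind: "blob", payload: blob });
console.log(`Blob copy time from main thread to worker: ${performance.now() - start} ms`);

// Transferring the ArrayBuffer.
start = performance.now();
worker.postMessage({ kind: "buffer", payload: buffer }, [buffer]);
console.log(`ArrayBuffer transfer time from main thread to worker: ${performance.now() - start} ms`);
```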

This is a strong indicator that dealing with Blobs is actually pretty cheap. Since they're immutable, the browser seems to be smart enough to pass them around as references rather than attaching the underlying binary data to those references.

Here are the heap memory snapshots I've taken when fetching as a Blob:

The first two snapshots were taken after the Blob resulting from the fetch was copied over to the worker context using postMessage. Notice that neither of those heaps includes the 656 MB.

The latter two snapshots were taken after I used a FileReader to actually access the underlying data, and as expected, the heap grew a lot.
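Reading the copied Blob inside the worker, which is what makes the heap grow, might look roughly like this sketch (the message shape is assumed from the earlier sketch, not taken from the demo):

```js
// worker.js — forcing the Blob's bytes into this context's heap.
self.onmessage = (event) => {
  if (event.data.kind !== "blob") return;
  const reader = new FileReader();
  reader.onload = () => {
    // Only now do the ~656 MB of bytes show up in the worker's heap snapshot.
    const bytes = new Uint8Array(reader.result);
    console.log("Read", bytes.byteLength, "bytes");
  };
  reader.readAsArrayBuffer(event.data.payload);
};
```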

Now, this is what happens with fetching directly as an ArrayBuffer:

Here, since the binary data was simply transferred over to the worker thread, the heap of the main thread is small, but the worker heap contains the entire 656 MB, even before reading this data.

Now, looking around on SO I see that "What is the difference between an ArrayBuffer and a Blob?" mentions a lot of underlying differences between the two structures, but I haven't found a good reference on whether one should worry about copying a Blob between execution contexts, versus the seemingly inherent advantage of an ArrayBuffer being transferable. However, my experiments show that copying the Blob might actually be faster, and thus I think preferable.

It seems to be up to each browser vendor how Blobs are stored and handled. I've found this Chromium documentation describing that all Blobs are transferred from each renderer process (i.e. a page on a tab) to the browser process, and that way Chrome can even offload the Blob to secondary memory if needed.

Does anyone have some more insights regarding all of this? If I can choose to fetch some large binary data over the network and move that to a Web Worker, should I prefer a Blob or an ArrayBuffer?

Solution

No, it's not expensive at all to postMessage a Blob.

The cloning steps of a Blob are:

Their serialization steps, given value and serialized, are:

  1. Set serialized.[[SnapshotState]] to value’s snapshot state.

  2. Set serialized.[[ByteSequence]] to value’s underlying byte sequence.

Their deserialization steps, given serialized and value, are:

  1. Set value’s snapshot state to serialized.[[SnapshotState]].

  2. Set value’s underlying byte sequence to serialized.[[ByteSequence]].

In other words, nothing is copied; both the snapshot state and the byte sequence are passed by reference (even though the wrapping JS object is not).

However, regarding your full project, I wouldn't advise using Blobs here, for two reasons:

  1. The fetch algorithm first fetches as an ArrayBuffer internally. Requesting a Blob adds an extra step there (which consumes memory).
  2. You'll probably need to read that Blob from the Worker, adding yet another step (which will also consume memory, since here the data will actually get copied). A sketch of the transfer-based alternative follows.
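A minimal sketch of that alternative, reusing the placeholder URL and worker file from the question's setup (this is not code from the answer itself):

```js
// main.js (module) — fetch straight to an ArrayBuffer and transfer it.
const buffer = await (await fetch("/large-asset.bin")).arrayBuffer();
const worker = new Worker("worker.js");
worker.postMessage(buffer, [buffer]); // moved to the worker, not copied

// worker.js — the bytes are directly usable, no FileReader step needed.
self.onmessage = (event) => {
  const bytes = new Uint8Array(event.data);
  console.log("Received", bytes.byteLength, "bytes");
};
```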
