What is the fastest way to write a lot of documents to Firestore?

Problem description

I need to write a large number of documents to Firestore.

What is the fastest way to do this in Node.js?

Accepted answer

TL;DR: The fastest way to perform bulk data creation on Firestore is to execute parallel individual write operations.

Writing 1,000 documents to Firestore takes:

  1. ~105.4s when using sequential individual write operations
  2. ~2.8s when using (2) batched write operations
  3. ~1.5s when using parallel individual write operations

There are three common ways to perform a large number of write operations on Firestore.

  1. Perform each individual write operation in sequence.
  2. Use batched write operations.
  3. Perform the individual write operations in parallel.

We'll investigate each in turn below, using an array of randomized document data.
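
For reference, here is a minimal sketch of the setup those snippets assume; the real test code is linked in the notes at the end, so the collection name and document fields below are just placeholders:

const admin = require('firebase-admin');

// Initialize the Admin SDK; with no arguments it picks up
// GOOGLE_APPLICATION_CREDENTIALS from the environment.
admin.initializeApp();

// The collection every test writes into (the name is a placeholder).
const collection = admin.firestore().collection('bulk-write-test');

// 1,000 documents of randomized data (the fields are placeholders).
const datas = Array.from({ length: 1000 }, (_, i) => ({
  index: i,
  value: Math.random(),
}));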

The first approach, sequential individual writes, is the simplest possible solution:

async function testSequentialIndividualWrites(datas) {
  while (datas.length) {
    await collection.add(datas.shift());
  }
}

We write each document in turn until we've written every document, and we wait for each write operation to complete before starting the next one.

Writing 1,000 documents takes about 105 seconds with this approach, so the throughput is roughly 10 document writes per second.

The second approach, batched writes, is the most complex solution.

async function testBatchedWrites(datas) {
  let batch = admin.firestore().batch();
  let count = 0;
  while (datas.length) {
    batch.set(collection.doc(Math.random().toString(36).substring(2, 15)), datas.shift());
    if (++count >= 500 || !datas.length) {
      await batch.commit();
      batch = admin.firestore().batch();
      count = 0;
    }
  }
}

You can see that we create a WriteBatch object by calling batch(), fill it to its maximum capacity of 500 documents, and then write it to Firestore. We give each document a generated name that is relatively likely to be unique (good enough for this test).
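
As a side note (my addition, not part of the original test): instead of building a name from Math.random(), you can let Firestore generate the document ID by calling doc() with no arguments; the batching logic stays exactly the same:

async function testBatchedWritesAutoId(datas) {
  let batch = admin.firestore().batch();
  let count = 0;
  while (datas.length) {
    // doc() with no argument returns a reference with an auto-generated ID.
    batch.set(collection.doc(), datas.shift());
    if (++count >= 500 || !datas.length) {
      await batch.commit(); // a single batch can hold at most 500 operations
      batch = admin.firestore().batch();
      count = 0;
    }
  }
}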

Writing 1,000 documents takes about 2.8 seconds with this approach, so the throughput is roughly 357 document writes per second.

That's quite a bit faster than the sequential individual writes. In fact, many developers use this approach because they assume it is the fastest, but as the results above show, that isn't true. And the code is by far the most complex, due to the size constraint on batches.

The Firestore documentation says this about the performance of adding lots of data:

For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes.

We can put that to the test with this code:

async function testParallelIndividualWrites(datas) {
  await Promise.all(datas.map((data) => collection.add(data)));
}

This code kicks off the add operations as fast as it can, and then uses Promise.all() to wait until they're all finished. With this approach the operations can run in parallel.
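
One caveat worth noting (my addition, not from the original answer): Promise.all() rejects as soon as any single add() fails, while the remaining writes keep running in the background. If you would rather wait for every write and report failures, a sketch using Promise.allSettled() (available in Node.js 12.9 and later) looks like this:

async function testParallelIndividualWritesSettled(datas) {
  const results = await Promise.allSettled(datas.map((data) => collection.add(data)));
  const failed = results.filter((result) => result.status === 'rejected');
  if (failed.length > 0) {
    console.warn(`${failed.length} of ${results.length} writes failed`);
  }
}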

Writing 1,000 documents takes about 1.5 seconds with this approach, so the throughput is roughly 667 document writes per second.

The difference isn't nearly as great as between the first two approaches, but it is still over 1.8 times faster than batched writes.
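
If you want to reproduce the comparison, a harness along these lines works; this is a sketch of my own, not the original test code, and each test gets its own copy of the data because the first two drain the array with shift():

async function timeIt(label, testFn, datas) {
  const start = Date.now();
  await testFn([...datas]); // pass a copy so the consuming tests don't affect each other
  console.log(`${label}: ${((Date.now() - start) / 1000).toFixed(1)}s`);
}

async function runAllTests(datas) {
  await timeIt('sequential individual writes', testSequentialIndividualWrites, datas);
  await timeIt('batched writes', testBatchedWrites, datas);
  await timeIt('parallel individual writes', testParallelIndividualWrites, datas);
}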

Some final notes:

  • You can find the full code of this test on GitHub.
  • While the test was done with Node.js, you're likely to get similar results across all platforms that the Admin SDK supports.
  • Don't perform bulk inserts using client SDKs though, as the results may be very different and much less predictable.
  • As usual, actual performance depends on your machine, the bandwidth and latency of your internet connection, and many other factors. Based on those you may see differences in the differences too, although I'd expect the ordering to remain the same.
  • If you have any outliers in your own tests, or find completely different results, leave a comment below.
  • Batched writes are atomic. So if you have dependencies between the documents and either all of them or none of them must be written, you should use a batched write (see the short sketch after this list).
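
To illustrate that last point with a quick sketch (the collection and field names here are hypothetical): an order document and a stock counter that must change together belong in one batch, so that either both writes happen or neither does:

async function placeOrder(orderId, productId, order) {
  const db = admin.firestore();
  const batch = db.batch();
  batch.set(db.collection('orders').doc(orderId), order);
  batch.update(db.collection('products').doc(productId), {
    stock: admin.firestore.FieldValue.increment(-1),
  });
  await batch.commit(); // both writes are applied atomically, or neither is
}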
