What is the fastest way to write a lot of documents to Firestore?


Problem description

I need to write a large number of documents to Firestore.

What is the fastest way to do this in Node.js?

Recommended answer

TL;DR: The fastest way to perform bulk data creation on Firestore is by executing parallel individual write operations.

Writing 1,000 documents to Firestore takes:

  1. ~105.4s when using sequential individual write operations
  2. ~2.8s when using 2 batched write operations
  3. ~1.5s when using parallel individual write operations


There are three common ways to perform a large number of write operations on Firestore.

  1. Performing each individual write operation in sequence.
  2. Using batched write operations.
  3. Performing individual write operations in parallel.

We'll investigate each in turn below, using an array of randomized document data.

Sequential individual write operations

This is the simplest possible solution:

// Assumes `collection` is a Firestore CollectionReference, e.g.
// const collection = admin.firestore().collection('tests');
async function testSequentialIndividualWrites(datas) {
  while (datas.length) {
    await collection.add(datas.shift());
  }
}

We write each document in turn, until we've written every document. And we wait for each write operation to complete before starting on the next one.

Writing 1,000 documents takes about 105 seconds with this approach, so the throughput is roughly 10 document writes per second.
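To reproduce numbers like these yourself, a small timing harness helps. This is a sketch of my own (the `measure` helper is not part of the original test code); it passes a copy of the array because the test functions above consume it with `shift()`:

```javascript
// Hypothetical helper: times one of the test functions above and
// returns the elapsed seconds, so throughput can be derived.
async function measure(fn, datas) {
  const start = process.hrtime.bigint();
  await fn(datas.slice()); // pass a copy: the tests mutate the array
  return Number(process.hrtime.bigint() - start) / 1e9;
}

// Example usage (assumes the test functions and `datas` are defined):
// const seconds = await measure(testSequentialIndividualWrites, datas);
// console.log(`${(datas.length / seconds).toFixed(0)} writes/second`);
```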

Batched write operations

This is the most complex solution.

async function testBatchedWrites(datas) {
  let batch = admin.firestore().batch();
  let count = 0;
  while (datas.length) {
    batch.set(collection.doc(Math.random().toString(36).substring(2, 15)), datas.shift());
    if (++count >= 500 || !datas.length) {
      await batch.commit();
      batch = admin.firestore().batch();
      count = 0;
    }
  }
}

You can see that we create a BatchedWrite object by calling batch(), fill that until its maximum capacity of 500 documents, and then write it to Firestore. We give each document a generated name that is relatively likely to be unique (good enough for this test).

Writing 1,000 documents takes about 2.8 seconds with this approach, so the throughput is roughly 357 document writes per second.

That's quite a bit faster than with the sequential individual writes. In fact: many developers use this approach because they assume it is fastest, but as the results above already showed this is not true. And the code is by far the most complex, due to the size constraint on batches.
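The 500-writes-per-batch limit is what drives most of that complexity. One way to tame it (a sketch of my own, not code from the answer) is to split the data into chunks up front, so the commit loop stays trivial:

```javascript
// Split an array into chunks of at most `size` elements, matching
// Firestore's limit of 500 writes per batch.
function chunk(array, size) {
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical usage: one batch commit per chunk of 500 documents.
// for (const group of chunk(datas, 500)) {
//   const batch = admin.firestore().batch();
//   group.forEach((data) => batch.set(collection.doc(), data));
//   await batch.commit();
// }
```

Calling `doc()` with no argument lets Firestore generate the document ID, which avoids the hand-rolled random names in the code above.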

Parallel individual write operations

The Firestore documentation says this about the performance for adding lots of data:

For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes.

We can put that to the test with this code:

async function testParallelIndividualWrites(datas) {
  await Promise.all(datas.map((data) => collection.add(data)));
}

This code kicks off the add operations as fast as it can, and then uses Promise.all() to wait until they're all finished. With this approach the operations can run in parallel.

Writing 1,000 documents takes about 1.5 seconds with this approach, so the throughput is roughly 667 document writes per second.

The difference isn't nearly as great as between the first two approaches, but it still is over 1.8 times faster than batched writes.
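One caveat with plain `Promise.all()` is that it starts every write at once, which for very large arrays can exhaust memory or connections. A bounded-concurrency variant (my own sketch; the `limit` of 100 in the usage note is an assumption to tune, not a Firestore recommendation) keeps the parallelism while capping the number of in-flight writes:

```javascript
// Run `worker` over `items` with at most `limit` operations in flight.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function runner() {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: JS is single-threaded)
      results[i] = await worker(items[i]);
    }
  }
  // Start `limit` runners; each picks up a new item as soon as it finishes one.
  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    () => runner()
  );
  await Promise.all(runners);
  return results;
}

// Hypothetical usage, capping in-flight writes at 100:
// await runWithConcurrency(datas, 100, (data) => collection.add(data));
```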

A few notes:

  • You can find the full code of this test on Github.
  • While the test was done with Node.js, you're likely to get similar results across all platforms that the Admin SDK supports.
  • Don't perform bulk inserts using client SDKs though, as the results may be very different and much less predictable.
  • As usual the actual performance depends on your machine, the bandwidth and latency of your internet connection, and many other factors. Based on those you may see differences in the differences too, although I expect the ordering to remain the same.
  • If you have any outliers in your own tests, or find completely different results, leave a comment below.
  • Batched writes are atomic. So if you have dependencies between the documents and all documents must be written, or none of them must be written, you should use a batched write.
