Avoid Aggregate 16MB Limit


Question

I have a collection of about 1M documents. Each document has an internalNumber property, and I need to get all internalNumbers in my node.js code.

Previously I was using

db.docs.distinct("internalNumber")

or

collection.distinct('internalNumber', {}, {}, (err, result) => { /* ... */ })

in Node.

But as the collection grew, I started to get the error: distinct is too big, 16m cap.

Now I want to use aggregation. It consumes a lot of memory and it is slow, but that is OK since I only need to do it once at script startup. I've tried the following in the Robo 3T GUI tool:

db.docs.aggregate([{$group: {_id: '$internalNumber'} }]); 

It works, and I wanted to use it in node.js code the following way:

collection.aggregate([{$group: {_id: '$internalNumber'} }],
  (err, docs) => { /* ... */ });

But in Node I get an error: "MongoError: aggregation result exceeds maximum document size (16MB) at Function.MongoError.create".

Please help me overcome this limit.

Recommended answer

The problem is that the native driver behaves differently from the shell method by default: the "shell" actually returns a "cursor" object, whereas the native driver needs this option to be set "explicitly".

Without a "cursor", .aggregate() returns the results as an array of documents inside a single BSON document, which is subject to the 16MB limit, so we turn it into a cursor to avoid the limitation:

let cursor = collection.aggregate(
  [{ "$group": { "_id": "$internalNumber" } }],
  { "cursor": { "batchSize": 500 } }
);

cursor.toArray((err,docs) => {
   // work with results
});

Then you can use regular methods like .toArray() to turn the results into a JavaScript array, which on the "client" does not share the same limitation, or use other methods for iterating a "cursor".
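As an illustration of iterating the cursor instead of calling .toArray(), here is a minimal sketch. It assumes a driver version in which cursor methods such as hasNext() and next() return promises when no callback is passed. The helper name collectInternalNumbers and its default batch size are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: walks the aggregation cursor one document at a time.
// Assumes `collection` is a MongoDB driver Collection and that hasNext()/next()
// return promises when called without a callback.
async function collectInternalNumbers(collection, batchSize = 500) {
  const cursor = collection.aggregate(
    [{ $group: { _id: '$internalNumber' } }],
    { cursor: { batchSize } } // request a cursor, fetched in batches
  );

  const internalNumbers = [];
  while (await cursor.hasNext()) {
    const doc = await cursor.next();
    internalNumbers.push(doc._id); // each group's _id is one distinct value
  }
  return internalNumbers;
}
```

Inside an async function, usage might look like `const numbers = await collectInternalNumbers(collection);`. Because documents arrive in batches, no single server reply has to hold the whole result set.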
