Avoid Aggregate 16MB Limit
Question
I have a collection of about 1M documents. Each document has an internalNumber property, and I need to get all internalNumbers in my node.js code.
Previously I used
db.docs.distinct("internalNumber")
or
collection.distinct('internalNumber', {}, {}, (err, result) => { /* ... */ })
in Node.
But as the collection grew, I started to get the error: distinct is too big, 16m cap.
Now I want to use aggregation. It consumes a lot of memory and is slow, but that is OK since I only need to run it once at script startup. I tried the following in the Robo 3T GUI tool:
db.docs.aggregate([{$group: {_id: '$internalNumber'} }]);
It works, and I wanted to use it in node.js code the following way:
collection.aggregate([{$group: {_id: '$internalNumber'} }],
  (err, docs) => { /* ... */ });
But in Node I get an error: "MongoError: aggregation result exceeds maximum document size (16MB) at Function.MongoError.create".
Please help me overcome this limit.
Recommended Answer
The problem is that the native driver behaves differently from the shell by default: the shell's aggregate() actually returns a cursor object, whereas the native driver needs the cursor option to be requested explicitly.
Without a cursor, .aggregate() returns all of the results as an array inside a single BSON document, which is therefore subject to the 16MB limit, so we turn it into a cursor to avoid the limitation:
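const cursor = collection.aggregate(
  [{ "$group": { "_id": "$internalNumber" } }],
  { "cursor": { "batchSize": 500 } }   // request a cursor; results arrive in batches of 500
);

cursor.toArray((err, docs) => {
  if (err) { console.error(err); return; }
  // work with results: each doc looks like { _id: <internalNumber> }
});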
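A rough way to see why the non-cursor form cannot work at this scale: the sketch below estimates the BSON size of the $group result array alone, assuming every one of the ~1M documents has a distinct internalNumber stored as a fixed-width number (8 bytes for a BSON double, 4 for an int32). The function name estimateResultArrayBytes is hypothetical, and the estimate ignores the other fields of the reply document, so under these assumptions it is a lower bound.

```javascript
// Rough estimate of the BSON bytes needed for the $group result array
// [{ _id: v0 }, { _id: v1 }, ...] when it must fit inside ONE document.
// Assumption: every _id is a fixed-width number (8 bytes for a BSON double,
// 4 bytes for an int32); real data may differ.
function estimateResultArrayBytes(count, valueBytes = 8) {
  // Embedded document { _id: <number> }:
  // int32 length (4) + type byte (1) + "_id\0" (4) + value + terminating 0x00 (1)
  const embeddedDocBytes = 4 + 1 + 4 + valueBytes + 1;
  // The array itself: int32 length (4) + terminating 0x00 (1)
  let total = 4 + 1;
  for (let i = 0; i < count; i++) {
    // Each array element: type byte (1) + index key "i" as a C string + the embedded doc
    const keyBytes = String(i).length + 1;
    total += 1 + keyBytes + embeddedDocBytes;
  }
  return total;
}

// MongoDB's maximum BSON document size: 16 MiB
const MAX_BSON_BYTES = 16 * 1024 * 1024;

const asDoubles = estimateResultArrayBytes(1000000);   // 25888895 bytes
const asInt32s = estimateResultArrayBytes(1000000, 4); // 21888895 bytes
console.log(asDoubles > MAX_BSON_BYTES, asInt32s > MAX_BSON_BYTES); // true true
```

Even with 4-byte integers the array alone comes to about 21.9 MB, well past the 16 MiB (16,777,216 bytes) cap. A cursor sidesteps this because each batch is sent back in its own reply, and each reply only has to stay under the limit on its own.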
Once you have the cursor, you can use regular methods such as .toArray() to turn the results into a JavaScript array, which on the client is not subject to the same limitation, or any of the other methods for iterating a cursor.
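If the goal is simply to have every internalNumber available in memory, an intermediate array is not strictly necessary. The sketch below, using a hypothetical function collectInternalNumbers, iterates the cursor with for await and adds each _id to a Set; the Set is an assumption about how the numbers will be used (for example, membership checks). It also assumes a driver version whose cursors are async iterable: in the 4.x and later Node.js drivers, aggregate() always returns a cursor that supports this. A small fake generator stands in for the database so the function can be tried without MongoDB.

```javascript
// Build a Set of internalNumber values by iterating an aggregation cursor
// one document at a time instead of materialising an intermediate array.
// Assumption: `cursor` is any async iterable of { _id } documents, e.g. an
// AggregationCursor from a driver version that supports `for await`.
async function collectInternalNumbers(cursor) {
  const numbers = new Set();
  for await (const doc of cursor) {
    // Each $group output document is { _id: <internalNumber> }
    numbers.add(doc._id);
  }
  return numbers;
}

// Example usage against a real collection (requires a running MongoDB):
// const cursor = collection.aggregate([{ $group: { _id: "$internalNumber" } }]);
// const internalNumbers = await collectInternalNumbers(cursor);

// Stand-in cursor (hypothetical data) for trying the function without a database.
async function* fakeCursor() {
  yield { _id: 101 };
  yield { _id: 102 };
  yield { _id: 103 };
}

collectInternalNumbers(fakeCursor()).then((numbers) => {
  console.log(numbers.size, numbers.has(102)); // 3 true
});
```

Compared with .toArray(), this keeps only the final Set in memory, at the cost of processing documents one by one in JavaScript.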