Mongo aggregation cursor & counting

Question

According to the mongodb node driver docs the aggregate function now returns a cursor (from 2.6).

I hoped that I could use this to get a count of items pre limit & skipping but there doesn't seem to be any count function on the created cursor. If I run the same queries in the mongo shell the cursor has an itcount function that I can call to get what I want.

I saw that the created cursor has an on data event (does that mean it's a CursorStream?) which seemed to get triggered the expected number of times, but if I use it in combination with cursor.get no results get passed into the callback function.

Can the new cursor feature be used to count an aggregation query?

Edit - code:

In mongo shell:

> db.SentMessages.find({Type : 'Foo'})
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }

> db.SentMessages.find({Type : 'Foo'}).count()
3

> db.SentMessages.find({Type : 'Foo'}).limit(1)
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }

> db.SentMessages.find({Type : 'Foo'}).limit(1).count();
3

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).count()
2014-08-12T14:47:12.488+0100 TypeError: Object #<Object> has no method 'count'

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).itcount()
3

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ]).itcount()
1

> exit
bye

In node:

var cursor = collection.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ], { cursor : {}});

cursor.get(function(err, res){
  // res is as expected (1 doc)
});

cursor.count() does not exist

cursor.itcount() does not exist

The on data event exists:

var totalItems = 0; // counter must be declared before the handler runs
cursor.on('data', function(){
    totalItems++;
});

but when used in combination with cursor.get, the .get callback function now contains 0 docs

Edit 2: The cursor returned appears to be an aggregation cursor rather than one of the cursors listed in the docs

Answer

This possibly deserves a full explanation for those who might search for this, so adding one for posterity.

Specifically, what is returned is an event stream for node.js which effectively wraps the stream.Readable interface (http://nodejs.org/api/stream.html#stream_class_stream_readable) with a couple of convenience methods. A .count() is not one of them at present and, considering the interface in use, would not make much sense.

Similar to the result returned from the .stream() method available to cursor objects, a "count" would not make much sense here when you consider the implementation, as it is meant to process as a "stream" where eventually you are going to reach an "end" but otherwise just want to process until getting there.

If you considered the standard "Cursor" interface from the driver, there are some solid reasons why the aggregation cursor is not the same:


  1. Cursors allow "modifier" actions to be processed prior to execution. These fall into the categories of .sort(), .limit() and .skip(). All of these have counterpart directives in the aggregation framework that are specified in the pipeline. Because pipeline stages can appear "anywhere", and not just as a post-processing option to a simple query, it would not make much sense to offer the same "cursor" processing.

  2. Other cursor modifiers include specials like .hint(), .min() and .max(), which alter "index selection" and processing. Whilst these could be of use to the aggregation pipeline, there is currently no simple way to include them in query selection. Mostly, the logic from the previous point overrides any reason to use the same type of interface as a "Cursor".
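
The first point, that .sort(), .skip() and .limit() have pipeline counterparts, can be sketched roughly as below. The helper names toPipeline and toCountPipeline and the shape of their options argument are invented for illustration; only the stage operators ($match, $sort, $skip, $limit, $group, $sum) come from the aggregation framework. The second helper shows one possible way to get the "pre limit" count the question asks about: run the same leading stages, but finish with a counting $group instead of the paging stages.

```javascript
// Hypothetical helper: translate find-style options into pipeline stages.
function toPipeline(opts) {
  var pipeline = [{ $match: opts.filter || {} }];
  if (opts.sort) pipeline.push({ $sort: opts.sort });
  if (opts.skip) pipeline.push({ $skip: opts.skip });   // omitted when skip is 0
  if (opts.limit) pipeline.push({ $limit: opts.limit });
  return pipeline;
}

// Hypothetical helper: same filter, but count the matches instead of paging them.
function toCountPipeline(opts) {
  return [
    { $match: opts.filter || {} },
    { $group: { _id: null, count: { $sum: 1 } } }
  ];
}

var page = toPipeline({ filter: { Type: 'Foo' }, skip: 0, limit: 1 });
var total = toCountPipeline({ filter: { Type: 'Foo' } });

console.log(JSON.stringify(page));
console.log(JSON.stringify(total));
```

Both arrays are plain data that would be passed to collection.aggregate(). The cost of this approach is a second round trip to the server, since the count and the page come from two separate pipelines.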

The other considerations are what you actually want to do with a cursor and why you "want" one returned. A cursor is usually a "one way trip", in the sense that it is processed in usable "batches" until an end is reached. It is then a reasonable conclusion that the "count" only really arrives at the end, when that "queue" is finally depleted.

While it is true that the standard "cursor" implementation holds some tricks, the main reason is that this just extends a "meta data" concept, as the query profiling engine must "scan" a certain number of documents in order to determine which items to return in the result.

The aggregation framework plays with this concept a little, though. Not only are there the same results as would be processed through the standard query profiler, but there are also additional stages. Any of these stages has the potential to "modify" the resulting "count" that would actually be returned in the "stream" to be processed.

Again, you could look at this from an academic point of view and say: "Sure, the query engine should keep the 'meta data' for the count, but can we not track what is modified afterwards?" This would be a fair argument. Pipeline operators such as $match, $group or $unwind, and possibly even $project and the new $redact, could all reasonably keep their own track of the "documents processed" in each pipeline stage and update that in the "meta data", which could then be returned to explain the full pipeline result count.

The last argument is reasonable, but consider also that at the present time the implementation of a "Cursor" concept for the aggregation pipeline results is a new concept for MongoDB. It could be fairly argued that all "reasonable" expectations at the first design point would have been that "most" results from combining documents would not be of a size that was restrictive to the BSON limitations. But as usage expands then perceptions are altered and things change to adapt.

So this "could" possibly be changed, but it is not how it is "currently" implemented. While .count() on a standard cursor implementation has access to the "meta data" where the scanned number is recorded, any method on the current implementation would result in retrieving all of the cursor results, just as .itcount() does in the shell.
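
What "retrieving all of the cursor results" amounts to can be sketched as below. This is only an illustration under assumptions: the function name itcount is borrowed from the shell method, and fakeCursor is a stand-in built with Node's stream.Readable.from, not a real driver cursor. The point it shows is that the number is only known once the stream has been drained.

```javascript
var Readable = require('stream').Readable;

// Count by draining a readable stream, similar in spirit to .itcount() in the shell.
async function itcount(stream) {
  var n = 0;
  for await (const _doc of stream) {
    n++; // every document has to be pulled just to be counted
  }
  return n;
}

// Stand-in for an aggregation cursor (assumption: a real one would come
// from collection.aggregate(...) and expose the same readable interface).
var fakeCursor = Readable.from([
  { Name: '123', Type: 'Foo' },
  { Name: '789', Type: 'Foo' },
  { Name: '456', Type: 'Foo' }
]);

var counted = itcount(fakeCursor);
counted.then(function (n) { console.log('count:', n); });
```

After this runs the stream is exhausted, so the documents themselves are gone unless they were kept somewhere along the way.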

Process the "cursor" items by counting on the "data" event and emitting something ( possibly a JSON stream generator ) as the "count" at the end. For any use case that would require a count "up-front" it would not seem like a valid use for a cursor anyway, as surely the output would be a whole document of a reasonable size.
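
A minimal sketch of that suggestion, assuming the cursor can be piped like any object-mode readable stream: a Transform passes each document through untouched and appends a final { count: n } item when the source ends. The name countingTrailer and the trailer shape are made up for this example, and the source is again a stand-in created with stream.Readable.from rather than a real aggregation cursor.

```javascript
var stream = require('stream');

// Pass every document through unchanged, then append { count: n } at the end.
function countingTrailer() {
  var n = 0;
  return new stream.Transform({
    objectMode: true,
    transform: function (doc, _encoding, done) {
      n++;
      done(null, doc); // forward the document as-is
    },
    flush: function (done) {
      this.push({ count: n }); // emitted once, after the last document
      done();
    }
  });
}

// Stand-in source (assumption: replace with the aggregation cursor).
var source = stream.Readable.from([
  { Name: '123' }, { Name: '789' }, { Name: '456' }
]);

var received = [];
var finished = new Promise(function (resolve, reject) {
  source.pipe(countingTrailer())
    .on('data', function (item) { received.push(item); })
    .on('end', function () { resolve(received); })
    .on('error', reject);
});

finished.then(function (items) {
  console.log(items[items.length - 1]);
});
```

The consumer sees the documents as they arrive and learns the total only from the last item, which matches the idea that the count comes at the end of the stream.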
