Using an Index with Mongo's $first Group Operator

Question

Per Mongo's latest $group documentation, there is a special optimization for $first:

Optimization to Return the First Document of Each Group

If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. In some cases, the $group stage can use the index to quickly find the first document of each group.

It makes sense, since only the first entry in an ordered index should be needed for each bin in the $group stage. Unfortunately, in my testing, I've gotten a query that renders ~800k sorted records in about 1s, then passes them to $group, where it takes about 10s to render the 1.7k output docs for some values of key (see example below). For other values of key, it times out at 300s. There should be exactly 1704 bins in the group regardless of key, and those query bins should be covered by the first three entries in the index, as near as I can tell. Am I missing something?

db.getCollection('time_series').aggregate([
    {
        '$match': {
            'organization_id': 1,
            'key': 'waffle_count'
        }
    },
    {
        '$sort': {
            'key': 1, 'asset_id': 1, 'date_time': -1
        }
    },
    {
        '$group': {
            '_id': {
                'key': '$key', 'asset_id': '$asset_id'
            },
            'value': {
                '$first': '$value'
            }
        }
    }
]);

Here is the index:

{
    "organization_id": 1,
    "key": 1,
    "asset_id": 1,
    "date_time": -1
}

Answer

I sent a request to Atlas's MongoDB Support. The optimization that I quoted isn't available until version 4.2 (we are using 3.6). Quoting Atlas Support:

The enhancement that you're mentioning was implemented in 4.2 via SERVER-9507. For your particular example, it seems you may also need SERVER-40090 to be implemented in order for your pipeline to fully take advantage of the improvement. We will let the team know of its potential benefit for your specific situation.
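One way to check whether a given server version applies the optimization is to look for a DISTINCT_SCAN stage in the explain output of the aggregation. The helper below is only a sketch: `hasStage` is a hypothetical name, and `sampleExplain` is a hand-written stand-in, not real server output. The actual explain shape differs between versions, which is why the search is recursive rather than tied to a fixed path.

```javascript
// Hypothetical helper: recursively search an explain() result for a plan
// stage with the given name, wherever it is nested.
function hasStage(node, stageName) {
    if (node === null || typeof node !== 'object') {
        return false;
    }
    if (node.stage === stageName) {
        return true;
    }
    // Object.values works for both plain objects and arrays.
    return Object.values(node).some(child => hasStage(child, stageName));
}

// Hand-written stand-in for explain output (an assumption, for illustration).
const sampleExplain = {
    stages: [
        { $cursor: { queryPlanner: { winningPlan: {
            stage: 'PROJECTION_COVERED',
            inputStage: { stage: 'DISTINCT_SCAN' }
        } } } }
    ]
};

const optimized = hasStage(sampleExplain, 'DISTINCT_SCAN');
```

In the mongo shell, the same helper could be applied to `db.getCollection('time_series').explain().aggregate(pipeline)`, where `pipeline` is the array passed to `aggregate` in the question.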

As of now, the second issue is not fixed, so the optimization only applies when the $group _id is a simple field reference like:

'_id': '$asset_id'

Whereas a key specified as an object will fail to use the index, even if it is not a composite key, like so:

'_id': { 'asset_id': '$asset_id' }
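Applied to the pipeline from the question, that could look like the sketch below. It assumes that `key` can be dropped from the sort and from `_id` because the `$match` stage already pins it to a single value. `buildLatestValuePipeline` is a hypothetical helper name. Whether the planner then actually uses the index for `$group` on 4.2 or later is not guaranteed and should be confirmed with explain().

```javascript
// Hypothetical helper: builds a variant of the question's pipeline that
// groups on a scalar _id instead of an object.
function buildLatestValuePipeline(organizationId, key) {
    return [
        // Equality match on the first two fields of the index.
        { $match: { organization_id: organizationId, key: key } },
        // Sort on the remaining index fields, in index order.
        { $sort: { asset_id: 1, date_time: -1 } },
        // Scalar _id, with $first as the only accumulator.
        { $group: { _id: '$asset_id', value: { $first: '$value' } } }
    ];
}

const pipeline = buildLatestValuePipeline(1, 'waffle_count');
// In the mongo shell:
// db.getCollection('time_series').aggregate(pipeline);
```

Note that the output documents change shape: `_id` becomes the bare `asset_id` value rather than an object holding `key` and `asset_id`.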
