MongoDB aggregate on subdocument in array

Question

I am implementing a small application using mongodb as a backend. In this application I have a data structure where the documents will contain a field that contains an array of subdocuments.

I use the following use case as a basis: http://docs.mongodb.org/manual/use-cases/inventory-management/

As you can see from the example, each document has a field called carted, which is an array of subdocuments.

{
    _id: 42,
    last_modified: ISODate("2012-03-09T20:55:36Z"),
    status: 'active',
    items: [
        { sku: '00e8da9b', qty: 1, item_details: {...} },
        { sku: '0ab42f88', qty: 4, item_details: {...} }
    ]
}

This suits me perfectly, except for one problem: I want to count each unique item (with "sku" as the unique identifier key) across the entire collection, where each document increments the count by 1 (multiple instances of the same "sku" in the same document still count only once). E.g. I would like this result:

{ sku: '00e8da9b', doc_count: 1 }, { sku: '0ab42f88', doc_count: 9 }

After reading up on MongoDB, I am quite confused about how to do this (fast) when you have a complex schema as described above. If I have understood the otherwise excellent documentation correctly, such an operation may perhaps be achieved using either the aggregation framework or the map/reduce framework, but this is where I need some input:

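  • Given the complexity of the structure, which framework is best suited to achieve the result I am looking for?
  • Which kind of indexes would be preferred to get the best possible performance out of the chosen framework?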

Recommended answer

MapReduce is slow, but it can handle very large data sets. The Aggregation framework on the other hand is a little quicker, but will struggle with large data volumes.

The trouble with the structure you show is that you need to "$unwind" the arrays to crack open the data. This means creating a new document for every array item, and with the aggregation framework it needs to do this in memory. So if you have 1000 documents with 100 array elements each, it will need to build a stream of 100,000 documents in order to group and count them.
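
To make the arithmetic concrete, here is a minimal in-memory sketch of what unwinding does, written in plain JavaScript so it runs without a database. The `unwind` helper and the generated `sampleDocs` are hypothetical stand-ins for illustration, not MongoDB's implementation.

```javascript
// Hypothetical in-memory stand-in for $unwind; not MongoDB's actual implementation.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const elements = Array.isArray(doc[field]) ? doc[field] : [];
    for (const element of elements) {
      // One output document per array element, with the array replaced by that element.
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

// Made-up sample data: 1000 documents, each carrying 100 array elements.
const sampleDocs = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  items: Array.from({ length: 100 }, (_, j) => ({ sku: `sku-${j}`, qty: 1 })),
}));

const unwound = unwind(sampleDocs, "items");
console.log(unwound.length); // 100000
```

Every one of those 100,000 intermediate documents has to pass through the later grouping stages, which is why filtering early with a match stage helps.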

You might want to consider whether there's a schema layout that will serve your queries better, but if you want to do it with the Aggregation framework, here's how you could do it (with some sample data so the whole script will drop into the shell):

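// start from an empty collection (recent shells require an explicit filter)
db.so.deleteMany({});
// multikey index on the SKU inside the items array
db.so.createIndex({ "items.sku": 1 }, { unique: false });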
db.so.insert([
    {
        _id: 42,
        last_modified: ISODate("2012-03-09T20:55:36Z"),
        status: 'active',
        items: [
            { sku: '00e8da9b', qty: 1, item_details: {} },
            { sku: '0ab42f88', qty: 4, item_details: {} },
            { sku: '0ab42f88', qty: 4, item_details: {} },
            { sku: '0ab42f88', qty: 4, item_details: {} },
        ]
    },
    {
        _id: 43,
        last_modified: ISODate("2012-03-09T20:55:36Z"),
        status: 'active',
        items: [
            { sku: '00e8da9b', qty: 1, item_details: {} },
            { sku: '0ab42f88', qty: 4, item_details: {} },
        ]
    },
]);


db.so.aggregate([
    {   // optional filter to exclude inactive documents - can be removed
        // you'll want an index on status too if you use it
        $match: { status: "active" }
    },
    // $unwind emits one document for every array element
    { $unwind: "$items" },
    {
        $group: {
            // first collapse to unique (document id, SKU) pairs,
            // because a SKU should only count once per document
            _id: { _id: "$_id", sku: "$items.sku" }
        }
    },
    {
        $group: {
            // then group by SKU and count the documents containing it
            _id: { sku: "$_id.sku" },
            doc_count: { $sum: 1 }
        }
    }
]
//, { explain: true }
)

Note that I've $group'd twice, because you said that an SKU can only count once per document, so we need to first sort out the unique doc/sku pairs and then count them up.
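
The same two-pass idea can be sketched in plain JavaScript, which may make it easier to see why one grouping pass is not enough. This is only an in-memory illustration using data shaped like the sample script; `cartDocs` and `countDocsPerSku` are hypothetical names, not part of any MongoDB API.

```javascript
// Sample data shaped like the two documents in the script (item_details omitted).
const cartDocs = [
  { _id: 42, status: "active", items: [
    { sku: "00e8da9b", qty: 1 }, { sku: "0ab42f88", qty: 4 },
    { sku: "0ab42f88", qty: 4 }, { sku: "0ab42f88", qty: 4 },
  ] },
  { _id: 43, status: "active", items: [
    { sku: "00e8da9b", qty: 1 }, { sku: "0ab42f88", qty: 4 },
  ] },
];

function countDocsPerSku(docs) {
  // Pass 1: collect unique (document id, sku) pairs, like the first $group.
  const pairs = new Set();
  for (const doc of docs) {
    if (doc.status !== "active") continue; // mirrors the optional $match
    for (const item of doc.items) {
      pairs.add(JSON.stringify([doc._id, item.sku]));
    }
  }
  // Pass 2: count how many pairs exist per sku, like the second $group.
  const counts = new Map();
  for (const pair of pairs) {
    const [, sku] = JSON.parse(pair);
    counts.set(sku, (counts.get(sku) || 0) + 1);
  }
  return [...counts].map(([sku, doc_count]) => ({ _id: { sku }, doc_count }));
}

const grouped = countDocsPerSku(cartDocs);
console.log(grouped);
// [ { _id: { sku: '00e8da9b' }, doc_count: 2 }, { _id: { sku: '0ab42f88' }, doc_count: 2 } ]
```

Counting directly after unwinding would give '0ab42f88' a count of 4, because document 42 lists it three times; the first pass removes those repeats.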

If you want the output a little different (in other words, EXACTLY like in your sample) we can $project them.
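
As a rough sketch of that last step, the stage below could be appended as the final element of the pipeline array. The variable names and the in-memory `flatten` helper are assumptions added for illustration, so the reshaping can be checked without a running server.

```javascript
// Stage to append to the end of the pipeline: drop the nested _id
// and expose the SKU as a top-level field next to doc_count.
const projectStage = {
  $project: { _id: 0, sku: "$_id.sku", doc_count: 1 },
};

// Hypothetical in-memory equivalent of that stage, for illustration only.
function flatten(results) {
  return results.map((r) => ({ sku: r._id.sku, doc_count: r.doc_count }));
}

// Output shape of the second $group for the sample data in the script.
const groupedOutput = [
  { _id: { sku: "00e8da9b" }, doc_count: 2 },
  { _id: { sku: "0ab42f88" }, doc_count: 2 },
];

console.log(flatten(groupedOutput));
// [ { sku: '00e8da9b', doc_count: 2 }, { sku: '0ab42f88', doc_count: 2 } ]
```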
