MongoDB nested object aggregation counting

Problem description

I have a collection of deeply nested MongoDB documents, and I want to count, within each document, the number of subdocuments that match a given condition. For example:

{"_id":{"chr":"20","pos":"14371","ref":"A","alt":"G"},
"studies":[
    {
        "study_id":"Study1",
        "samples":[
            {
                "sample_id":"NA00001",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"NA00002",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}
{"_id":{"chr":"20","pos":"14372","ref":"T","alt":"AA"},
"studies":[
    {
        "study_id":"Study3",
        "samples":[
            {
                "sample_id":"SAMPLE1",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"SAMPLE2",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}
{"_id":{"chr":"20","pos":"14373","ref":"C","alt":"A"},
"studies":[
    {
        "study_id":"Study3",
        "samples":[
            {
                "sample_id":"SAMPLE3",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"SAMPLE7",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}

I want to know how many subdocuments contain GT:"1|0". In this case that would be 1 in the first document, 2 in the second, and 0 in the third. I've tried $unwind and the aggregate function, but I'm obviously doing something wrong. When I try to count the subdocuments by the "GT" field, mongo complains:

db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}])

because the field names in my group cannot contain ".". Yet if I leave the path out:

db.collection.aggregate([{$group: {"$GT":1,_id:0}}])

it complains that "$GT cannot be an operator name".

Any ideas?

Recommended answer

You need to $unwind an array before you can work with its individual elements. The arrays here are nested three levels deep (studies, samples, formatdata), so you need to do this three times:

 db.collection.aggregate([

     // Unwind each nested array so its elements can be filtered and grouped
     { "$unwind": "$studies" },
     { "$unwind": "$studies.samples" },
     { "$unwind": "$studies.samples.formatdata" },

     // Group by the GT value to obtain the count per key across the collection
     { "$group": {
         "_id": "$studies.samples.formatdata.GT",
         "count": { "$sum": 1 }
     }}
 ])
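
To make the effect of the three $unwind stages concrete, the following is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code: `unwind`, `setPath` and `countBy` are hypothetical helpers written only for this illustration, and the documents are trimmed copies of the first two samples from the question. The helper also simplifies the real stage by dropping anything that is not an array.

```javascript
// Plain JavaScript run in memory; illustration only, not MongoDB driver code.
// Trimmed copies of the first two sample documents from the question.
const docs = [
  { _id: { pos: "14371" }, studies: [{ study_id: "Study1", samples: [
    { sample_id: "NA00001", formatdata: [{ GT: "1|0" }] },
    { sample_id: "NA00002", formatdata: [{ GT: "0|0" }] },
  ] }] },
  { _id: { pos: "14372" }, studies: [{ study_id: "Study3", samples: [
    { sample_id: "SAMPLE1", formatdata: [{ GT: "1|0" }] },
    { sample_id: "SAMPLE2", formatdata: [{ GT: "1|0" }] },
  ] }] },
];

// Hypothetical helper: return a copy of obj with the value at `keys` replaced.
function setPath(obj, keys, value) {
  if (keys.length === 0) return value;
  const [head, ...rest] = keys;
  return { ...obj, [head]: setPath(obj[head], rest, value) };
}

// Hypothetical helper: emulate one $unwind stage on a dotted path.
// Each array element produces one output row in which the array is
// replaced by that single element. Simplification: rows whose path
// does not hold an array are dropped.
function unwind(rows, path) {
  const keys = path.split(".");
  return rows.flatMap((row) => {
    const arr = keys.reduce((v, k) => (v == null ? v : v[k]), row);
    if (!Array.isArray(arr)) return [];
    return arr.map((item) => setPath(row, keys, item));
  });
}

// Hypothetical helper: emulate { $group: { _id: <key>, count: { $sum: 1 } } }.
function countBy(rows, keyFn) {
  const counts = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Apply the three stages in the same order as the pipeline.
let unwound = unwind(docs, "studies");
unwound = unwind(unwound, "studies.samples");
unwound = unwind(unwound, "studies.samples.formatdata");

// After unwinding, GT is reachable as a plain nested property.
const countsByGT = countBy(unwound, (row) => row.studies.samples.formatdata.GT);
console.log(countsByGT);
```

After the third stage every row carries exactly one formatdata entry, which is why the dotted path can then be used as a grouping key.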

Ideally you want to filter your input. You can do this with a $match both before and after the $unwind stages: the first discards whole documents that have no matching array member, and the second filters the individual unwound entries. Both use a $regex to match GT values that begin with "1". Note that /^1/ also matches values such as "1|1"; to count only "1|0", match the exact string instead.

 db.collection.aggregate([

     // Match first to exclude documents where this is not present in any array member
     { "$match": { "studies.samples.formatdata.GT": /^1/ } },

     // Unwind each nested array so its elements can be filtered
     { "$unwind": "$studies" },
     { "$unwind": "$studies.samples" },
     { "$unwind": "$studies.samples.formatdata" },

     // Match again to keep only the unwound entries that satisfy the condition
     { "$match": { "studies.samples.formatdata.GT": /^1/ } },

     // Group by original document and GT value to obtain the matched count per key
     { "$group": {
         "_id": {
              "_id": "$_id",
              "key": "$studies.samples.formatdata.GT"
         },
         "count": { "$sum": 1 }
     }}
 ])
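
Because $match removes every non-matching entry, a document with no match at all (the third sample in the question) does not appear in the output of the filtered pipeline, rather than appearing with a count of 0. The following is a sketch of one possible variation, not part of the original answer: it skips $match and sums a $cond expression instead, so each document keeps a row. The name `perDocumentCounts` is hypothetical, and the exact-match value "1|0" is taken from the question.

```javascript
// Sketch only: `perDocumentCounts` is a hypothetical name, and this
// $cond-based variation is an assumption, not the original answer's method.
const perDocumentCounts = [
  // Unwind the three nested arrays, as in the pipelines above
  { $unwind: "$studies" },
  { $unwind: "$studies.samples" },
  { $unwind: "$studies.samples.formatdata" },

  // Group per original document: add 1 for a matching GT and 0 otherwise,
  // so documents without any match still come out with count 0.
  { $group: {
      _id: "$_id",
      count: {
        $sum: { $cond: [{ $eq: ["$studies.samples.formatdata.GT", "1|0"] }, 1, 0] },
      },
  } },
];

// In the mongo shell this would be run as:
// db.collection.aggregate(perDocumentCounts)
```

For the three sample documents this should give counts of 1, 2 and 0. The trade-off is that every array entry in the collection is unwound and grouped, so it does more work than the filtered version. A document whose arrays are empty or missing would still be dropped by $unwind.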

Note that in all cases the "$"-prefixed entries are "variables" that refer to properties of the document. They are values, used as input on the right-hand side. The "keys" on the left-hand side must be specified as plain strings; no variable can be used to name a key. This is why both attempts in the question fail: each puts a "$"-prefixed field path in the key position of $group.
