mapReduce/Aggregation: Group by a value in a nested document


Question

Imagine I have a collection like this:

{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "departments": [
             {"departmentType":"01",
              "departmentHead":"Peter"},
             {"departmentType":"02",
              "departmentHead":"John"}
  ]
},
{
  "_id": "10281",
  "city": "LOS ANGELES",
  "state": "CA",
  "departments": [
             {"departmentType":"02",
              "departmentHead":"Joan"},
             {"departmentType":"03",
              "departmentHead":"Mary"}
  ]
},
{
  "_id": "10284",
  "city": "MIAMI",
  "state": "FL",
  "department": [
  "departments": [
             {"departmentType":"01",
              "departmentHead":"George"},
             {"departmentType":"02",
              "departmentHead":"Harry"}
  ]
}

I'd like to get a count per departmentType, something like:

[{"departmentType":"01", "dCount":2},
 {"departmentType":"02", "dCount":3},
 {"departmentType":"03", "dCount":1}
]

I've tried almost everything already, but all the examples I find online are simpler ones where the group-by is done on a field at the root level of the document. Here I'm trying to group by departmentType, which sits inside the departments array, and that seems to break everything I've found so far.

Any ideas on how to do this using Mongoose's aggregation implementation or mapReduce?

Ideally, I'd like to exclude all departmentTypes with a count <= 1 and sort the results by departmentType.

Thanks, everyone!

Recommended answer

You need to $unwind the departments array. This creates one document for each entry in the array, so you can aggregate those entries in the pipeline.
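To make the effect of $unwind concrete, here is a minimal plain-JavaScript sketch that emulates $unwind followed by a counting $group on the question's sample data. It does not use MongoDB at all; the helper names `unwind` and `countByDepartmentType` are made up for illustration, and the sample documents assume the stray "department" line in the third document is a typo.

```javascript
// Plain-JavaScript emulation of $unwind followed by $group (no MongoDB needed).
// The documents mirror the question's sample data.
const docs = [
  { _id: "10280", city: "NEW YORK", state: "NY", departments: [
    { departmentType: "01", departmentHead: "Peter" },
    { departmentType: "02", departmentHead: "John" } ] },
  { _id: "10281", city: "LOS ANGELES", state: "CA", departments: [
    { departmentType: "02", departmentHead: "Joan" },
    { departmentType: "03", departmentHead: "Mary" } ] },
  { _id: "10284", city: "MIAMI", state: "FL", departments: [
    { departmentType: "01", departmentHead: "George" },
    { departmentType: "02", departmentHead: "Harry" } ] },
];

// Emulates $unwind: emit one copy of the parent document per array element,
// with the array field replaced by that single element.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

// Emulates $group with { $sum: 1 }: count unwound documents per departmentType.
function countByDepartmentType(unwound) {
  const counts = new Map();
  for (const doc of unwound) {
    const key = doc.departments.departmentType;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([departmentType, dCount]) => ({ departmentType, dCount }));
}

const unwound = unwind(docs, "departments");
const grouped = countByDepartmentType(unwound);

console.log(unwound.length); // 6 (three documents with two departments each)
console.log(grouped);
// [ { departmentType: '01', dCount: 2 },
//   { departmentType: '02', dCount: 3 },
//   { departmentType: '03', dCount: 1 } ]
```

After unwinding, departmentType is an ordinary field on each document, which is why a regular group-by works on it.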

Unfortunately, you can't pre-filter departmentTypes with a count <= 1, because $size only accepts an exact value, but you can filter them out of the results. It's not great, but it works. The example below pre-filters to only those records with exactly 2 departments, purely for demonstration. You will probably want to drop the first $match, because the second $match filters out counts <= 1 from the results later on:

db.runCommand({
    aggregate: "so",
    pipeline: [
        {   // demo only: keep only records with exactly 2 departments
            $match: {
                departments: { $size: 2 }
            }
        },
        // unwind - create a doc for each department in the array
        { $unwind: "$departments" },
        {   // aggregate sum of departments by type
            $group: {
                _id: "$departments.departmentType",
                count: { $sum: 1 },
            }
        },
        {   // filter out departments with <=1
            $match: {
                count: { $gt: 1 },
            }
        },
        {   // rename fields as per example
            $project: {
                _id: 0,
                departmentType: "$_id",
                dCount: "$count",
            }
        }
    ],
    cursor: {}  // MongoDB 3.6+ requires the cursor option for the aggregate command
});

Note that I've also assumed that the JSON sample in your question has a typo, and that the "department" key doesn't actually exist. This code works provided all the documents have the same schema as the first two.

Feel free to drop the first $match, and also the last $project if you're not bothered about the exact field names you get.
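As one possible variant, the sketch below drops the demo $match and adds a $sort stage for the ordering the question asked about. This is an illustration and not part of the original answer: the function name `buildDepartmentCountPipeline`, its `minCount` parameter, and the Mongoose model name `Office` are all hypothetical.

```javascript
// Builds the aggregation pipeline as a plain array of stages.
// minCount is a hypothetical parameter: only types with a count > minCount are kept.
function buildDepartmentCountPipeline(minCount = 1) {
  return [
    // one document per entry in the departments array
    { $unwind: "$departments" },
    // count the unwound documents per departmentType
    { $group: { _id: "$departments.departmentType", dCount: { $sum: 1 } } },
    // drop department types whose count is <= minCount
    { $match: { dCount: { $gt: minCount } } },
    // sort by departmentType (still stored in _id at this stage)
    { $sort: { _id: 1 } },
    // rename _id to departmentType to match the requested output shape
    { $project: { _id: 0, departmentType: "$_id", dCount: 1 } },
  ];
}

const pipeline = buildDepartmentCountPipeline(1);
console.log(JSON.stringify(pipeline, null, 2));

// With Mongoose, assuming a model named Office (hypothetical) mapped to this collection:
//   const results = await Office.aggregate(pipeline);
// In the mongo shell, against the "so" collection used above:
//   db.so.aggregate(pipeline);
```

Keeping the pipeline as a plain array means the same stages can be passed to either Mongoose or the shell without changes.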
