Group Distinct Values and Counts for Each Property in One Query


Problem Description

I have data in a profile collection:

[
    {
        name: "Harish",
        gender: "Male",
        caste: "Vokkaliga",
        education: "B.E"
    },
    {
        name: "Reshma",
        gender: "Female",
        caste: "Vokkaliga",
        education: "B.E"
    },
    {
        name: "Rangnath",
        gender: "Male",
        caste: "Lingayath",
        education: "M.C.A"
    },
    {
        name: "Lakshman",
        gender: "Male",
        caste: "Lingayath",
        education: "B.Com"
    },
    {
        name: "Reshma",
        gender: "Female",
        caste: "Lingayath",
        education: "B.E"
    }
]

Here I need to calculate the total number of each distinct gender, the total number of each distinct caste, and the total number of each distinct education. Expected output:

{
    gender: [{
        name: "Male",
        total: "3"
    },
    {
        name: "Female",
        total: "2"
    }],
    caste: [{
        name: "Vokkaliga",
        total: "2"
    },
    {
        name: "Lingayath",
        total: "3"
    }],
    education: [{
        name: "B.E",
        total: "3"
    },
    {
        name: "M.C.A",
        total: "1"
    },
    {
        name: "B.Com",
        total: "1"
    }]
}

How can I get the expected result using MongoDB aggregation?
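
For anyone reproducing this locally, a minimal setup sketch might look like the following (the collection name profile is taken from the queries in the answer below; insertMany assumes a 3.2+ shell):

db.profile.insertMany([
  { "name": "Harish", "gender": "Male", "caste": "Vokkaliga", "education": "B.E" },
  { "name": "Reshma", "gender": "Female", "caste": "Vokkaliga", "education": "B.E" },
  { "name": "Rangnath", "gender": "Male", "caste": "Lingayath", "education": "M.C.A" },
  { "name": "Lakshman", "gender": "Male", "caste": "Lingayath", "education": "B.Com" },
  { "name": "Reshma", "gender": "Female", "caste": "Lingayath", "education": "B.E" }
])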

Recommended Answer

There are different approaches depending on the version available, but they all essentially break down to transforming your document fields into separate documents in an "array", then "unwinding" that array with $unwind and running successive $group stages in order to accumulate the output totals and arrays.

The latest releases have special operators like $arrayToObject and $objectToArray, which can make the transfer from the source document to the initial "array" more dynamic than in earlier releases:

db.profile.aggregate([
  { "$project": { 
     "_id": 0,
     "data": { 
       "$filter": {
         "input": { "$objectToArray": "$$ROOT" },
         "cond": { "$in": [ "$$this.k", ["gender","caste","education"] ] }
       }   
     }
  }},
  { "$unwind": "$data" },
  { "$group": {
    "_id": "$data",
    "total": { "$sum": 1 }  
  }},
  { "$group": {
    "_id": "$_id.k",
    "v": {
      "$push": { "name": "$_id.v", "total": "$total" } 
    }  
  }},
  { "$group": {
    "_id": null,
    "data": { "$push": { "k": "$_id", "v": "$v" } }
  }},
  { "$replaceRoot": {
    "newRoot": {
      "$arrayToObject": "$data"
    }
  }}
])

So using $objectToArray you turn the initial document into an array of its keys and values, as "k" and "v" keys in the resulting array of objects. We apply $filter here in order to select by "key". Here it uses $in with a list of keys we want, but this could be used more dynamically with a list of keys to "exclude" where that list is shorter. It's just using logical operators to evaluate the condition.
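
To make that concrete, here is a sketch of just that first $project stage run on its own; the comment shows the shape the first sample document would produce:

db.profile.aggregate([
  { "$project": {
     "_id": 0,
     "data": {
       "$filter": {
         "input": { "$objectToArray": "$$ROOT" },
         "cond": { "$in": [ "$$this.k", ["gender","caste","education"] ] }
       }
     }
  }}
])

/* The "Harish" document, for example, becomes:
{
  "data": [
    { "k": "gender", "v": "Male" },
    { "k": "caste", "v": "Vokkaliga" },
    { "k": "education", "v": "B.E" }
  ]
}
*/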

The final stage here uses $replaceRoot, and since all our manipulation and "grouping" in between still keeps that "k" and "v" form, we then use $arrayToObject to promote our "array of objects" in the result to the "keys" of the top-level document in the output.
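
As a small standalone illustration of that promotion, feeding $arrayToObject a literal array of "k"/"v" pairs (using the same bracketed literal-array syntax as the pipelines here) turns them back into named keys:

db.profile.aggregate([
  { "$limit": 1 },
  { "$project": {
    "_id": 0,
    "demo": {
      "$arrayToObject": [
        [ { "k": "gender", "v": [ { "name": "Male", "total": 3 } ] } ]
      ]
    }
  }}
])

// => { "demo": { "gender": [ { "name": "Male", "total": 3 } ] } }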

As an extra wrinkle here, MongoDB 3.6 includes $mergeObjects, which can be used as an "accumulator" in a $group pipeline stage as well, thus replacing the $push and making the final $replaceRoot simply shift the "data" key to the "root" of the returned document instead:

db.profile.aggregate([
  { "$project": { 
     "_id": 0,
     "data": { 
       "$filter": {
         "input": { "$objectToArray": "$$ROOT" },
         "cond": { "$in": [ "$$this.k", ["gender","caste","education"] ] }
       }   
     }
  }},
  { "$unwind": "$data" },
  { "$group": { "_id": "$data", "total": { "$sum": 1 } }},
  { "$group": {
    "_id": "$_id.k",
    "v": {
      "$push": { "name": "$_id.v", "total": "$total" } 
    }  
  }},
  { "$group": {
    "_id": null,
    "data": {
      "$mergeObjects": {
        "$arrayToObject": [
          [{ "k": "$_id", "v": "$v" }]
        ] 
      }
    }  
  }},
  { "$replaceRoot": { "newRoot": "$data"  } }
])

This is not really that different from what is being demonstrated overall, but it simply shows how $mergeObjects can be used in this way, which may be useful in cases where the grouping key was something different and we did not want that final "merge" into the root space of the object.

Note that $arrayToObject is still needed to transform the "value" back into the name of the "key", but we just do it during the accumulation rather than after the grouping, since the new accumulation allows the "merge" of keys.

Going back a version, or even if you have a MongoDB 3.4.x earlier than the 3.4.4 release, we can still use much of this, but instead we build the array in a more static fashion and handle the final "transform" on output differently, due to the aggregation operators we don't have:

db.profile.aggregate([
  { "$project": {
    "data": [
      { "k": "gender", "v": "$gender" },
      { "k": "caste", "v": "$caste" },
      { "k": "education", "v": "$education" }
    ]
  }},
  { "$unwind": "$data" },
  { "$group": {
    "_id": "$data",
    "total": { "$sum": 1 }  
  }},
  { "$group": {
    "_id": "$_id.k",
    "v": {
      "$push": { "name": "$_id.v", "total": "$total" } 
    }  
  }},
  { "$group": {
    "_id": null,
    "data": { "$push": { "k": "$_id", "v": "$v" } }
  }},
  /*
  { "$replaceRoot": {
    "newRoot": {
      "$arrayToObject": "$data"
    }
  }}
  */
]).map( d => 
  d.data.map( e => ({ [e.k]: e.v }) )
    .reduce((acc,curr) => Object.assign(acc,curr),{})
)

This is exactly the same thing, except that instead of a dynamic transform of the document into the array, we "explicitly" assign each array member with the same "k" and "v" notation. Really we are just keeping those key names for convention at this point, since none of the aggregation operators here depend on them at all.

Also, instead of using $replaceRoot, we do exactly the same thing as that previous pipeline stage was doing, but in client code instead. All MongoDB drivers have some implementation of cursor.map() to enable "cursor transforms". Here with the shell we use the basic JavaScript functions of Array.map() and Array.reduce() to take that output and again promote the array content to being the keys of the top-level document returned.
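
As a standalone sketch of that client-side step, applied to a plain object shaped like the pre-transform output shown later in this answer:

var doc = {
  "_id": null,
  "data": [
    { "k": "gender", "v": [ { "name": "Male", "total": 3 }, { "name": "Female", "total": 2 } ] },
    { "k": "caste", "v": [ { "name": "Vokkaliga", "total": 2 }, { "name": "Lingayath", "total": 3 } ] }
  ]
};

var result = doc.data
  .map(e => ({ [e.k]: e.v }))
  .reduce((acc, curr) => Object.assign(acc, curr), {});

// result: { "gender": [ ... ], "caste": [ ... ] }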

And falling back to MongoDB 2.6 to cover the versions in between, the only thing that changes here is the use of $map and a $literal for the input array declaration:

db.profile.aggregate([
  { "$project": {
    "data": {
      "$map": {
        "input": { "$literal": ["gender","caste", "education"] },
        "as": "k",
        "in": {
          "k": "$$k",
          "v": {
            "$cond": {
              "if": { "$eq": [ "$$k", "gender" ] },
              "then": "$gender",
              "else": {
                "$cond": {
                  "if": { "$eq": [ "$$k", "caste" ] },
                  "then": "$caste",
                  "else": "$education"
                }
              }    
            }
          }    
        }
      }
    }
  }},
  { "$unwind": "$data" },
  { "$group": {
    "_id": "$data",
    "total": { "$sum": 1 }  
  }},
  { "$group": {
    "_id": "$_id.k",
    "v": {
      "$push": { "name": "$_id.v", "total": "$total" } 
    }  
  }},
  { "$group": {
    "_id": null,
    "data": { "$push": { "k": "$_id", "v": "$v" } }
  }},
  /*
  { "$replaceRoot": {
    "newRoot": {
      "$arrayToObject": "$data"
    }
  }}
  */
])
.map( d => 
  d.data.map( e => ({ [e.k]: e.v }) )
    .reduce((acc,curr) => Object.assign(acc,curr),{})
)

Since the basic idea here is to "iterate" a provided array of the field names, the actual assignment of values comes from "nesting" the $cond statements. For three possible outcomes this means only a single level of nesting in order to "branch" for each outcome.

Modern MongoDB from 3.4 has $switch, which makes this branching simpler, yet this demonstrates that the logic was always possible and that the $cond operator has been around since the aggregation framework was introduced in MongoDB 2.2.
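
For comparison, here is a sketch of how that first $project stage might look on MongoDB 3.4+ with $switch in place of the nested $cond; the remaining $unwind and $group stages are unchanged:

db.profile.aggregate([
  { "$project": {
    "data": {
      "$map": {
        "input": { "$literal": ["gender","caste","education"] },
        "as": "k",
        "in": {
          "k": "$$k",
          "v": {
            "$switch": {
              "branches": [
                { "case": { "$eq": [ "$$k", "gender" ] }, "then": "$gender" },
                { "case": { "$eq": [ "$$k", "caste" ] }, "then": "$caste" }
              ],
              "default": "$education"
            }
          }
        }
      }
    }
  }}
  // ...same $unwind and $group stages as in the pipeline above
])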

Again, the same transformation on the cursor result applies, as there is nothing new there, and most programming languages have had the ability to do this for years, if not from inception.

Of course the basic process can even be done all the way back to MongoDB 2.2, just applying the array creation and $unwind in a different way. But no one should be running any MongoDB under 2.8 at this point in time, and official support even for 3.0 is fast running out.

For visualization, the output of all the demonstrated pipelines here has the following form before the last "transform" is done:

/* 1 */
{
    "_id" : null,
    "data" : [ 
        {
            "k" : "gender",
            "v" : [ 
                {
                    "name" : "Male",
                    "total" : 3.0
                }, 
                {
                    "name" : "Female",
                    "total" : 2.0
                }
            ]
        }, 
        {
            "k" : "education",
            "v" : [ 
                {
                    "name" : "M.C.A",
                    "total" : 1.0
                }, 
                {
                    "name" : "B.E",
                    "total" : 3.0
                }, 
                {
                    "name" : "B.Com",
                    "total" : 1.0
                }
            ]
        }, 
        {
            "k" : "caste",
            "v" : [ 
                {
                    "name" : "Lingayath",
                    "total" : 3.0
                }, 
                {
                    "name" : "Vokkaliga",
                    "total" : 2.0
                }
            ]
        }
    ]
}

And then, either by the $replaceRoot or the cursor transform as demonstrated, the result becomes:

/* 1 */
{
    "gender" : [ 
        {
            "name" : "Male",
            "total" : 3.0
        }, 
        {
            "name" : "Female",
            "total" : 2.0
        }
    ],
    "education" : [ 
        {
            "name" : "M.C.A",
            "total" : 1.0
        }, 
        {
            "name" : "B.E",
            "total" : 3.0
        }, 
        {
            "name" : "B.Com",
            "total" : 1.0
        }
    ],
    "caste" : [ 
        {
            "name" : "Lingayath",
            "total" : 3.0
        }, 
        {
            "name" : "Vokkaliga",
            "total" : 2.0
        }
    ]
}

So whilst we can put some new and fancy operators into the aggregation pipeline where we have them available, the most common use case is in these "end of pipeline transforms", in which case we may as well simply do the same transformation on each document in the returned cursor results instead.
