在Mongo中对文档中的子文档字段求平均值 [英] Average a Sub Document Field Across Documents in Mongo

查看:190
本文介绍了在Mongo中对文档中的子文档字段求平均值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于给定的记录ID,如果我在MongoDB中具有以下内容,如何获得子文档字段的平均值:

For a given record id, how do I get the average of a sub document field if I have the following in MongoDB:

/* 0 */
{
    "item" : "1",
    "samples" : [ 
        {
            "key" : "test-key",
            "value" : "1"
        }, 
        {
            "key" : "test-key2",
            "value" : "2"
        }
    ]
}

/* 1 */
{
    "item" : "1",
    "samples" : [ 
        {
            "key" : "test-key",
            "value" : "3"
        }, 
        {
            "key" : "test-key2",
            "value" : "4"
        }
    ]
}

我想获取给定项目ID(在本例中为1)的其中key ="test-key"的值的平均值.因此平均值应为$ avg(1 + 3)= 2

I want to get the average of the values where key = "test-key" for a given item id (in this case 1). So the average should be $avg (1 + 3) = 2

谢谢

推荐答案

您将需要使用

You'll need to use the aggregation framework. The aggregation will end up looking something like this:

db.stack.aggregate([
  { $match: { "samples.key" : "test-key" } },
  { $unwind : "$samples" },
  { $match : { "samples.key" : "test-key" } },
  { $project : { "new_key" : "$samples.key", "new_value" : "$samples.value" } },
  { $group : { `_id` : "$new_key", answer : { $avg : "$new_value" } } }
])

思考聚合框架的最佳方法就像一条组装线.查询本身是一个JSON文档数组,其中每个子文档代表程序集中的一个不同步骤.

The best way to think of the aggregation framework is like an assembly line. The query itself is an array of JSON documents, where each sub-document represents a different step in the assembly.

第一步是基本的过滤器,例如SQL中的WHERE子句.我们首先放置此步骤,以筛选出不包含包含test-key的数组元素的所有文档.将其放置在管道的开头,可以使聚合使用索引.

The first step is a basic filter, like a WHERE clause in SQL. We place this step first to filter out all documents that do not contain an array element containing test-key. Placing this at the beginning of the pipeline allows the aggregation to use indexes.

第二步$unwind用于分隔样本"数组中的每个元素,因此我们可以对所有元素执行操作.如果仅通过该步骤运行查询,您将明白我的意思. 长话短说:

The second step, $unwind, is used for separating each of the elements in the "samples" array so we can perform operations across all of them. If you run the query with just that step, you'll see what I mean. Long story short :

{ name : "bob", 
  children : [ {"name" : mary}, { "name" : "sue" } ] 
} 

成为两个文档:

{ name : "bob", children : [ { "name" : mary } ] }
{ name : "bob", children : [ { "name" : sue } ] }

第3步:$ match

第三步$match与第一阶段$match完全相同,但用途不同.由于它遵循$unwind,因此此阶段将筛选出与筛选条件不匹配的先前数组元素(现在为文档).在这种情况下,我们仅保留samples.key = "test-key"

Step 3: $match

The third step, $match, is an exact duplicate of the first $match stage, but has a different purpose. Since it follows $unwind, this stage filters out previous array elements, now documents, that don't match the filter criteria. In this case, we keep only documents where samples.key = "test-key"

第四步,$project,重组文档.在这种情况下,我将项目从数组中拉出,因此可以直接引用它们.使用上面的示例.

The fourth step, $project, restructures the document. In this case, I pulled the items out of the array so I could reference them directly. Using the example above..

{ name : "bob", children : [ { "name" : mary } ] }

成为

{ new_name : "bob", new_child_name : mary }

请注意,此步骤完全是可选步骤;稍作更改后,即使没有$project,也可以完成更高的阶段.在大多数情况下,$project完全是化妆品.聚合具有很多优化功能,例如可以手动包含或排除$project 应该没有必要.

Note that this step is entirely optional; later stages could be completed even without this $project after a few minor changes. In most cases $project is entirely cosmetic; aggregations have numerous optimizations under the hood such that manually including or excluding fields in a $project should not be necessary.

最后,$group是发生魔术的地方. _id值将在SQL世界中分组依据".第二个字段是对我在$project步骤中定义的值求平均值.您可以轻松地用$sum代替执行总和,但是计数操作通常通过以下方式完成:my_count : { $sum : 1 }.

Finally, $group is where the magic happens. The _id value what you will be "grouping by" in the SQL world. The second field is saying to average over the value that I defined in the $project step. You can easily substitute $sum to perform a sum, but a count operation is typically done the following way: my_count : { $sum : 1 }.

这里要注意的最重要的事情是,大部分工作是将数据格式化为执行该操作很简单的一点.

The most important thing to note here is that the majority of the work being done is to format the data to a point where performing the operation is simple.

最后,我想指出,由于samples.value被定义为文本,因此不能 在提供的示例数据上使用,不能在算术运算中使用.如果您有兴趣,请在此处介绍更改字段类型的方法:

Lastly, I wanted to note that this would not work on the example data provided since samples.value is defined as text, which can't be used in arithmetic operations. If you're interested, changing the type of a field is described here: MongoDB How to change the type of a field

这篇关于在Mongo中对文档中的子文档字段求平均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆