MongoDB将相关的收集项目计数与其他收集结果合并 [英] MongoDB merge related collection item count with other collection results

查看:43
本文介绍了MongoDB将相关的收集项目计数与其他收集结果合并的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是mongodb的新手,正在尝试弄清楚如何有效地查询集合中的每个项目.

I'm new to mongodb and trying figure out how to efficiently query on each item within a collection.

我有projects个收藏夹& tasks收藏

I have projects collection & tasks collections

//projects
{
   _id: ObjectId(),
   name: String
}
//tasks
{
   _id: ObjectId(),
   projectId: ObjectId(), //reference project id
   completed: Bool
}

我想获取所有项目,然后计算每个项目的completedincomplete任务

I would like to get all projects and then count of completed and incomplete tasks of each project

db.projects.find({})...
//perhaps something similar in output
[
 {
   _id: ObjectId(), //projectId
   name: String
   completed: Number,
   incomplete: Number
 }
]

我使用猫鼬作为ORM.我不知道这在猫鼬甚至本机mongodb查询中是否可行.感谢任何帮助.谢谢!

I'm using mongoose as ORM. I don't know if this is possible in mongoose or even native mongodb query. Appreciate any help. Thanks!

推荐答案

无论您怎么看,只要您具有这样的规范化关系,就需要两个查询来获取包含任务"详细信息的结果收集并填写项目"收集中的详细信息. MongoDB不以任何方式使用联接,而mongoose也不例外.猫鼬确实提供了.populate(),但这只是方便魔术,实际上是在运行另一个查询并将结果合并到引用的字段值上.

Whatever way you look at this, as long as you have a normalized relationship like this then you would need two queries to get a result containing details from the "tasks" collection and filling out with details from the "projects" collection. MongoDB does not use joins in any way, and mongoose is no different. Mongoose does offer .populate(), but that is only convenience magic for what is essentially running another query and merging the results on the referenced field value.

因此,在这种情况下,您最终可能会考虑将项目信息嵌入到任务中.当然会有重复,但是使用单个集合可以使查询模式更加简单.

So this is one case where maybe you may ultimately consider embedding the project information in the task. Of course there will be duplication, but it makes the query patterns much more simple with a singular collection.

基本上,将引用的模型与分离的集合保持在一起,便有两种方法.但是首先,您可以使用聚合以获得更多结果符合您的实际要求:

Keeping the collections separated with a referenced model you basically then have two approaches. But firstly you can use aggregate in order to get results more along your actual requirements:

      Task.aggregate(
        [
          { "$group": {
            "_id": "$projectId",
            "completed": {
              "$sum": {
                "$cond": [ "$completed", 1, 0 ]
              }
            },
            "incomplete": {
              "$sum": {
                "$cond": [ "$completed", 0, 1 ]
              }
            }
          }}
        ],
        function(err,results) {

        }
    );

这仅使用 $group 管道,以便在任务"集合中累积"projectid"的值.为了计算已完成"和未完成"的值,我们使用 $cond 运算符,该运算符是确定要传递给true到"then"或"else"的位置.

This merely uses a $group pipeline in order to accumulate on the values of "projectid" witin the "tasks" collection. In order to count the values for "completed" and "incomplete" we use the $cond operator which is a ternary to decide which value to pass to $sum. Since the first or "if" condition here is a boolean evaluation, then the existing boolean "complete" field will do, passing on where true to "then" or "else" passing the third argument.

这些结果还可以,但是它们不包含来自项目"集合的有关"_id"值的任何信息.使输出看起来像这样的一种方法是在返回的结果"对象上的聚合结果回调中调用.populate()的模型形式:

Those results are okay but they do not contain any information from the "project" collection for the collected "_id" values. One approach to making the output look this way is to call the model form of .populate() from within the aggregation results callback on the returned "results" object:

    Project.populate(results,{ "path": "_id" },callback);

在这种形式中,.populate()调用将对象或数据数组作为第一个参数,第二个是填充的选项文档,此处的必填字段用于"path".这将处理所有项目,并从被称为将这些对象插入回调中的结果数据的模型中填充".

In this form the .populate() call takes an object or array of data as it's first argument, with the second being an options document for the population, where the mandatory field here is for "path". This will process any items and "populate" from the model that was called inserting those objects into the results data in the callback.

作为完整的示例清单:

var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;

var projectSchema = new Schema({
  "name": String
});

var taskSchema = new Schema({
  "projectId": { "type": Schema.Types.ObjectId, "ref": "Project" },
  "completed": { "type": Boolean, "default": false }
});

var Project = mongoose.model( "Project", projectSchema );
var Task = mongoose.model( "Task", taskSchema );

mongoose.connect('mongodb://localhost/test');

async.waterfall(
  [
    function(callback) {
      async.each([Project,Task],function(model,callback) {
        model.remove({},callback);
      },
      function(err) {
        callback(err);
      });
    },

    function(callback) {
      Project.create({ "name": "Project1" },callback);
    },

    function(project,callback) {
      Project.create({ "name": "Project2" },callback);
    },

    function(project,callback) {
      Task.create({ "projectId": project },callback);
    },

    function(task,callback) {
      Task.aggregate(
        [
          { "$group": {
            "_id": "$projectId",
            "completed": {
              "$sum": {
                "$cond": [ "$completed", 1, 0 ]
              }
            },
            "incomplete": {
              "$sum": {
                "$cond": [ "$completed", 0, 1 ]
              }
            }
          }}
        ],
        function(err,results) {
          if (err) callback(err);
          Project.populate(results,{ "path": "_id" },callback);
        }
      );
    }
  ],
  function(err,results) {
    if (err) throw err;
    console.log( JSON.stringify( results, undefined, 4 ));
    process.exit();
  }
);

这将产生如下结果:

[
    {
        "_id": {
            "_id": "54beef3178ef08ca249b98ef",
            "name": "Project2",
            "__v": 0
        },
        "completed": 0,
        "incomplete": 1
    }
]

因此,.populate()对于此类聚合结果非常有效,即使与其他查询一样有效,并且通常应适合大多数用途.但是,清单中有一个特定的示例,其中创建了两个"项目,但当然只有一个"任务仅引用了一个项目.

So .populate() works well for this sort of aggregation result, even as effectively another query, and should generally be suitable for most purposes. There was however a specific example included in the listing where there are "two" projects created but of course only "one" task referencing just one of the projects.

由于聚合正在处理任务"集合,因此它不了解那里未引用的任何项目".为了获得具有计算出的总数的项目"的完整列表,您需要在运行两个查询和合并"结果时更加具体.

Since aggregation is working on the "tasks" collection, it has no knowledge whatsoever of any "project" that is not referenced there. In order to get a complete list of "projects" with the calculated totals you need to be more specific in running two queries and "merging" the results.

这基本上是对不同键和数据的哈希合并",但是,它的好帮手是名为 nedb的模块,它允许您以与MongoDB查询和操作更一致的方式应用逻辑.

This is basically a "hash merge" on distinct keys and data, however nice helper for this is a module called nedb, which allows you to apply the logic in a manner more consistent with MongoDB queries and operations.

基本上,您想从具有扩展字段的项目"集合中复制数据,然后您想要将该信息合并"或.update()与聚合结果一起使用.再次作为完整清单来演示:

Basically you want a copy of the data from the "projects" collection with augmented fields, then you want to "merge" or .update() that information with the aggregation results. Again as a complete listing to demonstrate:

var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema,
    DataStore = require('nedb'),
    db = new DataStore();


var projectSchema = new Schema({
  "name": String
});

var taskSchema = new Schema({
  "projectId": { "type": Schema.Types.ObjectId, "ref": "Project" },
  "completed": { "type": Boolean, "default": false }
});

var Project = mongoose.model( "Project", projectSchema );
var Task = mongoose.model( "Task", taskSchema );

mongoose.connect('mongodb://localhost/test');

async.waterfall(
  [
    function(callback) {
      async.each([Project,Task],function(model,callback) {
        model.remove({},callback);
      },
      function(err) {
        callback(err);
      });
    },

    function(callback) {
      Project.create({ "name": "Project1" },callback);
    },

    function(project,callback) {
      Project.create({ "name": "Project2" },callback);
    },

    function(project,callback) {
      Task.create({ "projectId": project },callback);
    },

    function(task,callback) {
      async.series(
        [

          function(callback) {
            Project.find({},function(err,projects) {
              async.eachLimit(projects,10,function(project,callback) {
                db.insert({
                  "projectId": project._id.toString(),
                  "name": project.name,
                  "completed": 0,
                  "incomplete": 0
                },callback);
              },callback);
            });
          },

          function(callback) {
            Task.aggregate(
              [
                { "$group": {
                  "_id": "$projectId",
                  "completed": {
                    "$sum": {
                      "$cond": [ "$completed", 1, 0 ]
                    }
                  },
                  "incomplete": {
                    "$sum": {
                      "$cond": [ "$completed", 0, 1 ]
                    }
                  }
                }}
              ],
              function(err,results) {
                async.eachLimit(results,10,function(result,callback) {
                  db.update(
                    { "projectId": result._id.toString() },
                    { "$set": {
                        "complete": result.complete,
                        "incomplete": result.incomplete
                      }
                    },
                    callback
                  );
                },callback);
              }
            );
          },

        ],

        function(err) {
          if (err) callback(err);
          db.find({},{ "_id": 0 },callback);
        }
      );
    }
  ],
  function(err,results) {
    if (err) throw err;
    console.log( JSON.stringify( results, undefined, 4 ));
    process.exit();
  }

结果在这里:

[
    {
        "projectId": "54beef4c23d4e4e0246379db",
        "name": "Project2",
        "completed": 0,
        "incomplete": 1
    },
    {
        "projectId": "54beef4c23d4e4e0246379da",
        "name": "Project1",
        "completed": 0,
        "incomplete": 0
    }
]

其中列出了每个项目"中的数据,并包括与之相关的任务"集合中的计算值.

That lists data from every "project" and includes the calculated values from the "tasks" collection that is related to it.

因此,您可以采取几种方法.同样,最终最好还是将任务"嵌入到项目"项中,这还是一种简单的聚合方法.而且,如果您要嵌入任务信息,则还可以在项目"对象上维护完成"和未完成"的计数器,并只需将它们更新,因为在任务数组中使用标记项目已完成. ="http://docs.mongodb.org/manual/reference/operator/update/inc/" rel ="nofollow"> $inc 运算符.

So there are a few approaches you can do. Again, you might ultimately be best off just embedding "tasks" into the "project" items instead, which would again be a simple aggregation approach. And if you are going to embed the task information, then you may as well maintain counters for "complete" and "incomplete" on the "project" object and simply update these as items are marked completed in the tasks array with the $inc operator.

var taskSchema = new Schema({
  "completed": { "type": Boolean, "default": false }
});

var projectSchema = new Schema({
  "name": String,
  "completed": { "type": Number, "default": 0 },
  "incomplete": { "type": Number, "default": 0 }
  "tasks": [taskSchema]
});

var Project = mongoose.model( "Project", projectSchema );
// cheat for a model object with no collection
var Task = mongoose.model( "Task", taskSchema, undefined );

// Then in later code

// Adding a task
var task = new Task();
Project.update(
    { "task._id": { "$ne": task._id } },
    { 
        "$push": { "tasks": task },
        "$inc": {
            "completed": ( task.completed ) ? 1 : 0,
            "incomplete": ( !task.completed ) ? 1 : 0;
        }
    },
    callback
 );

// Removing a task
Project.update(
    { "task._id": task._id },
    { 
        "$pull": { "tasks": { "_id": task._id } },
        "$inc": {
            "completed": ( task.completed ) ? -1 : 0,
            "incomplete": ( !task.completed ) ? -1 : 0;
        }
    },
    callback
 );


 // Marking complete
Project.update(
    { "tasks": { "$elemMatch": { "_id": task._id, "completed": false } }},
    { 
        "$set": { "tasks.$.completed": true },
        "$inc": {
            "completed": 1,
            "incomplete": -1
        }
    },
    callback
);

尽管您必须知道当前的任务状态,才能使计数器更新正常工作,但这很容易编写代码,并且您至少应该在传递给方法的对象中包含这些详细信息.

You have to know the current task status though for the counter updates to work correctly, but this is easy to code for and you probably should have at least those details in an object passing into your methods.

我个人将重新建模为后一种形式,然后执行此操作.您可以执行查询合并",如此处两个示例所示,但这当然是有代价的.

Personally I would re-model to the latter form and do that. You can do query "merging" as has been shown in two examples here, but it does of course come at a cost.

这篇关于MongoDB将相关的收集项目计数与其他收集结果合并的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆