MongoDB 将相关收集项计数与其他收集结果合并 [英] MongoDB merge related collection item count with other collection results

查看:31
本文介绍了MongoDB 将相关收集项计数与其他收集结果合并的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 mongodb 的新手,正在尝试弄清楚如何有效地查询集合中的每个项目.

I'm new to mongodb and trying figure out how to efficiently query on each item within a collection.

我有 projects 收藏和任务集合

//projects
{
   _id: ObjectId(),
   name: String
}
//tasks
{
   _id: ObjectId(),
   projectId: ObjectId(), //reference project id
   completed: Bool
}

我想获取所有项目,然后计算每个项目的已完成incomplete任务

I would like to get all projects and then count of completed and incomplete tasks of each project

db.projects.find({})...
//perhaps something similar in output
[
 {
   _id: ObjectId(), //projectId
   name: String
   completed: Number,
   incomplete: Number
 }
]

我使用猫鼬作为 ORM.我不知道这在猫鼬甚至本机 mongodb 查询中是否可行.感谢任何帮助.谢谢!

I'm using mongoose as ORM. I don't know if this is possible in mongoose or even native mongodb query. Appreciate any help. Thanks!

推荐答案

无论你怎么看,只要你有这样的规范化关系,那么你需要两个查询来获得包含任务"详细信息的结果" 收集并填写项目"集合中的详细信息.MongoDB 不以任何方式使用连接,mongoose 也不例外.Mongoose 确实提供了 .populate(),但对于本质上运行另一个查询并将结果合并到引用的字段值上的功能来说,这只是方便的魔法.

Whatever way you look at this, as long as you have a normalized relationship like this then you would need two queries to get a result containing details from the "tasks" collection and filling out with details from the "projects" collection. MongoDB does not use joins in any way, and mongoose is no different. Mongoose does offer .populate(), but that is only convenience magic for what is essentially running another query and merging the results on the referenced field value.

因此,在这种情况下,您最终可能会考虑将项目信息嵌入任务中.当然会有重复,但使用单一集合会使查询模式更加简单.

So this is one case where maybe you may ultimately consider embedding the project information in the task. Of course there will be duplication, but it makes the query patterns much more simple with a singular collection.

将集合与引用模型分开,您基本上有两种方法.但首先你可以使用 aggregate 以获得更多结果根据您的实际要求:

Keeping the collections separated with a referenced model you basically then have two approaches. But firstly you can use aggregate in order to get results more along your actual requirements:

      Task.aggregate(
        [
          { "$group": {
            "_id": "$projectId",
            "completed": {
              "$sum": {
                "$cond": [ "$completed", 1, 0 ]
              }
            },
            "incomplete": {
              "$sum": {
                "$cond": [ "$completed", 0, 1 ]
              }
            }
          }}
        ],
        function(err,results) {

        }
    );

这仅使用 $group 管道,以便在tasks"集合中累积projectid"的值.为了计算完成"和不完整"的值,我们使用 $cond 运算符,它是一个三元组,用于决定将哪个值传递给 $sum.由于这里的第一个或if"条件是布尔评估,因此现有的布尔完整"字段将执行,将 where true 传递给then"或else",并传递第三个参数.

This merely uses a $group pipeline in order to accumulate on the values of "projectid" witin the "tasks" collection. In order to count the values for "completed" and "incomplete" we use the $cond operator which is a ternary to decide which value to pass to $sum. Since the first or "if" condition here is a boolean evaluation, then the existing boolean "complete" field will do, passing on where true to "then" or "else" passing the third argument.

这些结果没有问题,但它们不包含来自项目"集合中收集的_id"值的任何信息.使输出看起来像这种方式的一种方法是从返回的结果"对象的聚合结果回调中调用 .populate() 的模型形式:

Those results are okay but they do not contain any information from the "project" collection for the collected "_id" values. One approach to making the output look this way is to call the model form of .populate() from within the aggregation results callback on the returned "results" object:

    Project.populate(results,{ "path": "_id" },callback);

在这种形式中,.populate() 调用将一个对象或数据数组作为它的第一个参数,第二个参数是人口的选项文档,这里的必填字段是小路".这将处理任何项目并从被调用的模型中填充",将这些对象插入到回调中的结果数据中.

In this form the .populate() call takes an object or array of data as it's first argument, with the second being an options document for the population, where the mandatory field here is for "path". This will process any items and "populate" from the model that was called inserting those objects into the results data in the callback.

作为一个完整的示例清单:

As a complete example listing:

var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;

var projectSchema = new Schema({
  "name": String
});

var taskSchema = new Schema({
  "projectId": { "type": Schema.Types.ObjectId, "ref": "Project" },
  "completed": { "type": Boolean, "default": false }
});

var Project = mongoose.model( "Project", projectSchema );
var Task = mongoose.model( "Task", taskSchema );

mongoose.connect('mongodb://localhost/test');

async.waterfall(
  [
    function(callback) {
      async.each([Project,Task],function(model,callback) {
        model.remove({},callback);
      },
      function(err) {
        callback(err);
      });
    },

    function(callback) {
      Project.create({ "name": "Project1" },callback);
    },

    function(project,callback) {
      Project.create({ "name": "Project2" },callback);
    },

    function(project,callback) {
      Task.create({ "projectId": project },callback);
    },

    function(task,callback) {
      Task.aggregate(
        [
          { "$group": {
            "_id": "$projectId",
            "completed": {
              "$sum": {
                "$cond": [ "$completed", 1, 0 ]
              }
            },
            "incomplete": {
              "$sum": {
                "$cond": [ "$completed", 0, 1 ]
              }
            }
          }}
        ],
        function(err,results) {
          if (err) callback(err);
          Project.populate(results,{ "path": "_id" },callback);
        }
      );
    }
  ],
  function(err,results) {
    if (err) throw err;
    console.log( JSON.stringify( results, undefined, 4 ));
    process.exit();
  }
);

这将产生如下结果:

[
    {
        "_id": {
            "_id": "54beef3178ef08ca249b98ef",
            "name": "Project2",
            "__v": 0
        },
        "completed": 0,
        "incomplete": 1
    }
]

所以 .populate() 非常适合这种聚合结果,即使是另一个查询也很有效,并且通常应该适用于大多数用途.然而,清单中包含一个特定示例,其中创建了两个"项目,但当然只有一个"任务仅引用其中一个项目.

So .populate() works well for this sort of aggregation result, even as effectively another query, and should generally be suitable for most purposes. There was however a specific example included in the listing where there are "two" projects created but of course only "one" task referencing just one of the projects.

由于聚合正在处理任务"集合,因此它对任何未在其中引用的项目"一无所知.为了获得包含计算总数的完整项目"列表,您需要更具体地运行两个查询并合并"结果.

Since aggregation is working on the "tasks" collection, it has no knowledge whatsoever of any "project" that is not referenced there. In order to get a complete list of "projects" with the calculated totals you need to be more specific in running two queries and "merging" the results.

这基本上是对不同键和数据的哈希合并",但是对此的好帮手是一个名为 nedb 的模块,它允许您以更符合 MongoDB 查询和操作的方式应用逻辑.

This is basically a "hash merge" on distinct keys and data, however nice helper for this is a module called nedb, which allows you to apply the logic in a manner more consistent with MongoDB queries and operations.

基本上,您需要项目"集合中的数据副本以及增强字段,然后您希望将该信息与聚合结果合并"或 .update().再次作为一个完整的清单来演示:

Basically you want a copy of the data from the "projects" collection with augmented fields, then you want to "merge" or .update() that information with the aggregation results. Again as a complete listing to demonstrate:

var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema,
    DataStore = require('nedb'),
    db = new DataStore();


var projectSchema = new Schema({
  "name": String
});

var taskSchema = new Schema({
  "projectId": { "type": Schema.Types.ObjectId, "ref": "Project" },
  "completed": { "type": Boolean, "default": false }
});

var Project = mongoose.model( "Project", projectSchema );
var Task = mongoose.model( "Task", taskSchema );

mongoose.connect('mongodb://localhost/test');

async.waterfall(
  [
    function(callback) {
      async.each([Project,Task],function(model,callback) {
        model.remove({},callback);
      },
      function(err) {
        callback(err);
      });
    },

    function(callback) {
      Project.create({ "name": "Project1" },callback);
    },

    function(project,callback) {
      Project.create({ "name": "Project2" },callback);
    },

    function(project,callback) {
      Task.create({ "projectId": project },callback);
    },

    function(task,callback) {
      async.series(
        [

          function(callback) {
            Project.find({},function(err,projects) {
              async.eachLimit(projects,10,function(project,callback) {
                db.insert({
                  "projectId": project._id.toString(),
                  "name": project.name,
                  "completed": 0,
                  "incomplete": 0
                },callback);
              },callback);
            });
          },

          function(callback) {
            Task.aggregate(
              [
                { "$group": {
                  "_id": "$projectId",
                  "completed": {
                    "$sum": {
                      "$cond": [ "$completed", 1, 0 ]
                    }
                  },
                  "incomplete": {
                    "$sum": {
                      "$cond": [ "$completed", 0, 1 ]
                    }
                  }
                }}
              ],
              function(err,results) {
                async.eachLimit(results,10,function(result,callback) {
                  db.update(
                    { "projectId": result._id.toString() },
                    { "$set": {
                        "complete": result.complete,
                        "incomplete": result.incomplete
                      }
                    },
                    callback
                  );
                },callback);
              }
            );
          },

        ],

        function(err) {
          if (err) callback(err);
          db.find({},{ "_id": 0 },callback);
        }
      );
    }
  ],
  function(err,results) {
    if (err) throw err;
    console.log( JSON.stringify( results, undefined, 4 ));
    process.exit();
  }

这里的结果:

[
    {
        "projectId": "54beef4c23d4e4e0246379db",
        "name": "Project2",
        "completed": 0,
        "incomplete": 1
    },
    {
        "projectId": "54beef4c23d4e4e0246379da",
        "name": "Project1",
        "completed": 0,
        "incomplete": 0
    }
]

这列出了来自每个项目"的数据,并包括来自与其相关的任务"集合的计算值.

That lists data from every "project" and includes the calculated values from the "tasks" collection that is related to it.

因此,您可以采取一些方法.同样,您最终可能最好将任务"嵌入到项目"项中,这又是一种简单的聚合方法.如果您要嵌入任务信息,那么您还可以在项目"对象上维护完成"和未完成"的计数器,并在任务数组中使用 $inc 运算符.

So there are a few approaches you can do. Again, you might ultimately be best off just embedding "tasks" into the "project" items instead, which would again be a simple aggregation approach. And if you are going to embed the task information, then you may as well maintain counters for "complete" and "incomplete" on the "project" object and simply update these as items are marked completed in the tasks array with the $inc operator.

var taskSchema = new Schema({
  "completed": { "type": Boolean, "default": false }
});

var projectSchema = new Schema({
  "name": String,
  "completed": { "type": Number, "default": 0 },
  "incomplete": { "type": Number, "default": 0 }
  "tasks": [taskSchema]
});

var Project = mongoose.model( "Project", projectSchema );
// cheat for a model object with no collection
var Task = mongoose.model( "Task", taskSchema, undefined );

// Then in later code

// Adding a task
var task = new Task();
Project.update(
    { "task._id": { "$ne": task._id } },
    { 
        "$push": { "tasks": task },
        "$inc": {
            "completed": ( task.completed ) ? 1 : 0,
            "incomplete": ( !task.completed ) ? 1 : 0;
        }
    },
    callback
 );

// Removing a task
Project.update(
    { "task._id": task._id },
    { 
        "$pull": { "tasks": { "_id": task._id } },
        "$inc": {
            "completed": ( task.completed ) ? -1 : 0,
            "incomplete": ( !task.completed ) ? -1 : 0;
        }
    },
    callback
 );


 // Marking complete
Project.update(
    { "tasks": { "$elemMatch": { "_id": task._id, "completed": false } }},
    { 
        "$set": { "tasks.$.completed": true },
        "$inc": {
            "completed": 1,
            "incomplete": -1
        }
    },
    callback
);

您必须知道当前任务状态才能使计数器更新正常工作,但这很容易编写代码,您可能至少应该在传递给您的方法的对象中包含这些详细信息.

You have to know the current task status though for the counter updates to work correctly, but this is easy to code for and you probably should have at least those details in an object passing into your methods.

就我个人而言,我会重新建模为后一种形式并这样做.您可以执行查询合并",如此处的两个示例所示,但这当然是有代价的.

Personally I would re-model to the latter form and do that. You can do query "merging" as has been shown in two examples here, but it does of course come at a cost.

这篇关于MongoDB 将相关收集项计数与其他收集结果合并的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆