按日期分组并具有多个日期字段 [英] Group by day with Multiple Date Fields

查看:79
本文介绍了按日期分组并具有多个日期字段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这样的文档存储在MongoDB中:

I have documents stored into MongoDB like this :

{
    "_id" : "XBpNKbdGSgGfnC2MJ",
    "po" : 72134185,
    "machine" : 40940,
    "location" : "02A01",
    "inDate" : ISODate("2017-07-19T06:10:13.059Z"),
    "requestDate" : ISODate("2017-07-19T06:17:04.901Z"),
    "outDate" : ISODate("2017-07-19T06:30:34Z")
}

我想按日给出inDate的总和 outDate。

And I want give the sum, by day, of inDate and outDate.

我可以在 inDate 之前从两边检索文档总数,然后,另一方面,文档的总和为 outDate ,但我希望每个文档的总和。

I can retrieve of both side the sum of documents by inDate day and, on other side, the sum of documents by outDate, but I would like the sum of each.

当前,我使用以下管道:

Currently, I use this pipeline :

      $group: {
        _id: {
          yearA: { $year: '$inDate' },
          monthA: { $month: '$inDate' },
          dayA: { $dayOfMonth: '$inDate' },
        },
        count: { $sum: 1 },
      },

然后我给出:

{ "_id" : { "year" : 2017, "month" : 7, "day" : 24 }, "count" : 1 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 21 }, "count" : 11 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 19 }, "count" : 20 }

但我想如果可能的话

{ "_id" : { "year" : 2017, "month" : 7, "day" : 24 }, "countIn" : 1, "countOut" : 4 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 21 }, "countIn" : 11, "countOut" : 23 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 19 }, "countIn" : 20, "countOut" : 18 }

有什么想法吗?
非常感谢:-)

Any idea ? Many thanks :-)

推荐答案

您也可以在源头拆分文档,方法是将每个值本质上合并为按类型表示输入和输出的条目数组。您可以使用 $ map $ cond 选择字段,然后 $ unwind 数组,然后通过使用 $ cond

You can also split the documents at the source, by essentially combining each value into an array of entries by "type" for "in" and "out". You can do this simply using $map and $cond to select the fields, then $unwind the array and then determine which field to "count" again by inspecting with $cond:

collection.aggregate([
  { "$project": {
    "dates": {
      "$filter": {
        "input": { 
          "$map": {
            "input": [ "in", "out" ],
            "as": "type",
            "in": {
              "type": "$$type",
              "date": {
                "$cond": {
                  "if": { "$eq": [ "$$type", "in" ] },
                  "then": "$inDate",
                  "else": "$outDate"
                }
              }
            }
          }
        },
        "as": "dates",
        "cond": { "$ne": [ "$$dates.date", null ] }
      }
    }
  }},
  { "$unwind": "$dates" },
  { "$group": {
    "_id": {
      "year": { "$year": "$dates.date" },
      "month": { "$month": "$dates.date" },
      "day": { "$dayOfMonth": "$dates.date" }
    },
    "countIn": {
      "$sum": {
        "$cond": {
          "if": { "$eq": [ "$dates.type", "in" ]  },
          "then": 1,
          "else": 0
        }
      }
    },
    "countOut": {
      "$sum": {
        "$cond": {
          "if": { "$eq": [ "$dates.type", "out" ]  },
          "then": 1,
          "else": 0
        }
      }
    }
  }}
])

这是一种安全的方法,无论您发送多少数据,都不会冒险突破BSON限制。

That's a safe way to do this that does not risk breaking the BSON limit, no matter what size of data you send at it.

我个人宁愿作为单独的流程运行并分别组合汇总结果,但这取决于您所运行的环境,问题中没有提到。

Personally I would rather run as separate processes and "combine" the aggregated results separately, but that would depend on the environment you are running in, which is not mentioned in the question.

对于并行执行的示例,您可以按照以下方式在流星中进行构造:

For an example of "parallel" execution, you can structure in Meteor somewhere along these lines:

import { Meteor } from 'meteor/meteor';
import { Source } from '../imports/source';
import { Target } from '../imports/target';

Meteor.startup(async () => {
  // code to run on server at startup

  await Source.remove({});
  await Target.remove({});

  console.log('Removed');

  Source.insert({
    "_id" : "XBpNKbdGSgGfnC2MJ",
    "po" : 72134185,
    "machine" : 40940,
    "location" : "02A01",
    "inDate" : new Date("2017-07-19T06:10:13.059Z"),
    "requestDate" : new Date("2017-07-19T06:17:04.901Z"),
    "outDate" : new Date("2017-07-19T06:30:34Z")
  });

  console.log('Inserted');

  await Promise.all(
    ["In","Out"].map( f => new Promise((resolve,reject) => {
      let cursor = Source.rawCollection().aggregate([
        { "$match": { [`${f.toLowerCase()}Date`]: { "$exists": true } } },
        { "$group": {
          "_id": {
            "year": { "$year": `$${f.toLowerCase()}Date` },
            "month": { "$month": `$${f.toLowerCase()}Date` },
            "day": { "$dayOfYear": `$${f.toLowerCase()}Date` }
          },
          [`count${f}`]: { "$sum": 1 }
        }}
      ]);

      cursor.on('data', async (data) => {
        cursor.pause();
        data.date = data._id;
        delete data._id;
        await Target.upsert(
          { date: data.date },
          { "$set": data }
        );
        cursor.resume();
      });

      cursor.on('end', () => resolve('done'));
      cursor.on('error', (err) => reject(err));
    }))
  );

  console.log('Mapped');

  let targets = await Target.find().fetch();
  console.log(targets);

});

从本质上讲,这将输出到目标集合,如注释中所述:

Which is essentially going to output to the target collection as was mentioned in comments like:

{
        "_id" : "XdPGMkY24AcvTnKq7",
        "date" : {
                "year" : 2017,
                "month" : 7,
                "day" : 200
        },
        "countIn" : 1,
        "countOut" : 1
}

这篇关于按日期分组并具有多个日期字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆