在 MongoDB 中按日期与本地时区分组 [英] Group by Date with Local Time Zone in MongoDB

查看:24
本文介绍了在 MongoDB 中按日期与本地时区分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 mongodb 的新手.以下是我的查询.

Model.aggregate().match({ 'activationId': activationId, "t": { "$gte": new Date(fromTime), "$lt": new Date(toTime) } }).group({ '_id': { 'date': { $dateToString: { format: "%Y-%m-%d %H", date: "$datefield" } } }, uniqueCount: { $addToSet: "$mac" } }).project({ "date": 1, "month": 1, "hour": 1, uniqueMacCount: { $size: "$uniqueCount" } }).exec().then(函数(文档){返回文档;});

问题是 mongodb 在 iso 时区存储日期.我需要这些数据来显示面积图.

我想按日期和当地时区分组.有没有办法在分组时将时间偏移添加到日期中?

解决方案

处理本地日期"的一般问题

因此,对此有一个简短的答案,也有一个很长的答案.基本情况是不使用任何"日期聚合运算符" 相反,您更愿意并且需要"实际上对日期对象进行数学运算".这里的主要事情是根据给定本地时区的 UTC 偏移量调整值,然后舍入"到所需的时间间隔.

更长的答案"以及要考虑的主要问题涉及日期通常受夏令时"的影响,在一年中的不同时间与 UTC 的偏移量发生变化.因此,这意味着当出于此类聚合目的转换为本地时间"时,您确实应该考虑此类更改的边界在哪里.

还有另一个考虑因素,即无论您在给定的时间间隔内如何聚合",输出值应该"至少最初以 UTC 形式出现.这是一个很好的做法,因为显示到语言环境"确实是一个客户端功能",正如后面所描述的,客户端界面通常有一种在当前语言环境中显示的方式,这将基于它实际上被馈送的前提UTC 格式的数据.

确定区域设置偏移和夏令时

这通常是需要解决的主要问题.将日期四舍五入"到一个区间的一般数学是简单的部分,但没有真正的数学可以用于了解何时适用此类边界,并且规则在每个地区并且通常每年都会发生变化.

所以这就是库"的用武之地,作者认为 JavaScript 平台的最佳选择是 moment-timezone,它基本上是 moment.js 的超集",包括所有重要的timezeone"我们想要使用的功能.

Moment Timezone 基本上为每个语言环境时区定义了这样的结构:

<代码>{name : 'America/Los_Angeles',//唯一标识符abbrs : ['PDT', 'PST'],//缩写untils : [1414918800000, 1425808800000],//以毫秒为单位的时间戳offsets : [420, 480]//以分钟为单位的偏移量}

当然,这些对象相对于实际记录的 untilsoffsets 属性大得多.但这是您需要访问的数据,以便查看给定夏令时更改的区域的偏移量是否确实发生了变化.

后面代码清单的这个块是我们主要用来确定给定范围的 startend 值,越过哪些夏令时边界,如果有的话:

 const zone = moment.tz.zone(locale);如果 ( zone.hasOwnProperty('untils') ) {让之间 = zone.untils.filter( u =>u >= start.valueOf() &&你<end.valueOf());if ( between.length > 0 )分支 = 之间.map( d => moment.tz(d, locale) ).reduce((acc,curr,i,arr) =>acc.concat((我=== 0)?[{ start, end: curr }]​​ : [{ start: acc[i-1].end, end: curr }]​​,( i === arr.length-1 ) ?[{开始:当前,结束}]:[]),[]);}

纵观 2017 年 Australia/Sydney 语言环境,其输出将是:

<预><代码>[{"start": "2016-12-31T13:00:00.000Z",//这里的时间间隔是 +11 小时结束":2017-04-01T16:00:00.000Z"},{"start": "2017-04-01T16:00:00.000Z",//这里改为+10小时结束":2017-09-30T16:00:00.000Z"},{"start": "2017-09-30T16:00:00.000Z",//这里改回 +11 小时结束":2017-12-31T13:00:00.000Z"}]

这基本上表明,第一个日期序列之间的偏移量将是 +11 小时,然后在第二个序列中的日期之间更改为 +10 小时,然后切换回 +11 小时,以覆盖到结束的时间间隔年份和指定范围.

然后需要将此逻辑转换为 MongoDB 可以理解的结构,作为聚合管道的一部分.

应用数学

此处聚合到任何四舍五入的日期间隔"的数学原理基本上依赖于使用所表示日期的毫秒值,该值四舍五入"到表示所需间隔"的最接近的数字.

您实际上是通过找到应用于所需间隔的当前值的模"或余数"来实现的.然后从当前值中减去"该余数,该值以最近的间隔返回一个值.

例如,给定当前日期:

 var d = new Date("2017-07-14T01:28:34.931Z");//toValue() 是 1499995714931 毫秒//1000 毫秒 * 60 秒 * 60 分钟 = 1 小时或 3600000 毫秒var v = d.valueOf() - ( d.valueOf() % ( 1000 * 60 * 60 ) );//v 等于 1499994000000 毫秒或作为日期新日期(1499994000000);ISODate("2017-07-14T01:00:00Z")//删除了 28 分钟并更改为最接近的 1 小时间隔

这是我们还需要使用 $subtract$mod 运算,它们是用于上述相同数学运算的聚合表达式.

聚合管道的一般结构是:

 让管道 = [{$匹配":{"createdAt": { "$gte": start.toDate(), "$lt": end.toDate() }}},{$组":{_ID": {"$add": [{$减去":[{$减去":[{ "$subtract": [ "$createdAt", new Date(0) ] },switchOffset(start,end,"$createdAt",false)]},{$mod":[{$减去":[{ "$subtract": [ "$createdAt", new Date(0) ] },switchOffset(start,end,"$createdAt",false)]},间隔]}]},新日期(0)]},金额":{$sum":$amount"}}},{$addFields":{_ID": {"$add": ["$_id", switchOffset(start,end,"$_id",true)]}}},{ "$sort": { "_id": 1 } }];

这里您需要了解的主要部分是从存储在 MongoDB 中的 Date 对象到表示内部时间戳值的 Numeric 的转换.我们需要数字"形式,要做到这一点,这是一个数学技巧,我们从另一个 BSON 日期中减去一个 BSON 日期,从而得出它们之间的数字差异.这正是该语句的作用:

{ "$subtract": [ "$createdAt", new Date(0) ] }

现在我们有一个数值要处理,我们可以应用模数并从日期的数字表示中减去它,以便四舍五入"它.所以这个的直接"表示是这样的:

{ "$subtract": [{ "$subtract": [ "$createdAt", new Date(0) ] },{$mod":[{ "$subtract": [ "$createdAt", new Date(0) ] },( 1000 * 60 * 60 * 24 )//24 小时]}]}

它反映了与前面所示相同的 JavaScript 数学方法,但应用于聚合管道中的实际文档值.您还会注意到另一个技巧",我们在那里应用了 $add 操作与 BSON 日期的另一种表示形式(或 0 毫秒),其中将 BSON 日期添加"到数字"值,返回表示毫秒的BSON 日期"它作为输入给出.

当然,列出的代码中的另一个考虑因素是与 UTC 的实际偏移",它正在调整数值以确保对当前时区进行舍入".这是在基于查找不同偏移发生位置的早期描述的函数中实现的,并通过比较输入日期并返回正确的偏移来返回可用于聚合管道表达式的格式.

随着所有细节的全面展开,包括处理那些不同的夏令时"时间偏移的生成,则如下所示:

<预><代码>[{$匹配":{创建于":{"$gte": "2016-12-31T13:00:00.000Z","$lt": "2017-12-31T13:00:00.000Z"}}},{$ 组":{_ID": {"$add": [{$减去":[{$减去":[{$减去":["$createdAt",1970-01-01T00:00:00.000Z"]},{$ 开关":{分支":[{案件": {"$and": [{"$gte": ["$createdAt",2016-12-31T13:00:00.000Z"]},{"$lt": ["$createdAt",2017-04-01T16:00:00.000Z"]}]},然后":-39600000},{案件": {"$and": [{"$gte": ["$createdAt",2017-04-01T16:00:00.000Z"]},{"$lt": ["$createdAt",2017-09-30T16:00:00.000Z"]}]},然后":-36000000},{案件": {"$and": [{"$gte": ["$createdAt",2017-09-30T16:00:00.000Z"]},{"$lt": ["$createdAt",2017-12-31T13:00:00.000Z"]}]},然后":-39600000}]}}]},{"$mod": [{$减去":[{$减去":["$createdAt",1970-01-01T00:00:00.000Z"]},{$ 开关":{分支":[{案件": {"$and": [{"$gte": ["$createdAt",2016-12-31T13:00:00.000Z"]},{"$lt": ["$createdAt",2017-04-01T16:00:00.000Z"]}]},然后":-39600000},{案件": {"$and": [{"$gte": ["$createdAt",2017-04-01T16:00:00.000Z"]},{"$lt": ["$createdAt",2017-09-30T16:00:00.000Z"]}]},然后":-36000000},{案件": {"$and": [{"$gte": ["$createdAt",2017-09-30T16:00:00.000Z"]},{"$lt": ["$createdAt",2017-12-31T13:00:00.000Z"]}]},然后":-39600000}]}}]},86400000]}]},1970-01-01T00:00:00.000Z"]},数量": {"$sum": "$amount"}}},{$addFields":{_ID": {"$add": ["$_id",{$ 开关":{分支":[{案件": {"$and": [{"$gte": ["$_id",2017-01-01T00:00:00.000Z"]},{"$lt": ["$_id",2017-04-02T03:00:00.000Z"]}]},然后":-39600000},{案件": {"$and": [{"$gte": ["$_id",2017-04-02T02:00:00.000Z"]},{"$lt": ["$_id",2017-10-01T02:00:00.000Z"]}]},然后":-36000000},{案件": {"$and": [{"$gte": ["$_id",2017-10-01T03:00:00.000Z"]},{"$lt": ["$_id",2018-01-01T00:00:00.000Z"]}]},然后":-39600000}]}}]}}},{$排序":{_id":1}}]

该扩展使用了$switch 语句,以便将日期范围作为条件应用于何时返回给定的偏移值.这是最方便的形式,因为 "branches" 参数确实直接对应于数组",这是通过检查 untils 表示在提供的查询日期范围内给定时区的偏移切点".

可以使用 $cond 代替,但是实现起来比较麻烦,所以我们这里只是使用最方便的方法实现.

一旦应用了所有这些条件,聚合"日期实际上就是那些表示由提供的 locale 定义的本地"时间.这实际上让我们了解了最终的聚合阶段是什么,它存在的原因以及后面的处理,如清单中所示.

最终结果

我之前确实提到过,一般建议是输出"仍应以至少某些描述的 UTC 格式返回日期值,因此这正是此处的管道通过首先转换来自"UTC 所做的工作通过在四舍五入"时应用偏移量到本地,但随后分组后"的最终数字被重新调整回适用于四舍五入"日期值的相同偏移量.

这里的清单给出了三种"不同的输出可能性:

//JSON stringify 默认的 ISO 格式字符串[{"_id": "2016-12-31T13:00:00.000Z",数量":2},{"_id": "2017-01-01T13:00:00.000Z",数量":1},{"_id": "2017-01-02T13:00:00.000Z",数量":2}]//时间戳值 - 距 UTC 纪元的毫秒数 - 最少空间![{_id":1483189200000,数量":2},{_id":1483275600000,数量":1},{_id":1483362000000,数量":2}]//通过 moment .format() 强制语言环境格式为字符串[{"_id": "2017-01-01T00:00:00+11:00",数量":2},{"_id": "2017-01-02T00:00:00+11:00",数量":1},{"_id": "2017-01-03T00:00:00+11:00",数量":2}]

这里需要注意的一件事是,对于像 Angular 这样的客户端",这些格式中的每一种都将被它自己的 DatePipe 它实际上可以为您执行区域设置格式".但这取决于提供数据的位置.好"库会知道在当前语言环境中使用 UTC 日期.如果情况并非如此,那么您可能需要自己字符串化".

但这是一件简单的事情,您可以通过使用一个库来获得最大的支持,该库本质上是基于对给定 UTC 值"的输出的操作.

当您询问诸如聚合到本地时区之类的问题时,这里的主要内容是了解您在做什么".这样的过程应该考虑:

  1. 可以并且经常从不同时区的人的角度查看数据.

  2. 数据一般由不同时区的人提供.结合第 1 点,这就是我们以 UTC 存储的原因.

  3. 时区在许多世界时区中通常会受到夏令时"不断变化的偏移",您应该在分析和处理数据时考虑到这一点.

  4. 不管聚合间隔如何,输出实际上应该"保持在 UTC,尽管根据提供的区域设置调整为聚合间隔.这使得演示文稿被委托给客户端"功能,就像它应该的那样.

只要您牢记这些事情并像这里的清单中展示的那样应用,那么您就是在做所有正确的事情来处理与给定区域设置相关的日期聚合甚至一般存储.

所以你应该"这样做,而你不应该"做的是放弃并简单地将语言环境日期"存储为一个字符串.如上所述,这将是一种非常不正确的方法,只会给您的应用程序带来更多问题.

<块引用>

注意:我在这里根本没有涉及的一个主题是聚合到月"(或实际上是年") 间隔.月"是整个过程中的数学异常,因为天数总是变化的,因此需要一套完整的其他逻辑才能应用.单独描述这一点至少和这篇文章一样长,因此将是另一个主题.对于常见的分钟、小时和天数,这里的数学计算对于这些情况来说足够好".

<小时>

完整列表

这可以作为修补的演示".它使用所需的函数来提取要包含的偏移日期和值,并在提供的数据上运行聚合管道.

您可以在此处更改任何内容,但可能会以 localeinterval 参数开头,然后可能添加不同的数据和不同的 startend 查询的日期.但其余代码无需更改即可简单地更改任何这些值,因此可以使用不同的时间间隔(例如问题中询问的 1 小时)和不同的语言环境进行演示.

例如,一旦提供了实际上需要以1 小时间隔"聚合的有效数据,则列表中的行将更改为:

const interval = moment.duration(1,'hour').asMilliseconds();

为了根据对日期执行的聚合操作的要求,为聚合间隔定义毫秒值.

<小时>

const moment = require('moment-timezone'),猫鼬 = 要求('猫鼬'),架构 = 猫鼬.架构;猫鼬.Promise = global.Promise;mongoose.set('debug',true);const uri = 'mongodb://localhost/test',选项 = { useMongoClient: true };const locale = '澳大利亚/悉尼';const interval = moment.duration(1,'day').asMilliseconds();const 报告架构 = 新架构({createdAt:日期,数量:数量});const Report = mongoose.model('Report', reportSchema);功能日志(数据){console.log(JSON.stringify(data,undefined,2))}函数 switchOffset(start,end,field,reverseOffset) {让分支 = [{ 开始,结束 }]const zone = moment.tz.zone(locale);如果 ( zone.hasOwnProperty('untils') ) {让之间 = zone.untils.filter( u =>u >= start.valueOf() &&你<end.valueOf());if ( between.length > 0 )分支 = 之间.map( d => moment.tz(d, locale) ).reduce((acc,curr,i,arr) =>acc.concat((我=== 0)?[{ start, end: curr }]​​ : [{ start: acc[i-1].end, end: curr }]​​,( i === arr.length-1 ) ?[{开始:当前,结束}]:[]),[]);}日志(分支);分支 = 分支.map( d => ({案件: {$和:[{ $gte: [场地,新日期(d.start.valueOf()+ ((反向偏移)?moment.duration(d.start.utcOffset(),'minutes').asMilliseconds(): 0))]},{ $lt: [场地,新日期(d.end.valueOf()+ ((反向偏移)?moment.duration(d.start.utcOffset(),'minutes').asMilliseconds(): 0))]}]},然后:-1 * moment.duration(d.start.utcOffset(),'minutes').asMilliseconds()}));返回 ({ $switch: { 分支 } });}(异步函数(){尝试 {const conn = await mongoose.connect(uri,options);//数据清理等待 Promise.all(Object.keys(conn.models).map(m => conn.models[m].remove({})));让插入 = 等待 Report.insertMany([{ createdAt: moment.tz("2017-01-01",locale), 数量:1 },{ createdAt: moment.tz("2017-01-01",locale), 数量:1 },{ createdAt: moment.tz("2017-01-02",locale), 数量:1 },{ createdAt: moment.tz("2017-01-03",locale), 数量:1 },{ createdAt: moment.tz("2017-01-03",locale), 数量:1 },]);日志(插入);const start = moment.tz("2017-01-01", locale)end = moment.tz("2018-01-01", locale)让管道 = [{$匹配":{"createdAt": { "$gte": start.toDate(), "$lt": end.toDate() }}},{$组":{_ID": {"$add": [{$减去":[{$减去":[{ "$subtract": [ "$createdAt", new Date(0) ] },switchOffset(start,end,"$createdAt",false)]},{$mod":[{$减去":[{ "$subtract": [ "$createdAt", new Date(0) ] },switchOffset(start,end,"$createdAt",false)]},间隔]}]},新日期(0)]},金额":{$sum":$amount"}}},{$addFields":{_ID": {"$add": ["$_id", switchOffset(start,end,"$_id",true)]}}},{ "$sort": { "_id": 1 } }];日志(管道);让结果 = 等待 Report.aggregate(pipeline);//记录原始日期对象,将在 JSON 中字符串化为 UTC日志(结果);//我喜欢输出时间戳值,让客户端格式化结果 = 结果.map( d =>Object.assign(d, { _id: d._id.valueOf() }));日志(结果);//或者使用 moment 将语言环境的输出格式化为字符串结果 = 结果.map( d =>Object.assign(d, { _id: moment.tz(d._id, locale).format() } ));日志(结果);}赶上(e){控制台错误(e);} 最后 {猫鼬.disconnect();}})()

I am new to mongodb. Below is my query.

Model.aggregate()
            .match({ 'activationId': activationId, "t": { "$gte": new Date(fromTime), "$lt": new Date(toTime) } })
            .group({ '_id': { 'date': { $dateToString: { format: "%Y-%m-%d %H", date: "$datefield" } } }, uniqueCount: { $addToSet: "$mac" } })
            .project({ "date": 1, "month": 1, "hour": 1, uniqueMacCount: { $size: "$uniqueCount" } })
            .exec()
            .then(function (docs) {
                return docs;
            });

The issue is mongodb stores date in iso timezone. I need this data for displaying area chart.

I want to group by date with local time zone. is there any way to add timeoffset into date when group by?

解决方案

General Problem of Dealing with "local dates"

So there is a short answer to this and a long answer as well. The basic case is that instead of using any of the "date aggregation operators" you instead rather want to and "need to" actually "do the math" on the date objects instead. The primary thing here is to adjust the values by the offset from UTC for the given local timezone and then "round" to the required interval.

The "much longer answer" and also the main problem to consider involves that dates are often subject to "Daylight Savings Time" changes in the offset from UTC at different times of the year. So this means that when converting to "local time" for such aggregation purposes, you really should consider where the boundaries for such changes exist.

There is also another consideration, being that no matter what you do to "aggregate" at a given interval, the output values "should" at least initially come out as UTC. This is good practice since display to "locale" really is a "client function", and as later described, the client interfaces will commonly have a way of displaying in the present locale which will be based on the premise that it was in fact fed data as UTC.

Determining Locale Offset and Daylight Savings

This is generally the main problem that needs to be solved. The general math for "rounding" a date to an interval is the simple part, but there is no real math you can apply to knowing when such boundaries apply, and the rules change in every locale and often every year.

So this is where a "library" comes in, and the best option here in the authors opinion for a JavaScript platform is moment-timezone, which is basically a "superset" of moment.js including all the important "timezeone" features we want to use.

Moment Timezone basically defines such a structure for each locale timezone as:

{
    name    : 'America/Los_Angeles',          // the unique identifier
    abbrs   : ['PDT', 'PST'],                 // the abbreviations
    untils  : [1414918800000, 1425808800000], // the timestamps in milliseconds
    offsets : [420, 480]                      // the offsets in minutes
}

Where of course the objects are much larger with respect to the untils and offsets properties actually recorded. But that is the data you need to access in order to see if there is actually a change in the offset for a zone given daylight savings changes.

This block of the later code listing is what we basically use to determine given a start and end value for a range, which daylight savings boundaries are crossed, if any:

  const zone = moment.tz.zone(locale);
  if ( zone.hasOwnProperty('untils') ) {
    let between = zone.untils.filter( u =>
      u >= start.valueOf() && u < end.valueOf()
    );
    if ( between.length > 0 )
      branches = between
        .map( d => moment.tz(d, locale) )
        .reduce((acc,curr,i,arr) =>
          acc.concat(
            ( i === 0 )
              ? [{ start, end: curr }] : [{ start: acc[i-1].end, end: curr }],
            ( i === arr.length-1 ) ? [{ start: curr, end }] : []
          )
        ,[]);
  }

Looking at the whole of 2017 for the Australia/Sydney locale the output of this would be:

[
  {
    "start": "2016-12-31T13:00:00.000Z",    // Interval is +11 hours here
    "end": "2017-04-01T16:00:00.000Z"
  },
  {
    "start": "2017-04-01T16:00:00.000Z",    // Changes to +10 hours here
    "end": "2017-09-30T16:00:00.000Z"
  },
  {
    "start": "2017-09-30T16:00:00.000Z",    // Changes back to +11 hours here
    "end": "2017-12-31T13:00:00.000Z"
  }
]

Which basically reveals that between the first sequence of dates the offset would be +11 hours then changes to +10 hours between the dates in the second sequence and then switches back to +11 hours for the interval covering to the end of the year and the specified range.

This logic then needs to be translated into a structure that will be understood by MongoDB as part of an aggregation pipeline.

Applying the Math

The mathematical principle here for aggregating to any "rounded date interval" essentially relies on using the milliseconds value of the represented date which is "rounded" down to the nearest number representing the "interval" required.

You essentially do this by finding the "modulo" or "remainder" of the current value applied to the required interval. Then you "subtract" that remainder from the current value which returns a value at the nearest interval.

For example, given the current date:

  var d = new Date("2017-07-14T01:28:34.931Z"); // toValue() is 1499995714931 millis
  // 1000 millseconds * 60 seconds * 60 minutes = 1 hour or 3600000 millis
  var v = d.valueOf() - ( d.valueOf() % ( 1000 * 60 * 60 ) );
  // v equals 1499994000000 millis or as a date
  new Date(1499994000000);
  ISODate("2017-07-14T01:00:00Z") 
  // which removed the 28 minutes and change to nearest 1 hour interval

This is the general math we also need to apply in the aggregation pipeline using the $subtract and $mod operations, which are the aggregation expressions used for the same math operations shown above.

The general structure of the aggregation pipeline is then:

    let pipeline = [
      { "$match": {
        "createdAt": { "$gte": start.toDate(), "$lt": end.toDate() }
      }},
      { "$group": {
        "_id": {
          "$add": [
            { "$subtract": [
              { "$subtract": [
                { "$subtract": [ "$createdAt", new Date(0) ] },
                switchOffset(start,end,"$createdAt",false)
              ]},
              { "$mod": [
                { "$subtract": [
                  { "$subtract": [ "$createdAt", new Date(0) ] },
                  switchOffset(start,end,"$createdAt",false)
                ]},
                interval
              ]}
            ]},
            new Date(0)
          ]
        },
        "amount": { "$sum": "$amount" }
      }},
      { "$addFields": {
        "_id": {
          "$add": [
            "$_id", switchOffset(start,end,"$_id",true)
          ]
        }
      }},
      { "$sort": { "_id": 1 } }
    ];

The main parts here you need to understand is the conversion from a Date object as stored in MongoDB to Numeric representing the internal timestamp value. We need the "numeric" form, and to do this is a trick of math where we subtract one BSON Date from another which yields the numeric difference between them. This is exactly what this statement does:

{ "$subtract": [ "$createdAt", new Date(0) ] }

Now we have a numeric value to deal with, we can apply the modulo and subtract that from the numeric representation of the date in order to "round" it. So the "straight" representation of this is like:

{ "$subtract": [
  { "$subtract": [ "$createdAt", new Date(0) ] },
  { "$mod": [
    { "$subtract": [ "$createdAt", new Date(0) ] },
    ( 1000 * 60 * 60 * 24 ) // 24 hours
  ]}
]}

Which mirrors the same JavaScript math approach as shown earlier but applied to the actual document values in the aggregation pipeline. You will also note the other "trick" there where we apply an $add operation with another representation of a BSON date as of epoch ( or 0 milliseconds ) where the "addition" of a BSON Date to a "numeric" value, returns a "BSON Date" representing the milliseconds it was given as input.

Of course the other consideration in the listed code it the actual "offset" from UTC which is adjusting the numeric values in order to ensure the "rounding" takes place for the present timezone. This is implemented in a function based on the earlier description of finding where the different offsets occur, and returns a format as usable in an aggregation pipeline expression by comparing the input dates and returning the correct offset.

With the full expansion of all the details, including the generation of handling those different "Daylight Savings" time offsets would then be like:

[
  {
    "$match": {
      "createdAt": {
        "$gte": "2016-12-31T13:00:00.000Z",
        "$lt": "2017-12-31T13:00:00.000Z"
      }
    }
  },
  {
    "$group": {
      "_id": {
        "$add": [
          {
            "$subtract": [
              {
                "$subtract": [
                  {
                    "$subtract": [
                      "$createdAt",
                      "1970-01-01T00:00:00.000Z"
                    ]
                  },
                  {
                    "$switch": {
                      "branches": [
                        {
                          "case": {
                            "$and": [
                              {
                                "$gte": [
                                  "$createdAt",
                                  "2016-12-31T13:00:00.000Z"
                                ]
                              },
                              {
                                "$lt": [
                                  "$createdAt",
                                  "2017-04-01T16:00:00.000Z"
                                ]
                              }
                            ]
                          },
                          "then": -39600000
                        },
                        {
                          "case": {
                            "$and": [
                              {
                                "$gte": [
                                  "$createdAt",
                                  "2017-04-01T16:00:00.000Z"
                                ]
                              },
                              {
                                "$lt": [
                                  "$createdAt",
                                  "2017-09-30T16:00:00.000Z"
                                ]
                              }
                            ]
                          },
                          "then": -36000000
                        },
                        {
                          "case": {
                            "$and": [
                              {
                                "$gte": [
                                  "$createdAt",
                                  "2017-09-30T16:00:00.000Z"
                                ]
                              },
                              {
                                "$lt": [
                                  "$createdAt",
                                  "2017-12-31T13:00:00.000Z"
                                ]
                              }
                            ]
                          },
                          "then": -39600000
                        }
                      ]
                    }
                  }
                ]
              },
              {
                "$mod": [
                  {
                    "$subtract": [
                      {
                        "$subtract": [
                          "$createdAt",
                          "1970-01-01T00:00:00.000Z"
                        ]
                      },
                      {
                        "$switch": {
                          "branches": [
                            {
                              "case": {
                                "$and": [
                                  {
                                    "$gte": [
                                      "$createdAt",
                                      "2016-12-31T13:00:00.000Z"
                                    ]
                                  },
                                  {
                                    "$lt": [
                                      "$createdAt",
                                      "2017-04-01T16:00:00.000Z"
                                    ]
                                  }
                                ]
                              },
                              "then": -39600000
                            },
                            {
                              "case": {
                                "$and": [
                                  {
                                    "$gte": [
                                      "$createdAt",
                                      "2017-04-01T16:00:00.000Z"
                                    ]
                                  },
                                  {
                                    "$lt": [
                                      "$createdAt",
                                      "2017-09-30T16:00:00.000Z"
                                    ]
                                  }
                                ]
                              },
                              "then": -36000000
                            },
                            {
                              "case": {
                                "$and": [
                                  {
                                    "$gte": [
                                      "$createdAt",
                                      "2017-09-30T16:00:00.000Z"
                                    ]
                                  },
                                  {
                                    "$lt": [
                                      "$createdAt",
                                      "2017-12-31T13:00:00.000Z"
                                    ]
                                  }
                                ]
                              },
                              "then": -39600000
                            }
                          ]
                        }
                      }
                    ]
                  },
                  86400000
                ]
              }
            ]
          },
          "1970-01-01T00:00:00.000Z"
        ]
      },
      "amount": {
        "$sum": "$amount"
      }
    }
  },
  {
    "$addFields": {
      "_id": {
        "$add": [
          "$_id",
          {
            "$switch": {
              "branches": [
                {
                  "case": {
                    "$and": [
                      {
                        "$gte": [
                          "$_id",
                          "2017-01-01T00:00:00.000Z"
                        ]
                      },
                      {
                        "$lt": [
                          "$_id",
                          "2017-04-02T03:00:00.000Z"
                        ]
                      }
                    ]
                  },
                  "then": -39600000
                },
                {
                  "case": {
                    "$and": [
                      {
                        "$gte": [
                          "$_id",
                          "2017-04-02T02:00:00.000Z"
                        ]
                      },
                      {
                        "$lt": [
                          "$_id",
                          "2017-10-01T02:00:00.000Z"
                        ]
                      }
                    ]
                  },
                  "then": -36000000
                },
                {
                  "case": {
                    "$and": [
                      {
                        "$gte": [
                          "$_id",
                          "2017-10-01T03:00:00.000Z"
                        ]
                      },
                      {
                        "$lt": [
                          "$_id",
                          "2018-01-01T00:00:00.000Z"
                        ]
                      }
                    ]
                  },
                  "then": -39600000
                }
              ]
            }
          }
        ]
      }
    }
  },
  {
    "$sort": {
      "_id": 1
    }
  }
]

That expansion is using the $switch statement in order to apply the date ranges as conditions to when to return the given offset values. This is the most convenient form since the "branches" argument does correspond directly to an "array", which is the most convenient output of the "ranges" determined by examination of the untils representing the offset "cut-points" for the given timezone on the supplied date range of the query.

It is possible to apply the same logic in earlier versions of MongoDB using a "nested" implementation of $cond instead, but it is a little messier to implement, so we are just using the most convenient method in implementation here.

Once all of those conditions are applied, the dates "aggregated" are actually those representing the "local" time as defined by the supplied locale. This actually brings us to what the final aggregation stage is, and the reason why it is there as well as the later handling as demonstrated in the listing.

End Results

I did mention earlier that the general recommendation is that the "output" should still return the date values in UTC format of at least some description, and therefore that is exactly what the pipeline here is doing by first converting "from" UTC to local by applying the offset when "rounding", but then the final numbers "after the grouping" are re-adjusted back by the same offset that applies to the "rounded" date values.

The listing here gives "three" different output possibilities here as:

// ISO Format string from JSON stringify default
[
  {
    "_id": "2016-12-31T13:00:00.000Z",
    "amount": 2
  },
  {
    "_id": "2017-01-01T13:00:00.000Z",
    "amount": 1
  },
  {
    "_id": "2017-01-02T13:00:00.000Z",
    "amount": 2
  }
]
// Timestamp value - milliseconds from epoch UTC - least space!
[
  {
    "_id": 1483189200000,
    "amount": 2
  },
  {
    "_id": 1483275600000,
    "amount": 1
  },
  {
    "_id": 1483362000000,
    "amount": 2
  }
]

// Force locale format to string via moment .format()
[
  {
    "_id": "2017-01-01T00:00:00+11:00",
    "amount": 2
  },
  {
    "_id": "2017-01-02T00:00:00+11:00",
    "amount": 1
  },
  {
    "_id": "2017-01-03T00:00:00+11:00",
    "amount": 2
  }
]

The one thing of note here is that for a "client" such as Angular, every single one of those formats would be accepted by it's own DatePipe which can actually do the "locale format" for you. But it depends on where the data is supplied to. "Good" libraries will be aware of using a UTC date in the present locale. Where that is not the case, then you might need to "stringify" yourself.

But it is a simple thing, and you get the most support for this by using a library which essentially bases it's manipulation of output from a "given UTC value".

The main thing here is to "understand what you are doing" when you ask such a thing as aggregating to a local time zone. Such a process should consider:

  1. The data can be and often is viewed from the perspective of people within different timezones.

  2. The data is generally provided by people in different timezones. Combined with point 1, this is why we store in UTC.

  3. Timezones are often subject to a changing "offset" from "Daylight Savings Time" in many of the world timezones, and you should account for that when analyzing and processing the data.

  4. Regardless of aggregation intervals, output "should" in fact remain in UTC, albeit adjusted to aggregate on interval according to the locale provided. This leaves presentation to be delegated to a "client" function, just as it should.

As long as you keep those things in mind and apply just like the listing here demonstrates, then you are doing all the right things for dealing with aggregation of dates and even general storage with respect to a given locale.

So you "should" be doing this, and what you "should not" be doing is giving up and simply storing the "locale date" as a string. As described, that would be a very incorrect approach and causes nothing but further problems for your application.

NOTE: The one topic I do not touch on here at all is aggregating to a "month" ( or indeed "year" ) interval. "Months" are the mathematical anomaly in the whole process since the number of days always varies and thus requires a whole other set of logic in order to apply. Describing that alone is at least as long as this post, and therefore would be another subject. For general minutes, hours, and days which is the common case, the math here is "good enough" for those cases.


Full Listing

This serves as a "demonstration" to tinker with. It employs the required function to extract the offset dates and values to be included and runs an aggregation pipeline over the supplied data.

You can change anything in here, but will probably start with the locale and interval parameters, and then maybe add different data and different start and end dates for the query. But the rest of the code need not be changed to simply make changes to any of those values, and can therefore demonstrate using different intervals ( such as 1 hour as asked in the question ) and different locales.

For instance, once supplying valid data which would actually require aggregation at a "1 hour interval" then the line in the listing would be changed as:

const interval = moment.duration(1,'hour').asMilliseconds();

In order to define a milliseconds value for the aggregation interval as required by the aggregation operations being performed on the dates.


const moment = require('moment-timezone'),
      mongoose = require('mongoose'),
      Schema = mongoose.Schema;

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const uri = 'mongodb://localhost/test',
      options = { useMongoClient: true };

const locale = 'Australia/Sydney';
const interval = moment.duration(1,'day').asMilliseconds();

const reportSchema = new Schema({
  createdAt: Date,
  amount: Number
});

const Report = mongoose.model('Report', reportSchema);

function log(data) {
  console.log(JSON.stringify(data,undefined,2))
}

function switchOffset(start,end,field,reverseOffset) {

  let branches = [{ start, end }]

  const zone = moment.tz.zone(locale);
  if ( zone.hasOwnProperty('untils') ) {
    let between = zone.untils.filter( u =>
      u >= start.valueOf() && u < end.valueOf()
    );
    if ( between.length > 0 )
      branches = between
        .map( d => moment.tz(d, locale) )
        .reduce((acc,curr,i,arr) =>
          acc.concat(
            ( i === 0 )
              ? [{ start, end: curr }] : [{ start: acc[i-1].end, end: curr }],
            ( i === arr.length-1 ) ? [{ start: curr, end }] : []
          )
        ,[]);
  }

  log(branches);

  branches = branches.map( d => ({
    case: {
      $and: [
        { $gte: [
          field,
          new Date(
            d.start.valueOf()
            + ((reverseOffset)
              ? moment.duration(d.start.utcOffset(),'minutes').asMilliseconds()
              : 0)
          )
        ]},
        { $lt: [
          field,
          new Date(
            d.end.valueOf()
            + ((reverseOffset)
              ? moment.duration(d.start.utcOffset(),'minutes').asMilliseconds()
              : 0)
          )
        ]}
      ]
    },
    then: -1 * moment.duration(d.start.utcOffset(),'minutes').asMilliseconds()
  }));

  return ({ $switch: { branches } });

}

(async function() {
  try {
    const conn = await mongoose.connect(uri,options);

    // Data cleanup
    await Promise.all(
      Object.keys(conn.models).map( m => conn.models[m].remove({}))
    );

    let inserted = await Report.insertMany([
      { createdAt: moment.tz("2017-01-01",locale), amount: 1 },
      { createdAt: moment.tz("2017-01-01",locale), amount: 1 },
      { createdAt: moment.tz("2017-01-02",locale), amount: 1 },
      { createdAt: moment.tz("2017-01-03",locale), amount: 1 },
      { createdAt: moment.tz("2017-01-03",locale), amount: 1 },
    ]);

    log(inserted);

    const start = moment.tz("2017-01-01", locale)
          end   = moment.tz("2018-01-01", locale)

    let pipeline = [
      { "$match": {
        "createdAt": { "$gte": start.toDate(), "$lt": end.toDate() }
      }},
      { "$group": {
        "_id": {
          "$add": [
            { "$subtract": [
              { "$subtract": [
                { "$subtract": [ "$createdAt", new Date(0) ] },
                switchOffset(start,end,"$createdAt",false)
              ]},
              { "$mod": [
                { "$subtract": [
                  { "$subtract": [ "$createdAt", new Date(0) ] },
                  switchOffset(start,end,"$createdAt",false)
                ]},
                interval
              ]}
            ]},
            new Date(0)
          ]
        },
        "amount": { "$sum": "$amount" }
      }},
      { "$addFields": {
        "_id": {
          "$add": [
            "$_id", switchOffset(start,end,"$_id",true)
          ]
        }
      }},
      { "$sort": { "_id": 1 } }
    ];

    log(pipeline);
    let results = await Report.aggregate(pipeline);

    // log raw Date objects, will stringify as UTC in JSON
    log(results);

    // I like to output timestamp values and let the client format
    results = results.map( d =>
      Object.assign(d, { _id: d._id.valueOf() })
    );
    log(results);

    // Or use moment to format the output for locale as a string
    results = results.map( d =>
      Object.assign(d, { _id: moment.tz(d._id, locale).format() } )
    );
    log(results);

  } catch(e) {
    console.error(e);
  } finally {
    mongoose.disconnect();
  }
})()

这篇关于在 MongoDB 中按日期与本地时区分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆