MongoDB .NET驱动程序按时间范围分组 [英] MongoDB .NET Driver Group By Time Range

查看:56
本文介绍了MongoDB .NET驱动程序按时间范围分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是MongoDB的菜鸟,想知道如何计算从格林尼治标准时间12 AM到当前UTC时间每15分钟间隔插入集合中的文档总数.

I am a noob in MongoDB and wanted to know how to count total documents inserted into the collection for every 15 minutes interval starting 12 AM UTC until the current UTC time.

下面是一个示例文档

{
    "_id" : ObjectId("5ade8bfc6b941c7726a54f01"),
    "Country" : "US"
    "Timestamp" : ISODate("2018-04-24T01:44:28.040Z"),
}

这是预期的输出:

{
    "Count": 245,
    "ReceiveDateString": "5/2/2018 12:00:00 AM"
},
{
    "Count": 239,
    "ReceiveDateString": "5/2/2018 12:15:00 AM"
},
{
    "Count": 252,
    "ReceiveDateString": "5/2/2018 12:30:00 AM"
},
{
    "Count": 255,
    "ReceiveDateString": "5/2/2018 12:45:00 AM"
},
{
    "Count": 242,
    "ReceiveDateString": "5/2/2018 1:00:00 AM"
}
.
.
.

and so on until current UTC time.

我可以按分钟分组,如下所示:

I can group by minutes as follows:

var filter = Builders<Model>.Filter.Where(r => r.Timestamp > startDate && r.Timestamp < endDate);
var result = Collection.Aggregate()
           .Match(filter)
           .Group(
               r => r.Timestamp.Minute,
               g => new
               {
                   ReceiveDate = g.Select(x => x.Timestamp).First(),
                   Count = g.Count(),
               }
           ).ToEnumerable();

但是,我无法弄清楚如何翻译

However, I am not able to figure out how to translate the solution provided in Group result by 15 minutes time interval in MongoDb to the MongoDB C# driver query.

谢谢.

推荐答案

如果您将确切的东西"视为

If you are looking the the "exact thing" as the referenced post to do with .NET, then it probably isn't actually going to get implemented like that. You can do that, but you're probably not going to go to all the bother and actually go for one of the other alternatives, unless you have a need for "flexible intervals" to the extent that I do..

如果您拥有现代的MongoDB 3.6或更高版本的服务器,则可以使用 $dateFromParts ,以便从从日期中提取的四舍五入"部分中重建日期:

If you have a modern MongoDB 3.6 or greater server available then you can use $dateFromParts in order to reconstruct the date from the "rounded" parts extracted from the date:

DateTime startDate = new DateTime(2018, 5, 1, 0, 0, 0, DateTimeKind.Utc);
DateTime endDate = new DateTime(2018, 6, 1, 0, 0, 0, DateTimeKind.Utc);

var result = Collection.Aggregate()
  .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
  .Group(k =>
    new DateTime(k.Timestamp.Year, k.Timestamp.Month, k.Timestamp.Day,
        k.Timestamp.Hour, k.Timestamp.Minute - (k.Timestamp.Minute % 15), 0),
    g => new { _id = g.Key, count = g.Count() }
  )
  .SortBy(d => d._id)
  .ToList();

已发送到服务器的声明:

Statement sent to server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : { 
      "$dateFromParts" : {
        "year" : { "$year" : "$Timestamp" },
        "month" : { "$month" : "$Timestamp" },
        "day" : { "$dayOfMonth" : "$Timestamp" },
        "hour" : { "$hour" : "$Timestamp" },
        "minute" : { "$subtract" : [
          { "$minute" : "$Timestamp" },
          { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
        ] },
        "second" : 0
      }
    },
    "count" : { "$sum" : 1 }
  } },
  { "$sort": { "_id": 1 } }
]

如果没有该功能,则可以简单地将其保留,并保留反汇编"日期,然后在处理游标时将其重新组装.只是为了模拟一个列表:

If you don't have that feature available, then you can simply leave that off and leave the date "disassembled", but then assemble it again as you process the cursor. Just to simulate with a list:

var result = Collection.Aggregate()
 .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
 .Group(k => new
    {
      year = k.Timestamp.Year,
      month = k.Timestamp.Month,
      day = k.Timestamp.Day,
      hour = k.Timestamp.Hour,
      minute = k.Timestamp.Minute - (k.Timestamp.Minute % 15)
    },
    g => new { _id = g.Key, count = g.Count() }
  )
  .SortBy(d => d._id)
  .ToList();

foreach (var doc in result)
{
  //System.Console.WriteLine(doc.ToBsonDocument());
  System.Console.WriteLine(
    new BsonDocument {
      { "_id", new DateTime(doc._id.year, doc._id.month, doc._id.day,
        doc._id.hour, doc._id.minute, 0) },
      { "count", doc.count }
    }
  );
}

已发送到服务器的声明:

Statement sent to the server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : {
      "year" : { "$year" : "$Timestamp" },
      "month" : { "$month" : "$Timestamp" },
      "day" : { "$dayOfMonth" : "$Timestamp" },
      "hour" : { "$hour" : "$Timestamp" },
      "minute" : { "$subtract" : [
        { "$minute" : "$Timestamp" }, 
        { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
      ] }
    },
    "count" : { "$sum" : 1 }
  } },
  { "$sort" : { "_id" : 1 } }
]

两者在代码方面几乎没有什么区别.只是在一种情况下,使用 $dateFromParts ,而在另一种情况下,我们在迭代每个游标结果时使用代码中的DateTime构造函数进行了完全相同的转换.

There's very little difference between the two in terms of the code. It's just that in one case the "casting back" to DateTime actually happens on the server with the $dateFromParts and in the other we just do that exact same casting using the DateTime constructor in code as you iterate each cursor result.

因此,它们实际上是几乎相同的,唯一的不同是服务器"强制转换返回的日期,每个文档使用的字节数要少得多.实际上,因为这里的所有数字格式(包括BSON Date)都是基于64位整数,所以减少了"5倍".即便如此,所有这些数字实际上仍然比发送回日期的任何字符串"表示形式轻巧".

So they really are almost the same with the only real difference being where the "server" does the casting the date returned uses a lot less bytes per document. In fact "5 times" less since all numeric formats here ( including the BSON Date ) are based on 64 bit integers. Even so, all of those numbers are still actually "lighter" than sending back any "string" representation of a date.

这些是基本形式,当映射到这些不同形式时,它们实际上保持不变:

Those are the basic forms which really stay the same when mapping onto these different forms:

var query = from p in Collection.AsQueryable()
            where p.Timestamp >= startDate && p.Timestamp < endDate
            group p by new DateTime(p.Timestamp.Year, p.Timestamp.Month, p.Timestamp.Day,
              p.Timestamp.Hour, p.Timestamp.Minute - (p.Timestamp.Minute % 15), 0) into g
            orderby g.Key
            select new { _id = g.Key, count = g.Count() };

已发送到服务器的声明:

Statement sent to the server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : {
      "$dateFromParts" : {
        "year" : { "$year" : "$Timestamp" }, 
        "month" : { "$month" : "$Timestamp" },
        "day" : { "$dayOfMonth" : "$Timestamp" }, 
        "hour" : { "$hour" : "$Timestamp" }, 
        "minute" : { "$subtract" : [
          { "$minute" : "$Timestamp" },
          { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] }
        ] },
        "second" : 0
      }
    },
    "__agg0" : { "$sum" : 1 }
  } },
  { "$sort" : { "_id" : 1 } },
  { "$project" : { "_id" : "$_id", "count" : "$__agg0" } }
]

或使用GroupBy()

var query = Collection.AsQueryable()
    .Where(k => k.Timestamp >= startDate && k.Timestamp < endDate)
    .GroupBy(k =>
      new DateTime(k.Timestamp.Year, k.Timestamp.Month, k.Timestamp.Day,
            k.Timestamp.Hour, k.Timestamp.Minute - (k.Timestamp.Minute % 15), 0),
      (k, s) => new { _id = k, count = s.Count() }
    )
    .OrderBy(k => k._id);

已发送到服务器的声明:

Statement sent to the server:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : {
      "$dateFromParts" : {
        "year" : { "$year" : "$Timestamp" },
        "month" : { "$month" : "$Timestamp" },
        "day" : { "$dayOfMonth" : "$Timestamp" },
        "hour" : { "$hour" : "$Timestamp" },
        "minute" : { "$subtract" : [ 
          { "$minute" : "$Timestamp" }, 
          { "$mod" : [ { "$minute" : "$Timestamp" }, 15 ] } 
        ] },
        "second" : 0
      }
    },
    "count" : { "$sum" : 1 }
  } },
  { "$sort" : { "_id" : 1 } }
]

如您所见,它们基本上都是相同的表单

As you can see it's all basically the same form

如果您要复制原始的日期数学" ",那么它目前超出了您可以使用LINQ或Fluent构建器实际执行的范围.获得相同序列的唯一方法是使用BsonDocument构造:

If you are looking to replicate the original "date math" form as posted, then it currently falls beyond the scope of what you can actually do with either LINQ or the Fluent builders. The only way to get that same sequence is with BsonDocument construction:

DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

var group = new BsonDocument { {
  "$group",
  new BsonDocument {
    { "_id",
    new BsonDocument { {
      "$add", new BsonArray
      {
        new BsonDocument { {
            "$subtract",
            new BsonArray {
              new BsonDocument { { "$subtract", new BsonArray { "$Timestamp", epoch } } },
              new BsonDocument { {
                "$mod", new BsonArray
                {
                 new BsonDocument { { "$subtract", new BsonArray { "$Timestamp", epoch } } },
                 1000 * 60 * 15
               }
             } }
           }
         } },
         epoch
       }
     } }
     },
     {
       "count", new BsonDocument("$sum", 1)
     }
   }
} };

var query = sales.Aggregate()
  .Match(k => k.Timestamp >= startDate && k.Timestamp < endDate)
  .AppendStage<BsonDocument>(group)
  .Sort(new BsonDocument("_id", 1))
  .ToList();

请求已发送到服务器:

[
  { "$match" : {
    "Timestamp" : {
      "$gte" : ISODate("2018-05-01T00:00:00Z"),
      "$lt" : ISODate("2018-06-01T00:00:00Z")
    }
  } },
  { "$group" : {
    "_id" : { 
      "$add" : [
        { "$subtract" : [ 
          { "$subtract" : [ "$Timestamp", ISODate("1970-01-01T00:00:00Z") ] },
          { "$mod" : [ 
            { "$subtract" : [ "$Timestamp", ISODate("1970-01-01T00:00:00Z") ] },
            900000
          ] }
        ] },
        ISODate("1970-01-01T00:00:00Z")
      ]
    },
    "count" : { "$sum" : 1 }
  } },
  { "$sort" : { "_id" : 1 } }
]

我们现在无法执行此操作的主要原因是因为当前的语句序列化基本上不同意.NET Framework的说法,即减去两个DateTime值会返回一个TimeSpan,而MongoDB构造会减去两个BSON日期返回自纪元以来的毫秒数",这从本质上讲就是数学的工作原理.

The big reason we cannot do this right now is because the current serialize of the statements basically disagrees of the point that the .NET Framework says that subtracting two DateTime values return a TimeSpan, and the MongoDB construct of subtracting two BSON Dates returns the "milliseconds since epoch", which is essentially how the math works.

lamdba表达式的文字"翻译实质上是:

The "literal" translation of the lamdba expression is essentially:

p =>  epoch.AddMilliseconds(
       (p.Timestamp - epoch).TotalMilliseconds
       - ((p.Timestamp - epoch).TotalMilliseconds % 1000 * 60 * 15))

但是映射仍然需要一些工作,以便识别语句或正式确定为此目的实际上打算使用哪种语句.

But the mapping still needs some work in order to either recognize the statements or formalize on which sort of statements are actually intended for this purpose.

值得注意的是,MongoDB 4.0引入了 $convert 运算符和 $toLong

Notably MongoDB 4.0 introduces the $convert operator and the common aliases of $toLong and $toDate, which can all be used in the pipeline in place of the current handling on "addition" and "subtraction" with BSON Dates. These begin to form a more "formal" specification for such conversions rather than the method as shown which solely relied on that "addition" and "subtraction", which is still valid, but such named operators are much clearer of intent within the code:

{ "$group": {
  "_id": {
    "$toDate": {
      "$subtract": [
        { "$toLong": "$Timestamp" },
        { "$mod": [{ "$toLong": "$Timestamp" }, 1000 * 60 * 15 ] }
      ]
    }
  },
  "count": { "$sum": 1 }
}}

很显然,使用LINQ的形式化"运算符对此类"DateToLong"和"LongToDate"函数使用LINQ进行语句构造,则该语句变得更加清晰,而没有非工作状态"中显示的强制"类型. lambda表达式已完成.

It's pretty plain to see that with "formalized" operators for statement construction with LINQ for such "DateToLong" and "LongToDate" functions, then the statement becomes much cleaner without the types of "coercions" shown in the "non working" lambda expression being done.

这篇关于MongoDB .NET驱动程序按时间范围分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆