如何使用聚合计算运行总数? [英] How to calculate the running total using aggregate?

查看:9
本文介绍了如何使用聚合计算运行总数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在开发一个简单的财务应用程序来跟踪收入和结果.

I'm developing a simple financial app for keeping track of incomes and outcomes.

为了简单起见,假设这些是我的一些文档:

For the sake of simplicity, let's suppose these are some of my documents:

{ description: "test1", amount: 100, dateEntry: ISODate("2015-01-07T23:00:00Z") }
{ description: "test2", amount: 50,  dateEntry: ISODate("2015-01-06T23:00:00Z") }
{ description: "test3", amount: 11,  dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test4", amount: 2,   dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test5", amount: 12,  dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test6", amount: 4,   dateEntry: ISODate("2015-01-09T23:00:00Z") }

我现在想要的是画一个平衡";图表,基于此类数据:

What I would like now is to draw a "balance" chart, based on such data:

{ day: "2015-01-06", amount: 50  }
{ day: "2015-01-07", amount: 150 }
{ day: "2015-01-09", amount: 179 }

换句话说,我需要按天对我的所有交易进行分组,并且每天我需要汇总我之前的所有交易(从世界之初开始).

In other words, I need to group all my transactions by day, and for each day I need to sum all of my previous transactions (since the beginning of the world).

我已经知道如何按天分组了:

I already know how to group by day:

$group: {
   _id: { 
      y: {$year:"$dateEntry"}, 
      m: {$month:"$dateEntry"}, 
      d: {$dayOfMonth:"$dateEntry"} 
   }, 
   sum: ???
}

但我不知道如何返回并汇总所有金额.

But I don't know how to go back and sum all the amounts.

假设我需要显示每月余额报告:我是否应该运行 31 次查询,每天一次,将除后几天之外的所有交易金额相加?当然可以,但不要认为这是最好的解决方案.

Imagine I need to show a monthly balance report: should I run 31 queries, one for each day summing all transaction's amount except next days? Sure I can, but don't think that's the best solution.

推荐答案

其实比聚合框架更适合mapReduce,至少在最初的问题解决.聚合框架没有先前文档的值的概念,或者文档的先前分组"值的概念,所以这就是它不能这样做的原因.

Actually more suited to mapReduce than the aggregation framework, at least in the initial problem solving. The aggregation framework has no concept of the value of a previous document, or the previous "grouped" value of a document so this is why it cannot do this.

另一方面,mapReduce 有一个全局范围",可以在处理阶段和文档时在它们之间共享.这将在您需要的一天结束时为您提供当前余额的运行总计".

On the other hand, mapReduce has a "global scope" that can be shared between stages and documents as they are processed. This will get you the "running total" for the current balance at end of day you require.

db.collection.mapReduce(
  function () {
    var date = new Date(this.dateEntry.valueOf() -
      ( this.dateEntry.valueOf() % ( 1000 * 60 * 60 * 24 ) )
    );

    emit( date, this.amount );
  },
  function(key,values) {
      return Array.sum( values );
  },
  { 
      "scope": { "total": 0 },
      "finalize": function(key,value) {
          total += value;
          return total;
      },
      "out": { "inline": 1 }
  }
)      

这将按日期分组求和,然后在最终确定"部分中计算每天的累积总和.

That will sum by date grouping and then in the "finalize" section it makes a cumulative sum from each day.

   "results" : [
            {
                    "_id" : ISODate("2015-01-06T00:00:00Z"),
                    "value" : 50
            },
            {
                    "_id" : ISODate("2015-01-07T00:00:00Z"),
                    "value" : 150
            },
            {
                    "_id" : ISODate("2015-01-09T00:00:00Z"),
                    "value" : 179
            }
    ],

从长远来看,您最好有一个单独的集合,每天都有一个条目,并使用 $inc 更新.只需执行 $inc upsert 在每天的开始创建一个新的文档结转前一天的余额:

In the longer term you would be best of having a separate collection with an entry for each day an alter the balance using $inc in an update. Just also do an $inc upsert at the beginning of each day to create a new document carrying forward the balance from the previous day:

// increase balance
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": amount } },
    { "upsert": true }
);

// decrease balance
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": -amount } },
    { "upsert": true }
);

// Each day
var lastDay = db.daily.findOne({ "dateEntry": lastDate });
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": lastDay.balance } },
    { "upsert": true }
);

<小时>

如何不这样做

虽然确实由于最初的写作有更多的运算符被引入聚合框架,但这里提出的问题仍然不是实际在聚合语句中做的.

同样的基本规则适用于聚合框架不能从以前的文档"引用值,也不能存储全局变量".黑客" 通过将所有结果强制转换为数组:

The same basic rule applies that the aggregation framework cannot reference a value from a previous "document", nor can it store a "global variable". "Hacking" this by coercion of all results into an array:

db.collection.aggregate([
  { "$group": {
    "_id": { 
      "y": { "$year": "$dateEntry" }, 
      "m": { "$month": "$dateEntry" }, 
      "d": { "$dayOfMonth": "$dateEntry" } 
    }, 
    "amount": { "$sum": "$amount" }
  }},
  { "$sort": { "_id": 1 } },
  { "$group": {
    "_id": null,
    "docs": { "$push": "$$ROOT" }
  }},
  { "$addFields": {
    "docs": {
      "$map": {
        "input": { "$range": [ 0, { "$size": "$docs" } ] },
        "in": {
          "$mergeObjects": [
            { "$arrayElemAt": [ "$docs", "$$this" ] },
            { "amount": { 
              "$sum": { 
                "$slice": [ "$docs.amount", 0, { "$add": [ "$$this", 1 ] } ]
              }
            }}
          ]
        }
      }
    }
  }},
  { "$unwind": "$docs" },
  { "$replaceRoot": { "newRoot": "$docs" } }
])

这既不是一个高性能的解决方案,也不是安全",考虑到更大的结果集运行违反 16MB BSON 限制的非常真实的可能性.作为黄金法则",任何建议将所有内容放在单个文档的数组中的东西:

That is neither a performant solution or "safe" considering that larger result sets run the very real probability of breaching the 16MB BSON limit. As a "golden rule", anything that proposes to put ALL content within the array of a single document:

{ "$group": {
  "_id": null,
  "docs": { "$push": "$$ROOT" }
}}

那么这是一个基本缺陷,因此不是解决方案.

then that is a basic flaw and therefore not a solution.

解决这个问题的更有说服力的方法通常是对结果的运行光标进行后处理:

The far more conclusive ways to handle this typically would be post processing on the running cursor of results:

var globalAmount = 0;

db.collection.aggregate([
  { $group: {
    "_id": { 
      y: { $year:"$dateEntry"}, 
      m: { $month:"$dateEntry"}, 
      d: { $dayOfMonth:"$dateEntry"} 
    }, 
    amount: { "$sum": "$amount" }
  }},
  { "$sort": { "_id": 1 } }
]).map(doc => {
  globalAmount += doc.amount;
  return Object.assign(doc, { amount: globalAmount });
})

所以总的来说,最好:

  • 使用游标迭代和跟踪变量进行总计.mapReduce 示例是上述简化过程的人为示例.

  • Use cursor iteration and a tracking variable for totals. The mapReduce sample is a contrived example of the simplified process above.

使用预先汇总的总数.可能与游标迭代一致,具体取决于您的预聚合过程,无论是间隔总计还是结转"运行总计.

Use pre-aggregated totals. Possibly in concert with cursor iteration depending on your pre-aggregation process, whether that is just interval total or a "carried forward" running total.

聚合框架应该真正用于聚合",仅此而已.通过诸如操作到数组之类的过程来强制对数据进行强制转换只是为了处理您想要的方式既不明智也不安全,最重要的是客户端操作代码更清洁、更高效.

The aggregation framework should really be used for "aggregating" and nothing more. Forcing coercions on data via processes like manipulating into an array just to process how you want is neither wise or safe, and most importantly the client manipulation code is far cleaner and more efficient.

让数据库做它们擅长的事情,因为您的操作"在代码中处理得更好.

Let databases do the things they are good at, as you "manipulations" are far better handled in code instead.

这篇关于如何使用聚合计算运行总数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆