如何使用汇总计算运行总额 [英] How to calculate the running total using aggregate

查看:62
本文介绍了如何使用汇总计算运行总额的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在开发一个简单的财务应用程序,用于跟踪收入和结果.

I'm developing a simple financial app for keeping track of incomes and outcomes.

为简单起见,让我们假设这些是我的一些文档:

For the sake of simplicity, let's suppose these are some of my documents:

{ "_id" : ObjectId("54adc0659413535e02fba115"), "description" : "test1", "amount" : 100, "dateEntry" : ISODate("2015-01-07T23:00:00Z") }
{ "_id" : ObjectId("54adc21a0d150c760270f99c"), "description" : "test2", "amount" : 50, "dateEntry" : ISODate("2015-01-06T23:00:00Z") }
{ "_id" : ObjectId("54b05da766341e4802b785c0"), "description" : "test3", "amount" : 11, "dateEntry" : ISODate("2015-01-09T23:00:00Z") }
{ "_id" : ObjectId("54b05db066341e4802b785c1"), "description" : "test4", "amount" : 2, "dateEntry" : ISODate("2015-01-09T23:00:00Z") }
{ "_id" : ObjectId("54b05dbb66341e4802b785c2"), "description" : "test5", "amount" : 12, "dateEntry" : ISODate("2015-01-09T23:00:00Z") }
{ "_id" : ObjectId("54b05f4ee0933a5c02398d55"), "description" : "test6", "amount" : 4, "dateEntry" : ISODate("2015-01-09T23:00:00Z") }

我现在想基于这样的数据绘制一个余额"图表:

What I would like now is to draw a "balance" chart, based on such data:

[
   { day:'2015-01-06', amount:50}, 
   { day:'2015-01-07', amount:150}, 
   { day:'2015-01-09', amount:179},
...
]

换句话说,我需要按天对所有交易进行分组,并且每天需要总结我以前的所有交易(自世界之初起).

In other words, I need to group all my transactions by day, and for each day I need to sum all of my previous transactions ( since the beginning of the world ).

我已经知道如何按天分组:

I already know how to group by day:

$group: {
   _id: { 
      y: {$year:"$dateEntry"}, 
      m: {$month:"$dateEntry"}, 
      d: {$dayOfMonth:"$dateEntry"} 
   }, 
   sum: ???
}

但是我不知道如何回溯所有金额. 想象一下,我需要显示一个月度余额报告:我是否应该运行31个查询,每天进行一次查询,以汇总除次日以外的所有交易金额?当然可以,但是不要认为那是最好的解决方案.

But I don't know how to go back and sum all the amounts. Imagine I need to show a monthly balance report: should I run 31 queries, one for each day summing all transaction's amount except next days? Sure I can, but don't think that's the best solution.

提前谢谢!

推荐答案

实际上,至少比聚合框架更适合 mapReduce 在最初的问题解决中.聚合框架不具有先前文档的值或文档的先前分组"值的概念,因此这就是为什么它不能做到这一点.

Actually more suited to mapReduce than the aggregation framework, at least in the initial problem solving. The aggregation framework has no concept of the value of a previous document, or the previous "grouped" value of a document so this is why it cannot do this.

另一方面,mapReduce具有一个全局范围",可以在处理阶段和文档之间共享它们.这将为您提供所需日末当前余额的运行总额".

On the other hand, mapReduce has a "global scope" that can be shared between stages and documents as they are processed. This will get you the "running total" for the current balance at end of day you require.

db.collection.mapReduce(
  function () {
    var date = new Date(this.dateEntry.valueOf() -
      ( this.dateEntry.valueOf() % ( 1000 * 60 * 60 * 24 ) )
    );

    emit( date, this.amount );
  },
  function(key,values) {
      return Array.sum( values );
  },
  { 
      "scope": { "total": 0 },
      "finalize": function(key,value) {
          total += value;
          return total;
      },
      "out": { "inline": 1 }
  }
)      

这将按日期分组求和,然后在完成"部分中进行每天的累计.

That will sum by date grouping and then in the "finalize" section it makes a cumulative sum from each day.

   "results" : [
            {
                    "_id" : ISODate("2015-01-06T00:00:00Z"),
                    "value" : 50
            },
            {
                    "_id" : ISODate("2015-01-07T00:00:00Z"),
                    "value" : 150
            },
            {
                    "_id" : ISODate("2015-01-09T00:00:00Z"),
                    "value" : 179
            }
    ],

从长远来看,最好是每天单独收集一个条目,并使用 $inc 在每个开头的 upsert 创建一个新文档以结转前一天的余额:

In the longer term you would be best of having a separate collection with an entry for each day an alter the balance using $inc in an update. Just also do an $inc upsert at the beginning of each day to create a new document carrying forward the balance from the previous day:

// increase balance
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": amount } },
    { "upsert": true }
);

// decrease balance
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": -amount } },
    { "upsert": true }
);

// Each day
var lastDay = db.daily.findOne({ "dateEntry": lastDate });
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": lastDay.balance } },
    { "upsert": true }
);


如何不这样做

尽管的确如此,但是由于最初的写作有更多的运算符引入了聚合框架,所以这里所要求的仍然不是在聚合语句中实际可行的.


How NOT to do this

Whilst it is true that since the original writing there are more operators introduced to the aggregation framework, what is being asked here is still not practical to do in an aggregation statement.

同样的基本规则适用,聚合框架 不能引用先前的文档"中的值,也不能存储全局变量".通过将所有结果强制转换为数组来黑客" :

The same basic rule applies that the aggregation framework cannot reference a value from a previous "document", nor can it store a "global variable". "Hacking" this by coercion of all results into an array:

db.collection.aggregate([
  { "$group": {
    "_id": { 
      "y": { "$year": "$dateEntry" }, 
      "m": { "$month": "$dateEntry" }, 
      "d": { "$dayOfMonth": "$dateEntry" } 
    }, 
    "amount": { "$sum": "$amount" }
  }},
  { "$sort": { "_id": 1 } },
  { "$group": {
    "_id": null,
    "docs": { "$push": "$$ROOT" }
  }},
  { "$addFields": {
    "docs": {
      "$map": {
        "input": { "$range": [ 0, { "$size": "$docs" } ] },
        "in": {
          "$mergeObjects": [
            { "$arrayElemAt": [ "$docs", "$$this" ] },
            { "amount": { 
              "$sum": { 
                "$slice": [ "$docs.amount", 0, { "$add": [ "$$this", 1 ] } ]
              }
            }}
          ]
        }
      }
    }
  }},
  { "$unwind": "$docs" },
  { "$replaceRoot": { "newRoot": "$docs" } }
])

考虑到较大的结果集确实具有突破16MB BSON限制的真实可能性,因此这既不是高效的解决方案,也不是安全" .作为黄金法则" ,任何建议将所有内容放入单个文档的数组中的东西:

That is neither a performant solution or "safe" considering that larger result sets run the very real probability of breaching the 16MB BSON limit. As a "golden rule", anything that proposes to put ALL content within the array of a single document:

{ "$group": {
  "_id": null,
  "docs": { "$push": "$$ROOT" }
}}

那是一个基本缺陷,因此不是解决方案.

then that is a basic flaw and therefore not a solution.

处理此问题的更具说服力的方法通常是在结果的运行游标上进行后期处理:

The far more conclusive ways to handle this typically would be post processing on the running cursor of results:

var globalAmount = 0;

db.collection.aggregate([
  { $group: {
    "_id": { 
      y: { $year:"$dateEntry"}, 
      m: { $month:"$dateEntry"}, 
      d: { $dayOfMonth:"$dateEntry"} 
    }, 
    amount: { "$sum": "$amount" }
  }},
  { "$sort": { "_id": 1 } }
]).map(doc => {
  globalAmount += doc.amount;
  return Object.assign(doc, { amount: globalAmount });
})

所以总的来说,这样做总是更好的:

So in general it's always better to:

  • 使用光标迭代和总计的跟踪变量. mapReduce示例是上述简化过程的人为示例.

  • Use cursor iteration and a tracking variable for totals. The mapReduce sample is a contrived example of the simplified process above.

使用预先汇总的总计.可能与光标迭代有关,这取决于您的预聚合过程,而不论是间隔总计还是结转"运行总计.

Use pre-aggregated totals. Possibly in concert with cursor iteration depending on your pre-aggregation process, whether that is just interval total or a "carried forward" running total.

聚合框架应真正用于聚合",仅此而已.通过诸如处理数组之类的过程来强制数据强制处理既不明智也不安全,而且最重要的是,客户端处理代码更简洁,更有效.

The aggregation framework should really be used for "aggregating" and nothing more. Forcing coercions on data via processes like manipulating into an array just to process how you want is neither wise or safe, and most importantly the client manipulation code is far cleaner and more efficient.

让数据库做他们擅长的事情,因为您的操纵"要好得多地用代码来处理.

Let databases do the things they are good at, as you "manipulations" are far better handled in code instead.

这篇关于如何使用汇总计算运行总额的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆