如何使用聚合计算运行总计? [英] How to calculate the running total using aggregate?

查看:14
本文介绍了如何使用聚合计算运行总计?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在开发一个简单的财务应用程序来跟踪收入和结果.

I'm developing a simple financial app for keeping track of incomes and outcomes.

为了简单起见,让我们假设这些是我的一些文档:

For the sake of simplicity, let's suppose these are some of my documents:

{ description: "test1", amount: 100, dateEntry: ISODate("2015-01-07T23:00:00Z") }
{ description: "test2", amount: 50,  dateEntry: ISODate("2015-01-06T23:00:00Z") }
{ description: "test3", amount: 11,  dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test4", amount: 2,   dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test5", amount: 12,  dateEntry: ISODate("2015-01-09T23:00:00Z") }
{ description: "test6", amount: 4,   dateEntry: ISODate("2015-01-09T23:00:00Z") }

我现在想要的是画一个天平";图表,基于这些数据:

What I would like now is to draw a "balance" chart, based on such data:

{ day: "2015-01-06", amount: 50  }
{ day: "2015-01-07", amount: 150 }
{ day: "2015-01-09", amount: 179 }

换句话说,我需要按天对我的所有交易进行分组,并且对于每一天,我需要总结我之前的所有交易(自世界开始以来).

In other words, I need to group all my transactions by day, and for each day I need to sum all of my previous transactions (since the beginning of the world).

我已经知道如何按天分组:

I already know how to group by day:

$group: {
   _id: { 
      y: {$year:"$dateEntry"}, 
      m: {$month:"$dateEntry"}, 
      d: {$dayOfMonth:"$dateEntry"} 
   }, 
   sum: ???
}

但我不知道如何返回并汇总所有金额.

But I don't know how to go back and sum all the amounts.

想象一下,我需要显示月度余额报告:我应该运行 31 个查询,每天一个查询汇总所有交易的金额,除了接下来几天?我当然可以,但不要认为那是最好的解决方案.

Imagine I need to show a monthly balance report: should I run 31 queries, one for each day summing all transaction's amount except next days? Sure I can, but don't think that's the best solution.

推荐答案

实际上比聚合框架更适合mapReduce,至少在初步解决问题.聚合框架没有先前文档的值的概念,或者文档的先前分组"值,所以这就是它不能这样做的原因.

Actually more suited to mapReduce than the aggregation framework, at least in the initial problem solving. The aggregation framework has no concept of the value of a previous document, or the previous "grouped" value of a document so this is why it cannot do this.

另一方面,mapReduce 有一个全局范围",可以在处理阶段和文档时在它们之间共享.这将为您提供所需的一天结束时当前余额的运行总额".

On the other hand, mapReduce has a "global scope" that can be shared between stages and documents as they are processed. This will get you the "running total" for the current balance at end of day you require.

db.collection.mapReduce(
  function () {
    var date = new Date(this.dateEntry.valueOf() -
      ( this.dateEntry.valueOf() % ( 1000 * 60 * 60 * 24 ) )
    );

    emit( date, this.amount );
  },
  function(key,values) {
      return Array.sum( values );
  },
  { 
      "scope": { "total": 0 },
      "finalize": function(key,value) {
          total += value;
          return total;
      },
      "out": { "inline": 1 }
  }
)      

这将按日期分组求和,然后在最终确定"部分中,它会计算每天的累计总和.

That will sum by date grouping and then in the "finalize" section it makes a cumulative sum from each day.

   "results" : [
            {
                    "_id" : ISODate("2015-01-06T00:00:00Z"),
                    "value" : 50
            },
            {
                    "_id" : ISODate("2015-01-07T00:00:00Z"),
                    "value" : 150
            },
            {
                    "_id" : ISODate("2015-01-09T00:00:00Z"),
                    "value" : 179
            }
    ],

从长远来看,您最好有一个单独的集合,每天都有一个条目,并使用 $inc 在更新中.也做一个 $inc upsert 在每天开始创建一个新文档,结转前一天的余额:

In the longer term you would be best of having a separate collection with an entry for each day an alter the balance using $inc in an update. Just also do an $inc upsert at the beginning of each day to create a new document carrying forward the balance from the previous day:

// increase balance
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": amount } },
    { "upsert": true }
);

// decrease balance
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": -amount } },
    { "upsert": true }
);

// Each day
var lastDay = db.daily.findOne({ "dateEntry": lastDate });
db.daily(
    { "dateEntry": currentDate },
    { "$inc": { "balance": lastDay.balance } },
    { "upsert": true }
);

<小时>

如何不这样做

虽然在最初的写作中确实有更多的运算符被引入到聚合框架中,但这里所要求的内容在聚合语句中仍然不切实际.

同样的基本规则适用于聚合框架不能引用来自先前文档"的值,也不能存储全局变量".黑客"这是通过将所有结果强制转换为数组:

The same basic rule applies that the aggregation framework cannot reference a value from a previous "document", nor can it store a "global variable". "Hacking" this by coercion of all results into an array:

db.collection.aggregate([
  { "$group": {
    "_id": { 
      "y": { "$year": "$dateEntry" }, 
      "m": { "$month": "$dateEntry" }, 
      "d": { "$dayOfMonth": "$dateEntry" } 
    }, 
    "amount": { "$sum": "$amount" }
  }},
  { "$sort": { "_id": 1 } },
  { "$group": {
    "_id": null,
    "docs": { "$push": "$$ROOT" }
  }},
  { "$addFields": {
    "docs": {
      "$map": {
        "input": { "$range": [ 0, { "$size": "$docs" } ] },
        "in": {
          "$mergeObjects": [
            { "$arrayElemAt": [ "$docs", "$$this" ] },
            { "amount": { 
              "$sum": { 
                "$slice": [ "$docs.amount", 0, { "$add": [ "$$this", 1 ] } ]
              }
            }}
          ]
        }
      }
    }
  }},
  { "$unwind": "$docs" },
  { "$replaceRoot": { "newRoot": "$docs" } }
])

考虑到较大的结果集很可能违反 16MB BSON 限制,这既不是高性能解决方案,也不是安全".作为黄金法则",任何建议将所有内容放在单个文档数组中的内容:

That is neither a performant solution or "safe" considering that larger result sets run the very real probability of breaching the 16MB BSON limit. As a "golden rule", anything that proposes to put ALL content within the array of a single document:

{ "$group": {
  "_id": null,
  "docs": { "$push": "$$ROOT" }
}}

那么这是一个基本缺陷,因此不是解决方案.

then that is a basic flaw and therefore not a solution.

处理这个问题的更有说服力的方法通常是对正在运行的结果游标进行后处理:

The far more conclusive ways to handle this typically would be post processing on the running cursor of results:

var globalAmount = 0;

db.collection.aggregate([
  { $group: {
    "_id": { 
      y: { $year:"$dateEntry"}, 
      m: { $month:"$dateEntry"}, 
      d: { $dayOfMonth:"$dateEntry"} 
    }, 
    amount: { "$sum": "$amount" }
  }},
  { "$sort": { "_id": 1 } }
]).map(doc => {
  globalAmount += doc.amount;
  return Object.assign(doc, { amount: globalAmount });
})

所以总的来说,最好:

  • 对总数使用游标迭代和跟踪变量.mapReduce 示例是上述简化过程的人为示例.

  • Use cursor iteration and a tracking variable for totals. The mapReduce sample is a contrived example of the simplified process above.

使用预先汇总的总数.可能与游标迭代一致,具体取决于您的预聚合过程,无论是间隔总计还是结转"运行总计.

Use pre-aggregated totals. Possibly in concert with cursor iteration depending on your pre-aggregation process, whether that is just interval total or a "carried forward" running total.

聚合框架应该真正用于聚合",仅此而已.通过诸如操作数组之类的过程来强制对数据进行强制转换以处理您想要的方式既不明智也不安全,最重要的是,客户端操作代码更清晰、更高效.

The aggregation framework should really be used for "aggregating" and nothing more. Forcing coercions on data via processes like manipulating into an array just to process how you want is neither wise or safe, and most importantly the client manipulation code is far cleaner and more efficient.

让数据库做它们擅长的事情,因为在代码中处理操作"要好得多.

Let databases do the things they are good at, as you "manipulations" are far better handled in code instead.

这篇关于如何使用聚合计算运行总计?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆