mongoDB聚合:基于数组名称的总和 [英] mongoDB Aggregation: sum based on array names

查看:56
本文介绍了mongoDB聚合:基于数组名称的总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下匹配数据:

{
  date: 20140101,
  duration: 23232,
  win:[
  {
    player: "Player1",
    score : 2344324
  },
  {
    player: "Player4",
    score : 23132
  }
],
  loss:[
  {
    player: "Player2",
    score : 324
  },
  {
    player: "Player3",
    score : 232
  }
]
}

现在,我想计算所有像这样的球员的得失:

Now i want to count the wins and losses for all players like this:

result :
[
  {
    player : "Player1",
    wins : 12,
    losses : 2
  },
  {
    player : "Player2",
    wins : 7,
    losses : 8
  }
]

我的问题是赢/输信息仅存在于阵列名称中.

My problem is that the win/loss information only exists in the name of the array.

推荐答案

其中有很多内容,特别是如果您相对不熟悉

There is a lot in this, especially if you are relatively new to using aggregate, but it can be done. I'll explain the stages after the listing:

db.collection.aggregate([

    // 1. Unwind both arrays
    {"$unwind": "$win"},
    {"$unwind": "$loss"},

    // 2. Cast each field with a type and the array on the end
    {"$project":{ 
        "win.player": "$win.player",
        "win.type": {"$cond":[1,"win",0]},
        "loss.player": "$loss.player", 
        "loss.type": {"$cond": [1,"loss",0]}, 
        "score": {"$cond":[1,["win", "loss"],0]} 
    }},

    // Unwind the "score" array
    {"$unwind": "$score"},

    // 3. Reshape to "result" based on the value of "score"
    {"$project": { 
        "result.player": {"$cond": [
            {"$eq": ["$win.type","$score"]},
            "$win.player", 
            "$loss.player"
        ] },
        "result.type": {"$cond": [
            {"$eq":["$win.type", "$score"]},
            "$win.type",
            "$loss.type"
        ]}
    }},

    // 4. Get all unique result within each document 
    {"$group": { "_id": { "_id":"$_id", "result": "$result" } }},

    // 5. Sum wins and losses across documents
    {"$group": { 
        "_id": "$_id.result.player", 
        "wins": {"$sum": {"$cond": [
            {"$eq":["$_id.result.type","win"]},1,0
        ]}}, 
        "losses": {"$sum":{"$cond": [
            {"$eq":["$_id.result.type","loss"]},1,0
        ]}}
    }}
])


摘要


这确实假设每个获胜"和亏损"数组中的玩家"都是唯一的.对于似乎在此处建模的东西,这似乎是合理的:


Summary


This does take the assumption that the "players" in each "win" and "loss" array are all unique to start with. That seemed reasonable for what appeared to be modeled here:

  1. 展开两个数组.这样会创建重复项,但是稍后将删除它们.

  1. Unwind both of the arrays. This creates duplicates but they will be removed later.

投影时会使用 $ cond 运算符(三元)以获取一些文字字符串值.最后一种用法很特殊,因为正在添加和数组.因此,在投影之后,该阵列将再次解开.更多重复,但这就是重点.一个胜利",一个失败"记录.

When projecting there is some usage of the $cond operator (a ternary) in order to get some literal string values. And the last usage is special, because and array is being added. So after projecting that array is going to be unwound again. More duplicates, but that's the point. One "win", one "loss" record for each.

使用 $ cond 运算符和也使用 $ eq 运算符.这次,我们将两个字段合并.因此,使用此方法,当字段的类型"与得分"中的值匹配时,则将关键字段"用作结果"字段值.结果是两个不同的获胜"和损失"字段现在共享相同的名称,由类型"标识.

More projection with the $cond operator and the use of the $eq operator as well. This time we are merging the two fields into one. So using this, when the "type" of the field matches the value in "score" then that "key field" is used for the "result" field value. Outcome is the two different "win" and "loss" fields now share the same name, identified by "type".

摆脱每个文档中的重复项.只需按文档_id和结果"字段作为键进行分组.现在应该有与原始文档相同的获胜"和亏损"记录,只是从数组中删除它们的形式有所不同.

Getting rid of the duplicates within each document. Simply grouping by the document _id and the "result" fields as keys. Now there should be the same "win" and "loss" records as there was in the original document, just in a different form as they are removed from the arrays.

最后将所有文档分组,以获取每个玩家"的总数. $ cond $ sum 以获得胜利"和损失"的总数.

Finally group across all documents to get the totals per "player". More usage of $cond and $eq but this time to determine whether the current document is a "win" or a "loss". So where this matches we return 1 and where false we return 0. Those values are passed to $sum in order to get the total counts for "wins" and "losses".

这说明了如何获得结果.

And that explains how to get to the result.

从文档中进一步了解聚合运算符.该清单中 $ cond 的一些有趣"用法应可以替换为 $ literal 运算符.但这要等到2.6版及更高版本发布.

Learn more on the aggregation operators from the documentation. Some of the "funny" usages for $cond in that listing should be able to be replaced with a $literal operator. But that will not be available until version 2.6 and upwards is release.

当然,在什么是新的集合运算符中即将在撰写本文时发布,这将有助于在某种程度上简化此操作:

Of course there a new set operators in what is the upcoming release at the time of writing, which will help to simplify this somewhat:

db.collection.aggregate([
    { "$unwind": "$win" },
    { "$project": {
        "win.player": "$win.player",
        "win.type": { "$literal": "win" },
        "loss": 1,
    }},
    { "$group": {
        "_id" : {
            "_id": "$_id",
            "loss": "$loss"
        },
        "win": { "$push": "$win" }
    }},
    { "$unwind": "$_id.loss" },
    { "$project": {
        "loss.player": "$_id.loss.player",
        "loss.type": { "$literal": "loss" },
        "win": 1,
    }},
    { "$group": {
        "_id" : {
            "_id": "$_id._id",
            "win": "$win"
        },
        "loss": { "$push": "$loss" }
    }},
    { "$project": {
        "_id": "$_id._id",
        "results": { "$setUnion": [ "$_id.win", "$loss" ] }
    }},
    { "$unwind": "$results" },
    { "$group": { 
        "_id": "$results.player", 
        "wins": {"$sum": {"$cond": [
            {"$eq":["$results.type","win"]},1,0
        ]}}, 
        "losses": {"$sum":{"$cond": [
            {"$eq":["$results.type","loss"]},1,0
        ]}}
    }}

])

但是简化"值得商bat.对我来说,就像摸索"并做更多的工作一样.它当然更传统,因为它仅依赖于 $ setUnion 合并阵列结果.

But "simplified" is debatable. To me, that just "feels" like it's "chugging around" and doing more work. It certainly is more traditional, as it simply relies on $setUnion to merge the array results.

但是通过稍微更改架构即可取消工作",如下所示:

But that "work" would be nullified by changing your schema a little, as shown here:

{
    "_id" : ObjectId("531ea2b1fcc997d5cc5cbbc9"),
    "win": [
        {
            "player" : "Player2",
            "type" : "win"
        },
        {
            "player" : "Player4",
            "type" : "win"
        }
    ],
    "loss" : [
        {
            "player" : "Player6",
            "type" : "loss"
        },
        {
            "player" : "Player5",
            "type" : "loss"
        },
    ]
}

这将消除我们通过添加"type"属性来投影数组内容的需要,并减少了查询和完成的工作:

And this removes the need to project the array contents by adding the "type" attribute as we have been doing, and reduces the query, and the work done:

db.collection.aggregate([
    { "$project": {
        "results": { "$setUnion": [ "$win", "$loss" ] }
    }},
    { "$unwind": "$results" },
    { "$group": { 
        "_id": "$results.player", 
        "wins": {"$sum": {"$cond": [
            {"$eq":["$results.type","win"]},1,0
        ]}}, 
        "losses": {"$sum":{"$cond": [
            {"$eq":["$results.type","loss"]},1,0
        ]}}
    }}

])

当然,只需按以下方式更改架构即可:

And of course just changing your schema as follows:

{
    "_id" : ObjectId("531ea2b1fcc997d5cc5cbbc9"),
    "results" : [
        {
            "player" : "Player6",
            "type" : "loss"
        },
        {
            "player" : "Player5",
            "type" : "loss"
        },
        {
            "player" : "Player2",
            "type" : "win"
        },
        {
            "player" : "Player4",
            "type" : "win"
        }
    ]
}

这使事情非常容易.这可以在2.6之前的版本中完成.因此,您可以立即执行以下操作:

That makes things very easy. And this could be done in versions prior to 2.6. So you could do it right now:

db.collection.aggregate([
    { "$unwind": "$results" },
    { "$group": { 
        "_id": "$results.player", 
        "wins": {"$sum": {"$cond": [
            {"$eq":["$results.type","win"]},1,0
        ]}}, 
        "losses": {"$sum":{"$cond": [
            {"$eq":["$results.type","loss"]},1,0
        ]}}
    }}

])

所以对我来说,如果它是我的应用程序,我希望使用上面显示的最后一种形式的模式,而不是您所拥有的模式.在提供的聚合操作中所做的所有工作(最后一条语句除外)都旨在采用现有的架构形式并将其操纵为 this 形式,因此可以轻松地运行简单的聚合如上所示.

So for me, if it was my application, I would want the schema in the last form shown above rather than what you have. All of the work done in the supplied aggregation operations (with exception of the last statement) is aimed at taking the existing schema form and manipulating it into this form, so then it is easy to run the simple aggregation statement as shown above.

由于每个玩家都用胜利/失败"属性标记",因此您始终可以以任何方式离散地访问胜利者/失败者".

As each player is "tagged" with the "win/loss" attribute, you can always just discretely access your "winners/loosers" anyhow.

最后一件事.您的日期是一个字符串.我不喜欢.

As a final thing. Your date is a string. I don't like that.

可能有这样做的理由,但我没有看到.如果您需要按分组,只需使用适当的BSON日期即可轻松进行汇总.然后,您还可以轻松地按其他时间间隔进行工作.

There may have been a reason for doing so but I don't see it. If you need to group by day that is easy to do in aggregation just by using a proper BSON date. You will also then be able to easily work with other time intervals.

因此,如果您确定了日期,并将其设置为开始日期,并用 end_time 替换了"duration",那么您将保留一些可以得到持续时间",只需将其作为日期值即可.

So if you fixed the date, and made it the start_date, and replaced "duration" with end_time, then you get to keep something that you can get the "duration" from by simple math + You get lots of extra benefits by having these as a date value instead.

这样可以为您的模式提供一些思考的机会.

So that may give you some food for thought on your schema.

对于那些感兴趣的人,下面是一些我用来生成工作数据集的代码:

For those who are interested, here is some code I used to generate a working set of data:

// Ye-olde array shuffle
function shuffle(array) {
    var m = array.length, t, i;

    while (m) {

        i = Math.floor(Math.random() * m--);

        t = array[m];
        array[m] = array[i];
        array[i] = t;

    }

    return array;
}


for ( var l=0; l<10000; l++ ) {

    var players = ["Player1","Player2","Player3","Player4"];

    var playlist = shuffle(players);
    for ( var x=0; x<playlist.length; x++ ) { 
        var obj = {  
            player: playlist[x], 
            score: Math.floor(Math.random() * (100000 - 50 + 1)) +50
        }; 

        playlist[x] = obj;
    }

    var rec = { 
        duration: Math.floor(Math.random() * (50000 - 15000 +1)) +15000,
        date: new Date(),
         win: playlist.slice(0,2),
        loss: playlist.slice(2) 
    };  

    db.game.insert(rec);
}

这篇关于mongoDB聚合:基于数组名称的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆