mongoDB聚合:基于数组名求和 [英] mongoDB Aggregation: sum based on array names

查看：16 发布时间：2022/1/20 13:51:10 mongodb aggregation-framework

本文介绍了mongoDB聚合:基于数组名求和的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有以下比赛数据:

{
  date: 20140101,
  duration: 23232,
  win:[
  {
    player: "Player1",
    score : 2344324
  },
  {
    player: "Player4",
    score : 23132
  }
],
  loss:[
  {
    player: "Player2",
    score : 324
  },
  {
    player: "Player3",
    score : 232
  }
]
}

现在我想像这样计算所有玩家的输赢:

Now i want to count the wins and losses for all players like this:

result :
[
  {
    player : "Player1",
    wins : 12,
    losses : 2
  },
  {
    player : "Player2",
    wins : 7,
    losses : 8
  }
]

我的问题是输赢信息只存在于数组名中.

My problem is that the win/loss information only exists in the name of the array.

总结

<小时>

这确实假设每个赢"和输"数组中的玩家"一开始都是唯一的.对于这里的模型来说，这似乎是合理的:

Summary

This does take the assumption that the "players" in each "win" and "loss" array are all unique to start with. That seemed reasonable for what appeared to be modeled here:

展开两个数组.这会创建重复项，但稍后将被删除.

Unwind both of the arrays. This creates duplicates but they will be removed later.

投影时有一些使用 $cond 运算符(三元组)以获取一些文字字符串值.最后一个用法很特殊，因为添加了数组.因此，在投影该数组之后将再次展开.更多重复，但这就是重点.一个赢"，一个输"记录.

When projecting there is some usage of the $cond operator (a ternary) in order to get some literal string values. And the last usage is special, because and array is being added. So after projecting that array is going to be unwound again. More duplicates, but that's the point. One "win", one "loss" record for each.

使用 $cond 运算符和$eq 运算符的使用.这次我们将这两个字段合并为一个.因此，使用此方法，当字段的类型"与分数"中的值匹配时，该关键字段"将用于结果"字段值.结果是两个不同的赢"和输"字段现在共享相同的名称，由类型"标识.

More projection with the $cond operator and the use of the $eq operator as well. This time we are merging the two fields into one. So using this, when the "type" of the field matches the value in "score" then that "key field" is used for the "result" field value. Outcome is the two different "win" and "loss" fields now share the same name, identified by "type".

删除每个文档中的重复项.只需按文档 _id 和结果"字段作为键进行分组.现在应该有与原始文档中相同的赢"和输"记录，只是从数组中删除时的形式不同.

Getting rid of the duplicates within each document. Simply grouping by the document _id and the "result" fields as keys. Now there should be the same "win" and "loss" records as there was in the original document, just in a different form as they are removed from the arrays.

最后对所有文档进行分组以获得每个玩家"的总数.$cond 和 $eq 但这次要确定当前文档是赢"还是输".因此，在匹配的地方我们返回 1，在错误的地方我们返回 0.这些值被传递给 $sum 以获得胜利"和失败"的总计数.

Finally group across all documents to get the totals per "player". More usage of $cond and $eq but this time to determine whether the current document is a "win" or a "loss". So where this matches we return 1 and where false we return 0. Those values are passed to $sum in order to get the total counts for "wins" and "losses".

这就解释了如何得到结果.

And that explains how to get to the result.

从文档中了解有关聚合运算符的更多信息.该列表中 $cond 的一些有趣"用法应该可以用 $literal 运算符替换.但这要等到 2.6 及更高版本发布.

Learn more on the aggregation operators from the documentation. Some of the "funny" usages for $cond in that listing should be able to be replaced with a $literal operator. But that will not be available until version 2.6 and upwards is release.

当然有一个新的set operator 在什么是在撰写本文时即将发布的版本，这将有助于简化这一点:

Of course there a new set operators in what is the upcoming release at the time of writing, which will help to simplify this somewhat:

db.collection.aggregate([
    { "$unwind": "$win" },
    { "$project": {
        "win.player": "$win.player",
        "win.type": { "$literal": "win" },
        "loss": 1,
    }},
    { "$group": {
        "_id" : {
            "_id": "$_id",
            "loss": "$loss"
        },
        "win": { "$push": "$win" }
    }},
    { "$unwind": "$_id.loss" },
    { "$project": {
        "loss.player": "$_id.loss.player",
        "loss.type": { "$literal": "loss" },
        "win": 1,
    }},
    { "$group": {
        "_id" : {
            "_id": "$_id._id",
            "win": "$win"
        },
        "loss": { "$push": "$loss" }
    }},
    { "$project": {
        "_id": "$_id._id",
        "results": { "$setUnion": [ "$_id.win", "$loss" ] }
    }},
    { "$unwind": "$results" },
    { "$group": { 
        "_id": "$results.player", 
        "wins": {"$sum": {"$cond": [
            {"$eq":["$results.type","win"]},1,0
        ]}}, 
        "losses": {"$sum":{"$cond": [
            {"$eq":["$results.type","loss"]},1,0
        ]}}
    }}

])

但简化"是有争议的.对我来说，这只是感觉"就像是在闲逛"并做更多的工作.它当然更传统，因为它只依赖于 $setUnion 来合并数组结果.

But "simplified" is debatable. To me, that just "feels" like it's "chugging around" and doing more work. It certainly is more traditional, as it simply relies on $setUnion to merge the array results.

但是通过稍微改变你的模式，工作"就会失效，如下所示:

But that "work" would be nullified by changing your schema a little, as shown here:

{
    "_id" : ObjectId("531ea2b1fcc997d5cc5cbbc9"),
    "win": [
        {
            "player" : "Player2",
            "type" : "win"
        },
        {
            "player" : "Player4",
            "type" : "win"
        }
    ],
    "loss" : [
        {
            "player" : "Player6",
            "type" : "loss"
        },
        {
            "player" : "Player5",
            "type" : "loss"
        },
    ]
}

这消除了像我们一直在做的那样通过添加type"属性来投影数组内容的需要，并减少了查询，以及完成的工作:

And this removes the need to project the array contents by adding the "type" attribute as we have been doing, and reduces the query, and the work done:

db.collection.aggregate([
    { "$project": {
        "results": { "$setUnion": [ "$win", "$loss" ] }
    }},
    { "$unwind": "$results" },
    { "$group": { 
        "_id": "$results.player", 
        "wins": {"$sum": {"$cond": [
            {"$eq":["$results.type","win"]},1,0
        ]}}, 
        "losses": {"$sum":{"$cond": [
            {"$eq":["$results.type","loss"]},1,0
        ]}}
    }}

])

当然只是改变你的架构如下:

And of course just changing your schema as follows:

{
    "_id" : ObjectId("531ea2b1fcc997d5cc5cbbc9"),
    "results" : [
        {
            "player" : "Player6",
            "type" : "loss"
        },
        {
            "player" : "Player5",
            "type" : "loss"
        },
        {
            "player" : "Player2",
            "type" : "win"
        },
        {
            "player" : "Player4",
            "type" : "win"
        }
    ]
}

这让事情变得非常简单.这可以在 2.6 之前的版本中完成.所以你现在就可以做:

That makes things very easy. And this could be done in versions prior to 2.6. So you could do it right now:

db.collection.aggregate([
    { "$unwind": "$results" },
    { "$group": { 
        "_id": "$results.player", 
        "wins": {"$sum": {"$cond": [
            {"$eq":["$results.type","win"]},1,0
        ]}}, 
        "losses": {"$sum":{"$cond": [
            {"$eq":["$results.type","loss"]},1,0
        ]}}
    }}

])

所以对我来说，如果它是我的应用程序，我会想要上面显示的最后一种形式的架构，而不是你所拥有的.在提供的聚合操作中完成的所有工作(最后一条语句除外)旨在采用现有的模式形式并将其操作为 this 形式，因此很容易运行简单的聚合声明如上图.

So for me, if it was my application, I would want the schema in the last form shown above rather than what you have. All of the work done in the supplied aggregation operations (with exception of the last statement) is aimed at taking the existing schema form and manipulating it into this form, so then it is easy to run the simple aggregation statement as shown above.

由于每个玩家都被标记"了赢/输"属性，因此您始终可以随意访问您的赢家/输家".

As each player is "tagged" with the "win/loss" attribute, you can always just discretely access your "winners/loosers" anyhow.

作为最后一件事.您的 date 是一个字符串.我不喜欢那样.

As a final thing. Your date is a string. I don't like that.

这样做可能是有原因的，但我不明白.如果您需要按 day 进行分组，只需使用适当的 BSON 日期即可轻松完成聚合.然后，您还可以轻松地使用其他时间间隔.

There may have been a reason for doing so but I don't see it. If you need to group by day that is easy to do in aggregation just by using a proper BSON date. You will also then be able to easily work with other time intervals.

因此，如果您确定了日期，并将其设为 start_date，并将duration"替换为 end_time，那么您可以保留一些可以获得持续时间"，通过简单的数学运算 + 您可以通过将这些作为日期值来获得许多额外好处.

So if you fixed the date, and made it the start_date, and replaced "duration" with end_time, then you get to keep something that you can get the "duration" from by simple math + You get lots of extra benefits by having these as a date value instead.

所以这可能会给你一些关于你的架构的思考.

So that may give you some food for thought on your schema.

对于那些感兴趣的人，这里有一些我用来生成工作数据集的代码:

For those who are interested, here is some code I used to generate a working set of data:

// Ye-olde array shuffle
function shuffle(array) {
    var m = array.length, t, i;

    while (m) {

        i = Math.floor(Math.random() * m--);

        t = array[m];
        array[m] = array[i];
        array[i] = t;

    }

    return array;
}


for ( var l=0; l<10000; l++ ) {

    var players = ["Player1","Player2","Player3","Player4"];

    var playlist = shuffle(players);
    for ( var x=0; x<playlist.length; x++ ) { 
        var obj = {  
            player: playlist[x], 
            score: Math.floor(Math.random() * (100000 - 50 + 1)) +50
        }; 

        playlist[x] = obj;
    }

    var rec = { 
        duration: Math.floor(Math.random() * (50000 - 15000 +1)) +15000,
        date: new Date(),
         win: playlist.slice(0,2),
        loss: playlist.slice(2) 
    };  

    db.game.insert(rec);
}

这篇关于mongoDB聚合:基于数组名求和的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

mongoDB聚合:基于数组名求和 [英] mongoDB Aggregation: sum based on array names

问题描述

推荐答案

总结

Summary

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

mongoDB聚合:基于数组名求和 [英] mongoDB Aggregation: sum based on array names

问题描述

推荐答案

总结

Summary

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭