MongoDB 文档重塑 [英] MongoDB Document Re-shaping

查看:12
本文介绍了MongoDB 文档重塑的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题来自(正如我通常所做的那样)仔细阅读关于 SO 提出的问题,因此,为我自己提出了另一个问题.因此,除了学习练习,我发现另一个问题会出现,比如这个.

This question comes out of (as mine usually do) perusing the questions asked on SO and as such, raising another question for myself. So apart from the learning exercise in working towards a solution for a problem, I find that another question pops up, such as this.

原始问题仍未被 OP 接受,并且确实没有澄清 他们"想要实现.但我确实给出了我的解释,以简单两种形式得出解决方案.

The original question as yet remains unaccepted by the OP, and indeed has not been clarified as to what "they" wanted to achieve. But I did give my interpretation, in both the simple and long forms of arriving at a solution.

最后,这个过程让我想知道,考虑到解决方案的long形式,在下一个(目前预计为 2.6)MongoDB 版本中是否会引入一些新功能,使用已引入的其他聚合运算符.

The process, in the end, has left me wondering that considering the long form of the solution, would there be some new feature to be introduced in the next (Currently expecting 2.6) MongoDB release, using the additional aggregation operators that have been introduced.

所以情况如下:

{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Word",
    "duration" : 9540
}
{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Excel",
    "duration" : 4000
}
{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Notepad",
    "duration" : 4000
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Facebook",
    "duration" : 7920
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Twitter",
    "duration" : 5555
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Digital Blasphemy",
    "duration" : 8000
}

期望的结果

每种类型的前两个结果,按总时长排序.尽管这是一个样本,但持续时间被认为是许多项目的 $sum.

Desired Result

The top two results by each type, ordered by the total duration. Even though this is a small sample, duration is considered to be a $sum of many items.

{ 
    "tracked_item_type": "Site",
    "tracked_item_name": "Digital Blasphemy",
    "duration" : 8000
}
{ 
    "tracked_item_type": "Site",
    "tracked_item_name": "Facebook",
    "duration" : 7920
}
{ 
    "tracked_item_type": "Software",
    "tracked_item_name": "Word",
    "duration" : 9540
}
{ 
    "tracked_item_type": "Software",
    "tracked_item_name": "Notepad",
    "duration" : 4000
}

聚合解决方案

这是我冗长解决问题的方法

db.collection.aggregate([

    // Group on the types and "sum" of duration
    {"$group": {
        "_id": {
            "tracked_item_type": "$tracked_item_type",
            "tracked_item_name": "$tracked_item_name"
         },
        "duration": {"$sum": "$duration"}
    }},

    // Sort by type and duration descending
    {"$sort": { "_id.tracked_item_type": 1, "duration": -1 }},

    /* The fun part */

    // Re-shape results to "sites" and "software" arrays 
    {"$group": { 
        "_id": null,
        "sites": {"$push":
            {"$cond": [
                {"$eq": ["$_id.tracked_item_type", "Site" ]},
                { "_id": "$_id", "duration": "$duration" },
                null
            ]}
        },
        "software": {"$push":
            {"$cond": [
                {"$eq": ["$_id.tracked_item_type", "Software" ]},
                { "_id": "$_id", "duration": "$duration" },
                null
            ]}
        }
    }},


    // Remove the null values for "software"
    {"$unwind": "$software"},
    {"$match": { "software": {"$ne": null} }},
    {"$group": { 
        "_id": "$_id",
        "software": {"$push": "$software"}, 
        "sites": {"$first": "$sites"} 
    }},

    // Remove the null values for "sites"
    {"$unwind": "$sites"},
    {"$match": { "sites": {"$ne": null} }},
    {"$group": { 
        "_id": "$_id",
        "software": {"$first": "$software"},
        "sites": {"$push": "$sites"} 
    }},


    // Project out software and limit to the *top* 2 results
    {"$unwind": "$software"},
    {"$project": { 
        "_id": 0,
        "_id": { "_id": "$software._id", "duration": "$software.duration" },
        "sites": "$sites"
    }},
    {"$limit" : 2},


    // Project sites, grouping multiple software per key, requires a sort
    // then limit the *top* 2 results
    {"$unwind": "$sites"},
    {"$group": {
        "_id": { "_id": "$sites._id", "duration": "$sites.duration" },
        "software": {"$push": "$_id" }
    }},
    {"$sort": { "_id.duration": -1 }},
    {"$limit": 2}

])

还没有"输出

聚合达不到最终结果的点.至少就我目前的理解而言.

The "Not quite there yet" Output

And the point where aggregation falls short of getting to the final result. At least to my current understanding.

{
    "result" : [
        {
            "_id" : {
                "_id" : {
                    "tracked_item_type" : "Site",
                    "tracked_item_name" : "Digital Blasphemy"
                 },
                 "duration" : 8000
           },
            "software" : [
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Word"
                    },
                    "duration" : 9540
                },

                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Notepad"
                    },
                    "duration" : 4000
                }
            ]
        },
        {
            "_id" : {
                "_id" : {
                    "tracked_item_type" : "Site",
                    "tracked_item_name" : "Facebook"
                },
                "duration" : 7920
            },
            "software" : [
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Word"
                    },
                    "duration" : 9540
                },
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Notepad"
                    },
                    "duration" : 4000
                }
            ]
        }
    ],
    "ok" : 1
}

这一切似乎都非常合理(无论如何对我来说)结果,虽然不是完整,但可以在代码中后处理以便按摩 将其转换为所需的形式.

This all seemed very reasonable (to Me anyway) that the result, while not complete could be post-processed in code in order to massage it into the desired form.

但事实上,这似乎是一种练习,也是一个有趣的点,即是否可以通过使用任何即将推出的功能进行聚合(或可能是其他功能)来实现我一直没有掌握的技术)来获得所需的结果形式.

But indeed, it seems an exercise, and a point of intrigue as to whether this could be achieved with the use of any upcoming features for aggregation (or possibly another technique that has eluded me) to get to the desired result form.

因此,请随时就如何实现这一目标提出任何建议/指示.

So feel free to answer with any, suggestions / pointers as to how this could be achieved.

推荐答案

这是一个聚合,它在每个类别中按持续时间查找前两个(它确实任意打破关系",这似乎与您的示例输出一致):

Here is an aggregation that finds the top two by duration in each category (it does break "ties" arbitrarily, which seems to be in line with your sample output):

var pregroup = { "$group" : {
        "_id" : {
            "type" : "$tracked_item_type",
            "name" : "$tracked_item_name"
        },
        "duration" : {
            "$sum" : "$duration"
        }
    }
};
var sort = { "$sort" : { "_id.type" : 1, "duration" : -1 } };
var group1 = { "$group" : {
        "_id" : "$_id.type",
        "num1" : {
            "$first" : {
                "name" : "$_id.name",
                "dur" : "$duration"
            }
        },
        "other" : {
            "$push" : {
                "name" : "$_id.name",
                "dur" : "$duration"
            }
        },
    "all" : {
        "$push" : {
            "name" : "$_id.name",
            "dur" : "$duration"
        }
    }
    }
};
var unwind = { "$unwind" : "$other" };
project = {
    "$project" : {
        "keep" : {
            "$ne" : [
                "$num1.name",
                "$other.name"
            ]
        },
        "num1" : 1,
        "all" : 1,
        "other" : 1
    }
};
var match = { "$match" : { "keep" : true } };
var sort2 = { "$sort" : { "_id" : 1, "other.dur" : -1 } };
var group2 = { "$group" : {
        "_id" : "$_id",
        "numberOne" : {
            "$first" : "$num1"
        },
        "numberTwo" : {
            "$first" : "$other"
        },
    "all" : {
        "$first" : "$all"
    }
    }
};
unwind2 = { "$unwind" : "$all" };
project2 = { "$project" : {
    "_id" : 0,
    "tracked_item_type" : "$_id",
    "tracked_item_name" : {
        "$cond" : [
            {
                "$or" : [
                    {
                        "$eq" : [
                            "$all.name",
                            "$numberOne.name"
                        ]
                    },
                    {
                        "$eq" : [
                            "$all.name",
                            "$numberTwo.name"
                        ]
                    }
                ]
            },
            "$all.name",
            null
        ]
    },
    "duration" : {
        "$cond" : [
            {
                "$or" : [
                    {
                        "$eq" : [
                            "$all.name",
                            "$numberOne.name"
                        ]
                    },
                    {
                        "$eq" : [
                            "$all.name",
                            "$numberTwo.name"
                        ]
                    }
                ]
            },
            "$all.dur",
            null
        ]
    }
}
}
match2 = { "$match" : { "tracked_item_name" : { "$ne" : null } } };

使用您的示例数据运行:

Running this with your sample data:

db.top2.aggregate(pregroup, sort, group1, unwind, project, match, sort2, group2, unwind2, project2, match2).toArray()
[
    {
        "tracked_item_type" : "Software",
        "tracked_item_name" : "Word",
        "duration" : 9540
    },
    {
        "tracked_item_type" : "Software",
        "tracked_item_name" : "Notepad",
        "duration" : 4000
    },
    {
        "tracked_item_type" : "Site",
        "tracked_item_name" : "Digital Blasphemy",
        "duration" : 8000
    },
    {
        "tracked_item_type" : "Site",
        "tracked_item_name" : "Facebook",
        "duration" : 7920
    }
]

这将适用于任意数量的域(不同的跟踪项目类型值),您无需提前知道它们的所有名称.但是,将其概括为前三、前四、前五等,将为每个额外的前N"值增加四个阶段——不是很实用或漂亮.

This will work with arbitrary number of domains (different tracked item type values) and you don't need to know all their names in advance. However, to generalize it to top three, top four, top five, etc. will add four more stages for each additional top "N" value - not very practical or pretty.

投票给这张jira票以获得top N"的更本机实现" 聚合框架中的功能.

Please vote up this jira ticket to get a more native implementation of "top N" functionality in the aggregation framework.

这篇关于MongoDB 文档重塑的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆