MongoDB Document Re-shaping

Problem Description

This question, like most of mine, comes out of perusing the questions asked on Stack Overflow, which in turn raised another question for myself. So apart from the learning exercise of working toward a solution to one problem, I find that another question pops up, such as this one.

The original question remains unaccepted by the OP, and it has not been clarified what "they" wanted to achieve. But I did give my interpretation, in both the simple and the long forms of arriving at a solution.

In the end, considering the long form of the solution, the process has left me wondering whether some new feature will be introduced in the next MongoDB release (currently expected to be 2.6), using the additional aggregation operators that have been introduced.

So, here is the situation:

{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Word",
    "duration" : 9540
}
{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Excel",
    "duration" : 4000
}
{
    "tracked_item_type" : "Software",
    "tracked_item_name" : "Notepad",
    "duration" : 4000
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Facebook",
    "duration" : 7920
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Twitter",
    "duration" : 5555
}
{
    "tracked_item_type" : "Site",
    "tracked_item_name" : "Digital Blasphemy",
    "duration" : 8000
}

Desired Result

The top two results for each type, ordered by total duration. Although this is only a small sample, duration is considered to be the $sum of many items.

{ 
    "tracked_item_type": "Site",
    "tracked_item_name": "Digital Blasphemy",
    "duration" : 8000
}
{ 
    "tracked_item_type": "Site",
    "tracked_item_name": "Facebook",
    "duration" : 7920
}
{ 
    "tracked_item_type": "Software",
    "tracked_item_name": "Word",
    "duration" : 9540
}
{ 
    "tracked_item_type": "Software",
    "tracked_item_name": "Notepad",
    "duration" : 4000
}
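To pin down what "top two per type by total duration" means, here is a minimal in-memory sketch in plain JavaScript (it runs under Node, not inside the aggregation framework). It assumes the documents are already loaded into an array shaped like the sample above, and `topNPerType` is a hypothetical helper name. It sums duration per (type, name) pair, sorts by type ascending and total descending, and keeps the first `n` rows of each type.

```javascript
// Hypothetical helper: an in-memory reference for "top n per type by total duration".
function topNPerType(docs, n) {
  // Like the first "$group": sum duration per (type, name) pair.
  const totals = new Map();
  for (const d of docs) {
    const key = JSON.stringify([d.tracked_item_type, d.tracked_item_name]);
    const row = totals.get(key);
    if (row) {
      row.duration += d.duration;
    } else {
      totals.set(key, {
        tracked_item_type: d.tracked_item_type,
        tracked_item_name: d.tracked_item_name,
        duration: d.duration,
      });
    }
  }

  // Like the "$sort": type ascending, then total duration descending.
  // Array.prototype.sort is stable (guaranteed since ES2019), so tied totals keep input order.
  const rows = [...totals.values()].sort((a, b) => {
    if (a.tracked_item_type !== b.tracked_item_type) {
      return a.tracked_item_type < b.tracked_item_type ? -1 : 1;
    }
    return b.duration - a.duration;
  });

  // Keep only the first n rows of each type.
  const taken = new Map();
  return rows.filter((r) => {
    const count = taken.get(r.tracked_item_type) || 0;
    taken.set(r.tracked_item_type, count + 1);
    return count < n;
  });
}

const sample = [
  { tracked_item_type: "Software", tracked_item_name: "Word", duration: 9540 },
  { tracked_item_type: "Software", tracked_item_name: "Excel", duration: 4000 },
  { tracked_item_type: "Software", tracked_item_name: "Notepad", duration: 4000 },
  { tracked_item_type: "Site", tracked_item_name: "Facebook", duration: 7920 },
  { tracked_item_type: "Site", tracked_item_name: "Twitter", duration: 5555 },
  { tracked_item_type: "Site", tracked_item_name: "Digital Blasphemy", duration: 8000 },
];

console.log(topNPerType(sample, 2));
```

On the sample data this yields Digital Blasphemy, Facebook, Word and Excel. Excel and Notepad are tied at 4000, and with a stable sort the earlier of the two wins, whereas the desired result above happens to list Notepad; either is a valid second place unless a tie-break rule is specified.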

The Overall Solution

This is my long-handed way of solving the problem:

db.collection.aggregate([

    // Group on the types and "sum" of duration
    {"$group": {
        "_id": {
            "tracked_item_type": "$tracked_item_type",
            "tracked_item_name": "$tracked_item_name"
         },
        "duration": {"$sum": "$duration"}
    }},

    // Sort by type and duration descending
    {"$sort": { "_id.tracked_item_type": 1, "duration": -1 }},

    /* The fun part */

    // Re-shape results to "sites" and "software" arrays 
    {"$group": { 
        "_id": null,
        "sites": {"$push":
            {"$cond": [
                {"$eq": ["$_id.tracked_item_type", "Site" ]},
                { "_id": "$_id", "duration": "$duration" },
                null
            ]}
        },
        "software": {"$push":
            {"$cond": [
                {"$eq": ["$_id.tracked_item_type", "Software" ]},
                { "_id": "$_id", "duration": "$duration" },
                null
            ]}
        }
    }},


    // Remove the null values for "software"
    {"$unwind": "$software"},
    {"$match": { "software": {"$ne": null} }},
    {"$group": { 
        "_id": "$_id",
        "software": {"$push": "$software"}, 
        "sites": {"$first": "$sites"} 
    }},

    // Remove the null values for "sites"
    {"$unwind": "$sites"},
    {"$match": { "sites": {"$ne": null} }},
    {"$group": { 
        "_id": "$_id",
        "software": {"$first": "$software"},
        "sites": {"$push": "$sites"} 
    }},


    // Project out software and limit to the *top* 2 results
    {"$unwind": "$software"},
    {"$project": { 
        "_id": { "_id": "$software._id", "duration": "$software.duration" },
        "sites": "$sites"
    }},
    {"$limit" : 2},


    // Project sites, grouping multiple software per key, requires a sort
    // then limit the *top* 2 results
    {"$unwind": "$sites"},
    {"$group": {
        "_id": { "_id": "$sites._id", "duration": "$sites.duration" },
        "software": {"$push": "$_id" }
    }},
    {"$sort": { "_id.duration": -1 }},
    {"$limit": 2}

])
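The "$push" plus "$cond" step pushes every grouped row into both arrays, with null standing in wherever the type does not match; that is why the pipeline then needs an $unwind/$match/$group round per array to strip the nulls. Here is a plain-JavaScript illustration of just that reshaping (an in-memory analogue, not how the server evaluates it), assuming an abridged set of rows as emitted by the first $group and $sort; `pushIfType` is a hypothetical name.

```javascript
// Abridged rows as emitted by the first "$group" + "$sort" stages.
const grouped = [
  { _id: { tracked_item_type: "Site", tracked_item_name: "Digital Blasphemy" }, duration: 8000 },
  { _id: { tracked_item_type: "Site", tracked_item_name: "Facebook" }, duration: 7920 },
  { _id: { tracked_item_type: "Software", tracked_item_name: "Word" }, duration: 9540 },
];

// Analogue of {"$push": {"$cond": [<type matches>, <row>, null]}}:
// every row lands in every array, with null as the placeholder for non-matches.
function pushIfType(rows, type) {
  return rows.map((r) =>
    r._id.tracked_item_type === type ? { _id: r._id, duration: r.duration } : null);
}

const reshaped = {
  _id: null,
  sites: pushIfType(grouped, "Site"),
  software: pushIfType(grouped, "Software"),
};

// Analogue of the $unwind + {"$match": {"...": {"$ne": null}}} + $group round
// that rebuilds each array without the null placeholders.
const cleaned = {
  _id: reshaped._id,
  sites: reshaped.sites.filter((x) => x !== null),
  software: reshaped.software.filter((x) => x !== null),
};

console.log(JSON.stringify(cleaned, null, 2));
```

Note that the type names "Site" and "Software" are hard-coded in this shape, so it only works when the set of types is known in advance.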

The "Not quite there yet" Output

This is the point where aggregation falls short of producing the final result, at least to my current understanding.

{
    "result" : [
        {
            "_id" : {
                "_id" : {
                    "tracked_item_type" : "Site",
                    "tracked_item_name" : "Digital Blasphemy"
                },
                "duration" : 8000
            },
            "software" : [
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Word"
                    },
                    "duration" : 9540
                },
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Notepad"
                    },
                    "duration" : 4000
                }
            ]
        },
        {
            "_id" : {
                "_id" : {
                    "tracked_item_type" : "Site",
                    "tracked_item_name" : "Facebook"
                },
                "duration" : 7920
            },
            "software" : [
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Word"
                    },
                    "duration" : 9540
                },
                {
                    "_id" : {
                        "tracked_item_type" : "Software",
                        "tracked_item_name" : "Notepad"
                    },
                    "duration" : 4000
                }
            ]
        }
    ],
    "ok" : 1
}

It all seemed very reasonable (to me, anyway) that the result, while incomplete, could be post-processed in code to massage it into the desired form.
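A minimal sketch of that post-processing in plain JavaScript, assuming `result` holds the "result" array shown above (its shape is copied from that output; `flatten` and `postProcess` are hypothetical names). Each document's outer `_id` carries one of the top two sites, and the same software array is repeated in every document, so it only needs to be read once.

```javascript
// Mirrors the "result" array printed above.
const word = { _id: { tracked_item_type: "Software", tracked_item_name: "Word" }, duration: 9540 };
const notepad = { _id: { tracked_item_type: "Software", tracked_item_name: "Notepad" }, duration: 4000 };
const result = [
  { _id: { _id: { tracked_item_type: "Site", tracked_item_name: "Digital Blasphemy" }, duration: 8000 },
    software: [word, notepad] },
  { _id: { _id: { tracked_item_type: "Site", tracked_item_name: "Facebook" }, duration: 7920 },
    software: [word, notepad] },
];

// Turns one {_id: {type, name}, duration} pair into the flat output shape.
function flatten(entry) {
  return {
    tracked_item_type: entry._id.tracked_item_type,
    tracked_item_name: entry._id.tracked_item_name,
    duration: entry.duration,
  };
}

function postProcess(docs) {
  // One site per document, taken from the outer _id.
  const sites = docs.map((d) => flatten(d._id));
  // The software array is identical in every document, so read it from the first.
  const software = docs.length > 0 ? docs[0].software.map(flatten) : [];
  return sites.concat(software);
}

console.log(postProcess(result));
```

This returns the four documents of the desired result, in the same order.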

But it does seem an exercise, and a point of intrigue, whether the desired result form could be reached with any upcoming aggregation features (or possibly another technique that has eluded me).

So feel free to answer with any suggestions or pointers as to how this could be achieved.

Recommended Answer

Here is an aggregation that finds the top two by duration in each category (it does break "ties" arbitrarily, which seems to be in line with your sample output):

// 1. Sum duration per (type, name) pair
var pregroup = { "$group" : {
        "_id" : {
            "type" : "$tracked_item_type",
            "name" : "$tracked_item_name"
        },
        "duration" : {
            "$sum" : "$duration"
        }
    }
};
// 2. Sort by type, then by total duration descending
var sort = { "$sort" : { "_id.type" : 1, "duration" : -1 } };
// 3. Per type: "num1" is the top entry; "other" and "all" collect every entry
var group1 = { "$group" : {
        "_id" : "$_id.type",
        "num1" : {
            "$first" : {
                "name" : "$_id.name",
                "dur" : "$duration"
            }
        },
        "other" : {
            "$push" : {
                "name" : "$_id.name",
                "dur" : "$duration"
            }
        },
        "all" : {
            "$push" : {
                "name" : "$_id.name",
                "dur" : "$duration"
            }
        }
    }
};
// 4. Unwind "other" and keep only the entries that are not number one
var unwind = { "$unwind" : "$other" };
var project = {
    "$project" : {
        "keep" : {
            "$ne" : [
                "$num1.name",
                "$other.name"
            ]
        },
        "num1" : 1,
        "all" : 1,
        "other" : 1
    }
};
var match = { "$match" : { "keep" : true } };
// 5. Re-sort the remainder; its first entry per type is number two
var sort2 = { "$sort" : { "_id" : 1, "other.dur" : -1 } };
var group2 = { "$group" : {
        "_id" : "$_id",
        "numberOne" : {
            "$first" : "$num1"
        },
        "numberTwo" : {
            "$first" : "$other"
        },
        "all" : {
            "$first" : "$all"
        }
    }
};
// 6. Unwind "all" again and null out everything except the top two
var unwind2 = { "$unwind" : "$all" };
var project2 = { "$project" : {
        "_id" : 0,
        "tracked_item_type" : "$_id",
        "tracked_item_name" : {
            "$cond" : [
                { "$or" : [
                    { "$eq" : [ "$all.name", "$numberOne.name" ] },
                    { "$eq" : [ "$all.name", "$numberTwo.name" ] }
                ] },
                "$all.name",
                null
            ]
        },
        "duration" : {
            "$cond" : [
                { "$or" : [
                    { "$eq" : [ "$all.name", "$numberOne.name" ] },
                    { "$eq" : [ "$all.name", "$numberTwo.name" ] }
                ] },
                "$all.dur",
                null
            ]
        }
    }
};
// 7. Drop the nulled-out entries
var match2 = { "$match" : { "tracked_item_name" : { "$ne" : null } } };

Running this with your sample data:

db.top2.aggregate([pregroup, sort, group1, unwind, project, match, sort2, group2, unwind2, project2, match2]).toArray()
[
    {
        "tracked_item_type" : "Software",
        "tracked_item_name" : "Word",
        "duration" : 9540
    },
    {
        "tracked_item_type" : "Software",
        "tracked_item_name" : "Notepad",
        "duration" : 4000
    },
    {
        "tracked_item_type" : "Site",
        "tracked_item_name" : "Digital Blasphemy",
        "duration" : 8000
    },
    {
        "tracked_item_type" : "Site",
        "tracked_item_name" : "Facebook",
        "duration" : 7920
    }
]

This works with an arbitrary number of domains (distinct tracked_item_type values), and you don't need to know their names in advance. However, generalizing it to the top three, top four, top five, etc. adds four more stages for each additional top "N" value, which is neither practical nor pretty.
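The answer's approach takes the first entry per type, removes it, and takes the first of what remains. Generalizing that to any N means one more such round per extra place, which in the pipeline is where the additional stages come from. Here is a plain-JavaScript sketch of that round structure (in-memory only, assuming rows are already grouped and summed as "pregroup" does; `peelTopN` and the flat `type`/`name`/`dur` field names are hypothetical).

```javascript
// Rows after a "pregroup"-style step: one entry per (type, name) with the summed duration.
const totals = [
  { type: "Software", name: "Word", dur: 9540 },
  { type: "Software", name: "Excel", dur: 4000 },
  { type: "Software", name: "Notepad", dur: 4000 },
  { type: "Site", name: "Facebook", dur: 7920 },
  { type: "Site", name: "Twitter", dur: 5555 },
  { type: "Site", name: "Digital Blasphemy", dur: 8000 },
];

// Each round picks the current maximum per type (like "$sort" + "$first"),
// then removes it (like the "keep" "$project" + "$match") before the next round.
function peelTopN(rows, n) {
  const byType = new Map();
  for (const r of rows) {
    if (!byType.has(r.type)) byType.set(r.type, []);
    byType.get(r.type).push(r);
  }
  const picked = [];
  for (const group of byType.values()) {
    let rest = group;
    for (let round = 0; round < n && rest.length > 0; round++) {
      // Largest duration wins; on a tie the earlier entry is kept.
      const best = rest.reduce((a, b) => (b.dur > a.dur ? b : a));
      picked.push(best);
      rest = rest.filter((r) => r !== best);
    }
  }
  return picked;
}

console.log(peelTopN(totals, 2).map((r) => r.name));
```

With `n = 2` this picks Word, Excel, Digital Blasphemy and Facebook; `n = 3` simply runs a third round per type. Types come out in order of first occurrence in the input rather than sorted.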

Please vote up this jira ticket to get a more native implementation of "top N" functionality in the aggregation framework.
