MongoDB $redact to filter out some elements of an array

Question

I am trying to formulate a query over the sample bios collection http://docs.mongodb.org/manual/reference/bios-example-collection/:

Retrieve all persons and the awards they received before they received the Turing Award.

I came up with this query:

db.bios.aggregate([
    {$match: {"awards.award" : "Turing Award"}},
    {$project: {"award1": "$awards", "award2": "$awards", "first_name": "$name.first", "last_name": "$name.last"}},
    {$unwind: "$award1"},
    {$match: {"award1.award" : "Turing Award"}},
    {$unwind: "$award2"},
    {$redact: {
        $cond: {
           if: { $eq: [ { $gt: [ "$award1.year", "$award2.year"] }, true]},
           then: "$$KEEP",
           else: "$$PRUNE"
           }
        }
    }
])

This is the result:

/* 0 */
{
    "result" : [ 
    {
        "_id" : 1,
        "award1" : {
            "award" : "Turing Award",
            "year" : 1977,
            "by" : "ACM"
        },
        "award2" : {
            "award" : "W.W. McDowell Award",
            "year" : 1967,
            "by" : "IEEE Computer Society"
        },
        "first_name" : "John",
        "last_name" : "Backus"
    }, 
    {
        "_id" : 1,
        "award1" : {
            "award" : "Turing Award",
            "year" : 1977,
            "by" : "ACM"
        },
        "award2" : {
            "award" : "National Medal of Science",
            "year" : 1975,
            "by" : "National Science Foundation"
        },
        "first_name" : "John",
        "last_name" : "Backus"
    }, 
    {
        "_id" : 4,
        "award1" : {
            "award" : "Turing Award",
            "year" : 2001,
            "by" : "ACM"
        },
        "award2" : {
            "award" : "Rosing Prize",
            "year" : 1999,
            "by" : "Norwegian Data Association"
        },
        "first_name" : "Kristen",
        "last_name" : "Nygaard"
    }, 
    {
        "_id" : 5,
        "award1" : {
            "award" : "Turing Award",
            "year" : 2001,
            "by" : "ACM"
        },
        "award2" : {
            "award" : "Rosing Prize",
            "year" : 1999,
            "by" : "Norwegian Data Association"
        },
        "first_name" : "Ole-Johan",
        "last_name" : "Dahl"
    }
],
"ok" : 1
}

What I don't like about this solution is that I unwind $award2. Instead, I would be happy to keep award2 as an array, and only remove those awards that were received after award1. So, for instance, the answer for John Backus should be:

{
    "_id" : 1,
    "first_name" : "John",
    "last_name" : "Backus",
    "award1" : {
        "award" : "Turing Award",
        "year" : 1977,
        "by" : "ACM"
    },
    "award2" : [ 
        {
            "award" : "W.W. McDowell Award",
            "year" : 1967,
            "by" : "IEEE Computer Society"
        }, 
        {
            "award" : "National Medal of Science",
            "year" : 1975,
            "by" : "National Science Foundation"
        }
    ]
}

Is it possible to achieve it with $redact without doing $unwind: "$award2"?

Recommended answer

It might have been a little more helpful if you had included the original state of the document in your question, since that clearly shows "where you are coming from" as well as "where you want to get to", in addition to the desired output you gave.

That's just a tip, but it seems that you are starting with a document like this:

{
    "_id" : 1,
    "name": { 
        "first" : "John",
        "last" : "Backus"
    },
    "awards" : [
        {
            "award" : "W.W. McDowell Award",
            "year" : 1967,
            "by" : "IEEE Computer Society"
        }, 
        {
            "award" : "National Medal of Science",
            "year" : 1975,
            "by" : "National Science Foundation"
        },
        { 
            "award" : "Turing Award",
            "year" : 1977,
            "by" : "ACM"
        },
        {
            "award" : "Some other award",
            "year" : 1979,
            "by" : "Someone Else"
        }
    ]
}

So the real point here is that while you may have reached for $redact (and it is a bit nicer than using $project to compute a logical condition and then $match to filter on that result), it probably isn't the best tool for the comparison you want to do.

Before moving on, I just want to point out the main problem with $redact here. Whatever you do, the logic (without an unwind) would essentially have to compare "directly" on $$DESCEND in order to process the array elements on the value of "year" at whatever level they appear.

That recursion is also going to invalidate the "award1" condition, since that subdocument has the same field name ("year") and gets tested too. Even renaming that field breaks the logic, because at a level where the field is missing, the projected value would not be greater than the tested value.

In a nutshell, $redact is ruled right out since you cannot say "take from here only" with the logic it applies.

The alternative is to use $map together with $setDifference to filter the contents of the arrays, as follows:

db.bios.aggregate([
    { "$match": { "awards.award": "Turing Award" } },
    { "$project": {
        "first_name": "$name.first",
        "last_name": "$name.last",
        "award1": { "$setDifference": [
            { "$map": {
                "input": "$awards",
                "as": "a",
                "in": { "$cond": [
                    { "$eq": [ "$$a.award", "Turing Award" ] },
                    "$$a",
                    false
                ]}
            }},
            [false]
        ]},
        "award2": { "$setDifference": [
            { "$map": {
                "input": "$awards",
                "as": "a",
                "in": { "$cond": [
                    { "$ne": [ "$$a.award", "Turing Award" ] },
                    "$$a",
                    false
                ]}
            }},
            [false]
        ]}
    }},
    { "$unwind": "$award1" },
    { "$project": {
        "first_name": 1,
        "last_name": 1,
        "award1": 1,
        "award2": { "$setDifference": [
            { "$map": {
                "input": "$award2",
                "as": "a",
                "in": { "$cond": [
                     { "$gt": [ "$award1.year", "$$a.year" ] },
                     "$$a",
                     false
                 ]}
            }},
            [false]            
        ]}
    }}
])
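
If the $map/$setDifference idiom is unfamiliar, here is a rough plain-JavaScript model of what it does (it runs in Node.js, not on the server). The helper names mapCond and setDifferenceFalse are hypothetical and exist only for illustration. The model compares elements by their JSON text rather than by BSON value, and it keeps input order, whereas MongoDB documents the output order of $setDifference as unspecified. Because $setDifference is a set operator it also drops duplicate elements, so two identical award entries would collapse into one.

```javascript
// Models { $map: { input, as: "a", in: { $cond: [test, "$$a", false] } } }:
// matching elements are kept, everything else becomes false.
function mapCond(input, test) {
  return input.map((a) => (test(a) ? a : false));
}

// Models { $setDifference: [ mapped, [false] ] }: removes every false and,
// like a set operator, keeps only one copy of duplicate elements.
// Approximation: elements are compared by JSON text, not BSON equality.
function setDifferenceFalse(mapped) {
  const seen = new Set();
  const out = [];
  for (const el of mapped) {
    if (el === false) continue;
    const key = JSON.stringify(el);
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(el);
  }
  return out;
}

// The "awards" array of the sample document assumed above.
const awards = [
  { award: "W.W. McDowell Award", year: 1967, by: "IEEE Computer Society" },
  { award: "National Medal of Science", year: 1975, by: "National Science Foundation" },
  { award: "Turing Award", year: 1977, by: "ACM" },
  { award: "Some other award", year: 1979, by: "Someone Else" },
];

const award1 = setDifferenceFalse(mapCond(awards, (a) => a.award === "Turing Award"));
const award2 = setDifferenceFalse(mapCond(awards, (a) => a.award !== "Turing Award"));
console.log(award1.length, award2.length); // 1 3
```

Note that award1 is still an array with one element at this point, which is why the pipeline above needs the $unwind before its second $project.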

And there really is no "pretty" way around either the use of $unwind in the intermediate stage or the second $project here, since $map (and the $setDifference filter) returns what is "still an array". So the $unwind is necessary to turn the "array" into a single entry (provided your condition matches only one element) that can be used in the comparison.

Trying to "squish" all the logic into a single $project will only result in "arrays of arrays" in the second output, so some "unwinding" would still be required. At least this way, unwinding the (hopefully) single match is not very costly and keeps the output clean.
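
If your server is MongoDB 3.2 or later, the $filter and $arrayElemAt operators introduced in that release can replace both the $map/$setDifference pairs and the intermediate $unwind. The following is only a sketch under that version assumption, written as a plain JavaScript array that you would pass to db.bios.aggregate(turingPipeline) in the shell; the name turingPipeline is hypothetical, and the pipeline has not been run against the bios collection here.

```javascript
// Hypothetical name; assumes MongoDB 3.2+ for $filter and $arrayElemAt.
const turingPipeline = [
  // Only people who have a Turing Award at all.
  { $match: { "awards.award": "Turing Award" } },

  // Pull the Turing Award out as a single subdocument. $arrayElemAt returns
  // the element itself, so no $unwind is needed. "awards" is kept for stage 3.
  { $project: {
      first_name: "$name.first",
      last_name: "$name.last",
      awards: 1,
      award1: {
        $arrayElemAt: [
          { $filter: {
              input: "$awards",
              as: "a",
              cond: { $eq: ["$$a.award", "Turing Award"] },
          } },
          0,
        ],
      },
  } },

  // Keep only awards received strictly before award1. A field computed in
  // one $project cannot be referenced in the same stage, hence a second one.
  { $project: {
      first_name: 1,
      last_name: 1,
      award1: 1,
      award2: {
        $filter: {
          input: "$awards",
          as: "a",
          cond: { $gt: ["$award1.year", "$$a.year"] },
        },
      },
  } },
];
```

Unlike $setDifference, $filter preserves the original array order and any duplicates. One behavioural difference to be aware of: $arrayElemAt with index 0 picks only the first Turing Award entry, whereas the $unwind version emits one document per matching entry.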

But the other thing to really note here is that you are not actually "aggregating" anything at all. This is just document manipulation, so you might well consider doing that manipulation directly in client code, as demonstrated with this shell example:

db.bios.find(
    { "awards.award": "Turing Award" },
    { "name": 1, "awards": 1 }
).forEach(function(doc) {
    doc.first_name = doc.name.first;
    doc.last_name = doc.name.last;
    doc.award1 = doc.awards.filter(function(award) {
        return award.award == "Turing Award"
    })[0];
    doc.award2 = doc.awards.filter(function(award) {
        return doc.award1.year > award.year;
    });
    delete doc.name;
    delete doc.awards;
    printjson(doc);
})
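
The same client-side reshaping can be pulled out into a pure function, which makes it easy to test without a database. This is a sketch assuming an ES2015+ runtime such as Node.js or mongosh (it uses Array.prototype.find and arrow functions, which older mongo shells may not support). The name reshapeBio is hypothetical, and the input is the sample document assumed above.

```javascript
// Hypothetical helper mirroring the body of the shell forEach loop above.
function reshapeBio(doc) {
  // Same as filter(...)[0]: the first Turing Award entry, if any.
  const award1 = doc.awards.find((a) => a.award === "Turing Award");
  if (award1 === undefined) return null; // excluded by the find() query anyway

  return {
    _id: doc._id,
    first_name: doc.name.first,
    last_name: doc.name.last,
    award1,
    // Only the awards received strictly before the Turing Award.
    award2: doc.awards.filter((a) => award1.year > a.year),
  };
}

const backus = {
  _id: 1,
  name: { first: "John", last: "Backus" },
  awards: [
    { award: "W.W. McDowell Award", year: 1967, by: "IEEE Computer Society" },
    { award: "National Medal of Science", year: 1975, by: "National Science Foundation" },
    { award: "Turing Award", year: 1977, by: "ACM" },
    { award: "Some other award", year: 1979, by: "Someone Else" },
  ],
};

console.log(reshapeBio(backus).award2.map((a) => a.award));
// [ 'W.W. McDowell Award', 'National Medal of Science' ]
```

Inside the shell loop you could then call printjson(reshapeBio(doc)) instead of mutating doc in place.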


At any rate, both approaches will output the same:

{
    "_id" : 1,
    "first_name" : "John",
    "last_name" : "Backus",
    "award1" : {
        "award" : "Turing Award",
        "year" : 1977,
        "by" : "ACM"
    },
    "award2" : [
        {
            "award" : "W.W. McDowell Award",
            "year" : 1967,
            "by" : "IEEE Computer Society"
        },
        {
            "award" : "National Medal of Science",
            "year" : 1975,
            "by" : "National Science Foundation"
        }
    ]
}

The only real difference is that with .aggregate() the content of "award2" is already filtered when it is returned from the server. That probably won't differ much from the client-side approach unless the items to be removed make up a reasonably large list per document.

For the record, the only change your existing aggregation pipeline really needs is a $group stage at the end to "re-combine" the array entries into a single document:

db.bios.aggregate([
    { "$match": { "awards.award": "Turing Award" } },
    { "$project": {
        "first_name": "$name.first", 
        "last_name": "$name.last",
        "award1": "$awards",
        "award2": "$awards"
    }},
    { "$unwind": "$award1" },
    { "$match": {"award1.award" : "Turing Award" }},
    { "$unwind": "$award2" },
    { "$redact": {
        "$cond": {
             "if": { "$gt": [ "$award1.year", "$award2.year"] },
             "then": "$$KEEP",
             "else": "$$PRUNE"
        }
    }},
    { "$group": {
        "_id": "$_id",
        "first_name": { "$first": "$first_name" },
        "last_name": { "$first": "$last_name" },
        "award1": { "$first": "$award1" },
        "award2": { "$push": "$award2" }
    }}
])

But then again, that pipeline carries all the "array duplication" and the "cost of unwind" associated with its operations. So either of the first two approaches is what you really want in order to avoid that.
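
To make that cost concrete, here is a rough simulation in plain JavaScript of what the $project/$unwind/$match/$unwind/$redact/$group pipeline does to a single document. It is a sketch, not MongoDB's implementation: unwind and simulateUnwindPipeline are hypothetical helpers, $unwind edge cases (missing or empty arrays) are ignored, and the input is the sample document assumed above.

```javascript
// Hypothetical helper: simplified $unwind, one output document per element.
function unwind(docs, field) {
  return docs.flatMap((d) => d[field].map((el) => ({ ...d, [field]: el })));
}

// Hypothetical helper: walks one document through the pipeline's stages.
function simulateUnwindPipeline(doc) {
  // $project: the awards array is copied into both award1 and award2.
  const projected = [{
    _id: doc._id,
    first_name: doc.name.first,
    last_name: doc.name.last,
    award1: doc.awards,
    award2: doc.awards,
  }];
  const unwound1 = unwind(projected, "award1");                              // $unwind
  const matched = unwound1.filter((d) => d.award1.award === "Turing Award"); // $match
  const unwound2 = unwind(matched, "award2");                                // $unwind
  const kept = unwound2.filter((d) => d.award1.year > d.award2.year);        // $redact KEEP/PRUNE

  // $group: $first for the scalar fields, $push for award2.
  const groups = new Map();
  for (const d of kept) {
    if (!groups.has(d._id)) {
      groups.set(d._id, {
        _id: d._id,
        first_name: d.first_name,
        last_name: d.last_name,
        award1: d.award1,
        award2: [],
      });
    }
    groups.get(d._id).award2.push(d.award2);
  }

  return {
    counts: {
      unwound1: unwound1.length,
      matched: matched.length,
      unwound2: unwound2.length,
      kept: kept.length,
    },
    result: [...groups.values()],
  };
}

const backus = {
  _id: 1,
  name: { first: "John", last: "Backus" },
  awards: [
    { award: "W.W. McDowell Award", year: 1967, by: "IEEE Computer Society" },
    { award: "National Medal of Science", year: 1975, by: "National Science Foundation" },
    { award: "Turing Award", year: 1977, by: "ACM" },
    { award: "Some other award", year: 1979, by: "Someone Else" },
  ],
};

console.log(simulateUnwindPipeline(backus).counts);
// { unwound1: 4, matched: 1, unwound2: 4, kept: 2 }
```

For the four-award sample, the two $unwind stages create 4 + 4 intermediate documents to end up with one. The sketch also exposes a side effect: a person with no award before the Turing Award gets no output document at all, because every unwound copy is pruned before $group, whereas the $setDifference pipeline would return that person with an empty award2 array.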
