在MongoDB中找到两个共享键值的文档 [英] Finding two documents in MongoDB that share a key value

查看:46
本文介绍了在MongoDB中找到两个共享键值的文档的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在MongoDB中收集了大量文档,这些文档中的每一个都有一个名为名称"的键,另一个为类型"的键.我想找到两个同名不同类型的文档,这是简单的MongoDB副本

I have a large collection of documents in MongoDB, each one of those documents has a key called "name", and another key called "type". I would like to find two documents with the same name and different types, a simple MongoDB counterpart of

SELECT ...
FROM table AS t1, table AS t2
WHERE t1.name = t2.name AND t1.type <> t2.type

我可以想象一个人可以使用聚合来做到这一点:但是,集合非常大,处理将花费时间,而我正在寻找一对这样的文档.

I can imagine that one can do this using aggregation: however, the collection is very large, processing it will take time and I'm looking just for one pair of such documents.

推荐答案

虽然我支持一些评论,但我认为您的问题表达方式实际上与您所遇到的特定问题无关,但我还是会采取某种方式在MongoDB类型的解决方案中解释惯用的SQL方式.我认为您的实际解决方案会有所不同,但是您没有向我们提出这个问题,而只是向SQL提出.

While I stand by by comments that I don't think the way you are phrasing your question is actually related to a specific problem you have, I will go someway to explain the idiomatic SQL way in a MongoDB type of solution. I stand on that your actual solution would be different but you haven't presented us with that problem, but only SQL.

因此,请考虑以下文档作为样本集,为清楚起见,删除了此清单中的_id字段:

So consider the following documents as a sample set, removing _id fields in this listing for clarity:

{ "name" : "a", "type" : "b" }
{ "name" : "a", "type" : "c" }
{ "name" : "b", "type" : "c" }
{ "name" : "b", "type" : "a" }
{ "name" : "a", "type" : "b" }
{ "name" : "b", "type" : "c" }
{ "name" : "f", "type" : "e" }
{ "name" : "z", "type" : "z" }
{ "name" : "z", "type" : "z" }

如果对相同的数据运行SQL,则会得到以下结果:

If we ran the SQL presented over the same data we would get this result:

a|b
a|c
a|c
b|c
b|a
b|a
a|b
b|c

我们可以看到2个文档不匹配,然后计算出SQL操作的逻辑.因此,另一种说法是:给定名称为"name"的文档的文档确实在键"type"中具有超过一个可能的值.

We can see that 2 documents do not match, and then work out the logic of the SQL operation. So the other way of saying it is "Which documents given a key of "name" do have more than one possible value in the key "type".

鉴于此,采用mongo方法,我们可以查询不符合条件的商品.这样有效地反转结果:

Given that, taking a mongo approach, we can query for the items that do not match the given condition. So effectively the reverse of the result:

db.sample.aggregate([

    // Store unique documents grouped by the "name"
    {$group: { 
        _id: "$name",
        comp: {
            $addToSet: { 
                name:"$name",
                type: "$type" 
            }
        } 
    }},

    // Unwind the "set" results
    {$unwind: "$comp"},

    // Push the results back to get the unique count
    // *note* you could not have done this with alongside $addtoSet
    {$group: {
        _id: "$_id",
        comp: {
            $push: { 
                name: "$comp.name",
                type: "$comp.type" 
            }
        },
        count: {$sum: 1} 
    }},

    // Match only what was counted once
    {$match: {count: 1}},

    // Unwind the array
    {$unwind: "$comp"},

    // Clean up to "name" and "type" only
    {$project: { _id: 0, name: "$comp.name", type: "$comp.type"}}

])

此操作将产生结果:

{ "name" : "f", "type" : "e" }
{ "name" : "z", "type" : "z" }

现在,为了获得与SQL查询相同的结果,我们将获取这些结果并将其引导到另一个查询中:

Now in order to get the same result as the SQL query we would take those results and channel them into another query:

db.sample.find({$nor: [{ name: "f", type: "e"},{ name: "z", type: "z"}] })

哪个是最终的匹配结果:

Which arrives as the final matching result:

{ "name" : "a", "type" : "b" }
{ "name" : "a", "type" : "c" }
{ "name" : "b", "type" : "c" }
{ "name" : "b", "type" : "a" }
{ "name" : "a", "type" : "b" }
{ "name" : "b", "type" : "c" }

所以这将起作用,但是可能导致不切实际的一件事是比较的文档数量非常大,我们达到了将这些结果压缩为数组的工作极限.

So this will work, however the one thing that may make this impractical is where the number of documents being compared is very large, we hit a working limit on compacting those results down to an array.

在最终查找操作中使用负数也会使它有点受苦,这会强制扫描集合.但是,公平地说,使用相同的前提的SQL查询也是如此.

It also suffers a bit from the use of a negative in the final find operation which would force a scan of the collection. But in all fairness the same could be said of the SQL query that uses the same negative premise.

当然,我没有提到的是,如果结果集出现了相反的变化,并且您匹配了更多结果,则从汇总中排除了项,那么只需反转逻辑即可得出您想要的键.只需如下更改$ match:

Of course what I did not mention is that if the result set goes the other way around and you are matching more results in the excluded items from the aggregate, then just reverse the logic to get the keys that you want. Simply change $match as follows:

{$match: {$gt: 1}}

这将是结果,也许不是实际的文档,但这是结果.因此,您不需要其他查询即可匹配否定情况.

And that will be the result, maybe not the actual documents but it is a result. So you don't need another query to match the negative cases.

最终,这是我的错,因为我太关注惯用语翻译了,所以我没有阅读您问题的最后一行,在做什么中说您正在寻找一个文档.

And, ultimately this was my fault because I was so focused on the idiomatic translation that I did not read the last line in your question, where to do say that you were looking for one document.

当然,当前,如果该结果的大小大于16MB,则您将陷入困境.至少直到 2.6 版本,聚合操作的结果为

Of course, currently if that result size is larger than 16MB then you are stuck. At least until the 2.6 release, where the results of aggregation operations are a cursor, so you can iterate that like a .find().

2.6 中也引入了

Also introduced in 2.6 is the $size operator which is used to find the size of an array in the document. So this would help to remove the second $unwind and $group that are used in order to get the length of the set. This alters the query to a faster form:

db.sample.aggregate([
    {$group: { 
        _id: "$name",
        comp: {
            $addToSet: { 
                name:"$name",
                type: "$type"
            }
        } 
    }},
    {$project: { 
        comp: 1,
        count: {$size: "$comp"} 
    }},
    {$match: {count: {$gt: 1}}},
    {$unwind: "$comp"},
    {$project: { _id: 0, name: "$comp.name", type: "$comp.type"}}
])

如果您仅为个人使用或开发/测试而这样做,那么MongoDB 2.6.0-rc0当前可用.

And MongoDB 2.6.0-rc0 is currently available if you are doing this just for personal use, or development/testing.

故事的道德.是的,您可以做到这一点,但是您真的想要还是需要这样做呢?然后可能不会,如果您对特定业务案例提出了不同的问题,则可能会得到不同的答案.但是话又说回来,这可能完全符合您的需求.

Moral of the story. Yes you can do it, But do you really want or need to do it that way? Then probably not, and if you asked a different question about the specific business case, you may get a different answer. But then again this may be exactly right for what you want.

值得一提的是,当您查看SQL的结果时,如果您未对这些值使用DISTINCT,则会由于其他可用的类型选项而错误地重复几个项目.或实质上是另一个分组.但这就是此过程使用MongoDB产生的结果.

Worthwhile to mention that when you look at the results from the SQL, it will erroneously duplicate several items due to the other available type options if you didn't use a DISTINCT for those values or essentially another grouping. But that is the result that was being produced by this process using MongoDB.

这是当前2.4.x版本的Shell中聚合的输出:

This is the output of the aggregate in the shell from current 2.4.x versions:

{
    "result" : [
            {
                    "name" : "f",
                    "type" : "e"
            },
            {
                    "name" : "z",
                    "type" : "z"
            }
    ],
    "ok" : 1
}

这样做是为了使var作为第二个查找中的$ nor条件的参数传递,如下所示:

So do this to get a var to pass as the argument to the $nor condition in the second find, like this:

var cond = db.sample.aggregate([ .....

db.sample.find({$nor: cond.result })

您应该得到相同的结果.否则,请咨询您的司机.

And you should get the same results. Otherwise consult your driver.

这篇关于在MongoDB中找到两个共享键值的文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆