Finding two documents in MongoDB that share a key value

Question

I have a large collection of documents in MongoDB. Each document has a key called "name" and another key called "type". I would like to find two documents with the same name and different types, a simple MongoDB counterpart of:

SELECT ...
FROM table AS t1, table AS t2
WHERE t1.name = t2.name AND t1.type <> t2.type

I can imagine doing this with aggregation. However, the collection is very large, processing it will take time, and I am looking for just one pair of such documents.

Recommended answer

While I stand by my comments that the way you phrased your question does not seem to relate to a specific problem you have, I will go some way toward explaining the idiomatic SQL approach as a MongoDB-style solution. I maintain that your actual solution would be different, but you have not presented us with that problem, only the SQL.

So consider the following documents as a sample set, with the _id fields removed from this listing for clarity:

{ "name" : "a", "type" : "b" }
{ "name" : "a", "type" : "c" }
{ "name" : "b", "type" : "c" }
{ "name" : "b", "type" : "a" }
{ "name" : "a", "type" : "b" }
{ "name" : "b", "type" : "c" }
{ "name" : "f", "type" : "e" }
{ "name" : "z", "type" : "z" }
{ "name" : "z", "type" : "z" }

If we ran the SQL presented above over the same data, we would get this result:

a|b
a|c
a|c
b|c
b|a
b|a
a|b
b|c

We can see that the "f" and "z" entries do not match, and from that work out the logic of the SQL operation. Put another way, the question is: "Which documents, for a given value of "name", have more than one possible value in the key "type"?"
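
As a minimal sketch of that restated condition, the plain JavaScript below groups the sample documents by name and collects the distinct types seen for each one. It runs in memory and is not a MongoDB query; the helper name `typesByName` is hypothetical and only illustrates the logic.

```javascript
// Sample documents from the listing above (without _id).
const sampleDocs = [
  { name: "a", type: "b" },
  { name: "a", type: "c" },
  { name: "b", type: "c" },
  { name: "b", type: "a" },
  { name: "a", type: "b" },
  { name: "b", type: "c" },
  { name: "f", type: "e" },
  { name: "z", type: "z" },
  { name: "z", type: "z" },
];

// Hypothetical helper: map each name to the set of distinct types seen for it.
function typesByName(documents) {
  const groups = new Map();
  for (const { name, type } of documents) {
    if (!groups.has(name)) groups.set(name, new Set());
    groups.get(name).add(type);
  }
  return groups;
}

const groups = typesByName(sampleDocs);

// Names with more than one distinct type are the ones the SQL join matches.
const matchingNames = [...groups]
  .filter(([, types]) => types.size > 1)
  .map(([name]) => name);

// Names with exactly one distinct type never pair up with a different type.
const nonMatchingNames = [...groups]
  .filter(([, types]) => types.size === 1)
  .map(([name]) => name);

console.log(matchingNames);    // names "a" and "b"
console.log(nonMatchingNames); // names "f" and "z"
```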

Given that, taking a MongoDB approach, we can query for the items that do not match the given condition, which is effectively the reverse of the result:

db.sample.aggregate([

    // Store unique documents grouped by the "name"
    {$group: { 
        _id: "$name",
        comp: {
            $addToSet: { 
                name:"$name",
                type: "$type" 
            }
        } 
    }},

    // Unwind the "set" results
    {$unwind: "$comp"},

    // Push the results back to get the unique count
    // *note* you could not have counted these alongside $addToSet
    {$group: {
        _id: "$_id",
        comp: {
            $push: { 
                name: "$comp.name",
                type: "$comp.type" 
            }
        },
        count: {$sum: 1} 
    }},

    // Match only what was counted once
    {$match: {count: 1}},

    // Unwind the array
    {$unwind: "$comp"},

    // Clean up to "name" and "type" only
    {$project: { _id: 0, name: "$comp.name", type: "$comp.type"}}

])

This operation produces the result:

{ "name" : "f", "type" : "e" }
{ "name" : "z", "type" : "z" }

Now, in order to get the same result as the SQL query, we would take those results and channel them into another query:

db.sample.find({$nor: [{ name: "f", type: "e"},{ name: "z", type: "z"}] })

Which gives the final matching result:

{ "name" : "a", "type" : "b" }
{ "name" : "a", "type" : "c" }
{ "name" : "b", "type" : "c" }
{ "name" : "b", "type" : "a" }
{ "name" : "a", "type" : "b" }
{ "name" : "b", "type" : "c" }
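
To make the effect of the `$nor` filter concrete, here is a minimal in-memory sketch in plain JavaScript. It assumes only simple equality conditions like the ones used above; `matchesNone` is a hypothetical stand-in for illustration and is not how MongoDB evaluates the query.

```javascript
// The sample collection from earlier (without _id).
const collection = [
  { name: "a", type: "b" },
  { name: "a", type: "c" },
  { name: "b", type: "c" },
  { name: "b", type: "a" },
  { name: "a", type: "b" },
  { name: "b", type: "c" },
  { name: "f", type: "e" },
  { name: "z", type: "z" },
  { name: "z", type: "z" },
];

// The (name, type) pairs returned by the aggregation, used as exclusions.
const excluded = [
  { name: "f", type: "e" },
  { name: "z", type: "z" },
];

// Hypothetical stand-in for $nor with equality conditions:
// true when the document satisfies none of the conditions.
function matchesNone(doc, conditions) {
  return !conditions.some((cond) =>
    Object.keys(cond).every((key) => doc[key] === cond[key])
  );
}

const remaining = collection.filter((doc) => matchesNone(doc, excluded));
console.log(remaining.length); // 6 documents, all named "a" or "b"
```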

So this will work. However, one thing that may make it impractical is that when the number of documents being compared is very large, we hit a working limit on compacting those results down into an array.

It also suffers a bit from the negative condition in the final find operation, which forces a scan of the collection. But in all fairness, the same could be said of the SQL query, which relies on the same negative premise.

Of course, what I did not mention is that if the result set goes the other way around, and the aggregate excludes more items than it keeps, you can simply reverse the logic to get the keys that you want. Change the $match stage as follows:

{$match: {count: {$gt: 1}}}

That gives you the result directly: perhaps not the actual documents, but the matching name and type values. So you don't need another query to exclude the negative cases.

And ultimately this was my fault, because I was so focused on the idiomatic translation that I did not read the last line of your question, where you do say that you were looking for just one pair of such documents.
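
If only one pair is needed, one possible alternative to aggregating everything is to scan the documents and stop at the first name seen with two different types. The sketch below is plain JavaScript over an in-memory array and is not part of the original answer; `findFirstPair` is a hypothetical helper, and in the mongo shell the same loop could be driven by a cursor from `db.sample.find()`. It keeps one entry per distinct name in memory, so treat it as an illustration of the early-exit idea rather than a tuned solution.

```javascript
// Hypothetical helper: returns the first two documents that share a name
// but differ in type, or null if no such pair exists.
function findFirstPair(documents) {
  const firstSeen = new Map(); // name -> first document seen with that name
  for (const doc of documents) {
    const earlier = firstSeen.get(doc.name);
    if (earlier === undefined) {
      firstSeen.set(doc.name, doc);
    } else if (earlier.type !== doc.type) {
      return [earlier, doc]; // stop scanning as soon as one pair is found
    }
  }
  return null;
}

const pair = findFirstPair([
  { name: "z", type: "z" },
  { name: "z", type: "z" },
  { name: "a", type: "b" },
  { name: "a", type: "c" },
  { name: "b", type: "c" },
]);

console.log(pair); // the two "a" documents, with types "b" and "c"
```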

Of course, at present you are stuck if the aggregation result is larger than 16MB, at least until the 2.6 release, where the results of aggregation operations are a cursor that you can iterate like a .find().

Also introduced in 2.6 is the $size operator, which returns the size of an array in the document. This removes the need for the second $unwind and $group, which were only there to get the length of the set. It changes the query to a faster form:

db.sample.aggregate([
    {$group: { 
        _id: "$name",
        comp: {
            $addToSet: { 
                name:"$name",
                type: "$type"
            }
        } 
    }},
    {$project: { 
        comp: 1,
        count: {$size: "$comp"} 
    }},
    {$match: {count: {$gt: 1}}},
    {$unwind: "$comp"},
    {$project: { _id: 0, name: "$comp.name", type: "$comp.type"}}
])

MongoDB 2.6.0-rc0 is currently available if you are doing this just for personal use, or for development and testing.

Moral of the story: yes, you can do it, but do you really want or need to do it that way? Probably not, and if you asked a different question about the specific business case, you might get a different answer. Then again, this may be exactly right for what you want.

It is worth mentioning that the SQL results erroneously duplicate several items because of the other available type options, unless you use DISTINCT on those values or, essentially, another grouping. But that is the result being reproduced by this process using MongoDB.

This is the output of the aggregate in the shell from current 2.4.x versions:

{
    "result" : [
            {
                    "name" : "f",
                    "type" : "e"
            },
            {
                    "name" : "z",
                    "type" : "z"
            }
    ],
    "ok" : 1
}

So assign that output to a variable and pass its result array as the argument to the $nor condition in the second find, like this:

var cond = db.sample.aggregate([ .....

db.sample.find({$nor: cond.result })

And you should get the same results. Otherwise, consult the documentation for your driver.
