MongoDB Nested Array Intersection Query


Question

Thank you in advance for your help. I have a MongoDB database structured like this:

{
  '_id' : objectID(...),
  'userID' : id,
  'movies' : [{
       'movieID' : movieID,
       'rating' : rating
   }]
}

My problem is:

I want to search for a specific user, for example the one with 'userID' : 3, and get all of his movies. Then I want to get all the other users that have at least 15 movies with the same 'movieID'. From that group I want to select only the users that share those 15 movies and also have one extra 'movieID' that I choose.

I already tried aggregation, but failed. If I do single queries, such as getting all the movies of one user and then cycling through every other user's movies and comparing them, it takes a lot of time.

Any ideas?

Thanks

Recommended answer

There are a couple of ways to do this using the aggregation framework.

Take a simple set of data as an example:

{
    "_id" : ObjectId("538181738d6bd23253654690"),
    "movies": [
        { "_id": 1, "rating": 5 },
        { "_id": 2, "rating": 6 },
        { "_id": 3, "rating": 7 }
    ]
},
{
    "_id" : ObjectId("538181738d6bd23253654691"),
    "movies": [
        { "_id": 1, "rating": 5 },
        { "_id": 4, "rating": 6 },
        { "_id": 2, "rating": 7 }
    ]
},
{
    "_id" : ObjectId("538181738d6bd23253654692"),
    "movies": [
        { "_id": 2, "rating": 5 },
        { "_id": 5, "rating": 6 },
        { "_id": 6, "rating": 7 }
    ]
}

Using the first "user" as an example, you now want to find out whether either of the other two users has at least two of the same movies.

For MongoDB 2.6 and upwards you can simply use the $setIntersection operator along with the $size operator:

db.users.aggregate([

    // Match the possible documents to reduce the working set
    { "$match": {
        "_id": { "$ne": ObjectId("538181738d6bd23253654690") },
        "movies._id": { "$in": [ 1, 2, 3 ] },
        "$and": [
            { "movies": { "$not": { "$size": 1 } } }
        ]
    }},

    // Project a copy of the document if you want to keep more than `_id`
    { "$project": {
        "_id": {
            "_id": "$_id",
            "movies": "$movies"
        },
        "movies": 1,
    }},

    // Unwind the array
    { "$unwind": "$movies" },

    // Build the array back with just `_id` values
    { "$group": {
        "_id": "$_id",
        "movies": { "$push": "$movies._id" }
    }},

    // Find the "set intersection" of the two arrays
    { "$project": {
        "movies": {
            "$size": {
                "$setIntersection": [
                   [ 1, 2, 3 ],
                   "$movies"
                ]
            }
        }
    }},

    // Filter the results to those that actually match
    { "$match": { "movies": { "$gte": 2 } } }

])

This is still possible in earlier versions of MongoDB that do not have those operators; it just takes a few more steps:

db.users.aggregate([

    // Match the possible documents to reduce the working set
    { "$match": {
        "_id": { "$ne": ObjectId("538181738d6bd23253654690") },
        "movies._id": { "$in": [ 1, 2, 3 ] },
        "$and": [
            { "movies": { "$not": { "$size": 1 } } }
        ]
    }},

    // Project a copy of the document along with the "set" to match
    { "$project": {
        "_id": {
            "_id": "$_id",
            "movies": "$movies"
        },
        "movies": 1,
        "set": { "$cond": [ 1, [ 1, 2, 3 ], 0 ] }
    }},

    // Unwind both those arrays
    { "$unwind": "$movies" },
    { "$unwind": "$set" },

    // Group back the count where both `_id` values are equal
    { "$group": {
        "_id": "$_id",
        "movies": {
           "$sum": {
               "$cond":[
                   { "$eq": [ "$movies._id", "$set" ] },
                   1,
                   0
               ]
           }
        } 
    }},

    // Filter the results to those that actually match
    { "$match": { "movies": { "$gte": 2 } } }
])


In Detail

That may be a lot to take in, so we can look at each stage and break it down to see what it is doing.

$match: You do not want to operate on every document in the collection, so this is an opportunity to remove the items that cannot possibly match, even if there is still more work to do to find the exact ones. So the obvious things are to exclude the same "user" and then match only the documents that have at least one of the same movies as were found for that "user".

The next thing that makes sense is to consider that when you want to match n entries, only documents with a "movies" array that is larger than n-1 can possibly contain the matches. The use of $and here looks funny and is not specifically required with a single condition, but if the required number of matches were 4 then that part of the statement would look like this:

        "$and": [
            { "movies": { "$not": { "$size": 1 } } },
            { "movies": { "$not": { "$size": 2 } } },
            { "movies": { "$not": { "$size": 3 } } }
        ]

So you basically "rule out" arrays that cannot possibly be long enough to have n matches. Note that this $size operator in the query form is different from $size in the aggregation framework. There is no way, for example, to use it with an inequality operator such as $gt, as its purpose is to match exactly the requested "size". Hence this query form, which lists every possible size that is less than n.
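Writing those clauses by hand becomes tedious for a threshold such as 15, so they can be generated. The following is a minimal sketch in plain JavaScript; the helper name `sizeExclusions` is hypothetical and not part of MongoDB or of the original answer.

```javascript
// Hypothetical helper: build the `$and` clauses that rule out "movies"
// arrays too short to hold n matches (sizes 1 .. n-1).
// Size 0 needs no clause because the `$in` condition already requires
// at least one matching element.
function sizeExclusions(n) {
  const clauses = [];
  for (let size = 1; size < n; size++) {
    clauses.push({ movies: { $not: { $size: size } } });
  }
  return clauses;
}

// For n = 4 this yields the three clauses shown above.
console.log(JSON.stringify(sizeExclusions(4), null, 2));
```

For n = 1 the helper returns an empty array. As far as I know the server rejects an empty `$and`, so in that case the `$and` key should simply be left out of the `$match`.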

$project: There are a few purposes in this statement, some of which differ depending on the MongoDB version you have. Firstly, and optionally, a copy of the document is kept under the _id value so that these fields are not modified by the rest of the steps. The other part is keeping the "movies" array at the top level of the document as a copy for the next stage.

In the version presented for pre-2.6 releases there is also an additional array holding the _id values of the "movies" to match. The $cond operator is used here just as a way of creating a "literal" representation of the array. Funnily enough, MongoDB 2.6 introduces an operator known as $literal to do exactly this without the odd use of $cond shown here.
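For comparison, here is a sketch of how the same $project stage might look with $literal in place of the $cond trick. The variable name `projectStage` is only for illustration. Because $literal needs 2.6 or later, where $setIntersection is also available, this is mainly useful for seeing what the $cond expression stands in for.

```javascript
// Sketch of the pre-2.6 $project stage rewritten with $literal (2.6+).
// `projectStage` is an illustrative name, not something from the answer.
const projectStage = {
  $project: {
    _id: { _id: "$_id", movies: "$movies" },
    movies: 1,
    // Same array that { "$cond": [ 1, [ 1, 2, 3 ], 0 ] } produced
    set: { $literal: [1, 2, 3] }
  }
};

console.log(JSON.stringify(projectStage));
```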

$unwind: To do anything further the movies array needs to be unwound, as in either case it is the only way to isolate the existing _id values of the entries that need to be matched against the "set". So for the pre-2.6 version you need to "unwind" both of the arrays that are present.

$group: For MongoDB 2.6 and greater you are just grouping back to an array that contains only the _id values of the movies, with the "ratings" removed.

Pre 2.6, since all values are presented "side by side" (and with lots of duplication), you compare the two values to see if they are the same. The $cond operator returns 1 where that comparison is true and 0 where it is false. This is passed directly to $sum to total up the number of elements in the array that match the required "set".
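To make the "side by side" counting concrete, the sketch below emulates the two $unwind stages and the $sum of $cond in plain JavaScript, outside MongoDB. The function name `countSideBySide` is hypothetical, and the data is the second sample user from above.

```javascript
// Plain-JavaScript emulation of the pre-2.6 stages, not a MongoDB call.
// Each (movie, set value) pair corresponds to one unwound document.
function countSideBySide(movies, set) {
  let rows = 0;
  let total = 0;
  for (const movie of movies) {        // first $unwind: "$movies"
    for (const wanted of set) {        // second $unwind: "$set"
      rows += 1;
      total += movie._id === wanted ? 1 : 0;   // $cond inside $sum
    }
  }
  return { rows, total };
}

const secondUserMovies = [
  { _id: 1, rating: 5 },
  { _id: 4, rating: 6 },
  { _id: 2, rating: 7 }
];

// 3 movies x 3 set values = 9 unwound rows, of which 2 are equal pairs
console.log(countSideBySide(secondUserMovies, [1, 2, 3]));
```

This only equals the size of the set intersection when neither array contains duplicate ids; a movie listed twice would be counted twice.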

$project: This is where MongoDB 2.6 and greater differs. Since you have pushed back an array of the "movies" _id values, you then use $setIntersection to compare those arrays directly. As the result is an array containing the elements that are the same, it is wrapped in a $size operator to determine how many elements were returned in that matching set.
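As a sanity check, the effect of $setIntersection wrapped in $size can be reproduced in plain JavaScript on the sample data. This is only an emulation under the assumption that ids are compared by strict equality; `intersectionSize` is a hypothetical helper, and the _id strings are the ObjectId values from the sample documents.

```javascript
// Emulates { $size: { $setIntersection: [ wanted, "$movies" ] } }
// in plain JavaScript; this does not talk to MongoDB.
function intersectionSize(wanted, movieIds) {
  const present = new Set(movieIds);
  return [...new Set(wanted)].filter((id) => present.has(id)).length;
}

// The two "other" users from the sample data, after $group has reduced
// each movies array to its _id values.
const others = [
  { _id: "538181738d6bd23253654691", movies: [1, 4, 2] },
  { _id: "538181738d6bd23253654692", movies: [2, 5, 6] }
];

const counted = others.map((u) => ({
  _id: u._id,
  movies: intersectionSize([1, 2, 3], u.movies)
}));

// Final $match: keep users sharing at least two movies
const matched = counted.filter((u) => u.movies >= 2);
console.log(matched);
```

Only the second sample user (the one ending in 691) shares two movies with the first user, so it is the only one that survives the final filter.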

$match: This is the final stage, which keeps only those documents whose count of intersecting elements is greater than or equal to the required number.

That is basically how you do it. The pre-2.6 approach is a bit clunkier and requires more memory, because the expansion pairs each array member with every possible value of the set, but it is still a valid way to do this.

All you need to do is apply this with the larger n to meet your conditions, and of course make sure your original user has at least n movies to match against. Otherwise just generate this on n-1 from the length of the "user's" array of "movies".
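Putting the pieces together for an arbitrary n, the 2.6+ pipeline could be generated along these lines. This is a sketch under stated assumptions, not a tested production query: `buildPipeline` and its parameters are hypothetical names, the movie identifier is assumed to live in `movies._id` as in the sample data above (not `movieID` as in the question), and the `extraMovieId` condition is my reading of the "one extra movie that I choose" requirement.

```javascript
// Hypothetical builder for the 2.6+ pipeline shown earlier.
//   userId       - _id of the reference user to exclude
//   movieIds     - that user's movie ids
//   n            - minimum number of shared movies
//   extraMovieId - optional movie every returned user must also have
function buildPipeline(userId, movieIds, n, extraMovieId) {
  const and = [];
  // Rule out arrays that are too short to hold n matches
  for (let size = 1; size < n; size++) {
    and.push({ movies: { $not: { $size: size } } });
  }
  // At least one movie in common with the reference user
  and.push({ "movies._id": { $in: movieIds } });
  // Assumption: the chosen extra movie must be present as well
  if (extraMovieId !== undefined) {
    and.push({ "movies._id": extraMovieId });
  }

  return [
    { $match: { _id: { $ne: userId }, $and: and } },
    { $project: { _id: { _id: "$_id", movies: "$movies" }, movies: 1 } },
    { $unwind: "$movies" },
    { $group: { _id: "$_id", movies: { $push: "$movies._id" } } },
    { $project: {
        movies: { $size: { $setIntersection: [movieIds, "$movies"] } }
    } },
    { $match: { movies: { $gte: n } } }
  ];
}

// Example: users sharing at least 2 of movies 1, 2, 3 who also have movie 4.
// "user-0" is a placeholder for the reference user's ObjectId.
const pipeline = buildPipeline("user-0", [1, 2, 3], 2, 4);
console.log(JSON.stringify(pipeline, null, 2));
```

In the shell the result would be passed as `db.users.aggregate(pipeline)`, with a real ObjectId for the reference user in place of the placeholder string used here.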
