mongodb聚合框架按嵌套文档匹配 [英] mongodb aggregation framework match by nested documents

查看:57
本文介绍了mongodb聚合框架按嵌套文档匹配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下文件清单:

{
    "_id" : "Tvq579754r",
    "name": "Tom",
    "forms": {
           "PreOp":{
             "status":"closed"          
           },

           "Alert":{
             "status":"closed"          
           },

           "City":{
              "status":"closed"         
           },

          "Country":{
             "status":"closed"          
          } 
    }
},
....
{
    "_id" : "Tvq444454j",
    "name": "Jim",
    "forms": {
          "Jorney":{
             "status":"closed"          
           },

          "Women":{
             "status":"void"            
          },

         "Child":{
            "status":"closed"           
         },

         "Farm":{
           "status":"closed"            
         }  
     }
}

我想通过它们的状态"字段("forms.name_of_form.status")过滤它们.我需要获取所有不具有"forms.name_of_form.status"等于"void"的文档.

I want to filter them by their 'status' field('forms.name_of_form.status'). I need fetch all documents which don't have 'forms.name_of_form.status' equal 'void'.

预期结果是(文档没有无效的表单状态):

Expected result is (document without voided form status):

{
    "_id" : "Tvq579754r",
    "name": "Tom",
    "forms": {
           "PreOp":{
             "status":"closed"          
           },

           "Alert":{
             "status":"closed"          
           },

           "City":{
              "status":"closed"         
           },

          "Country":{
             "status":"closed"          
          } 
    }
}

推荐答案

在不事先知道所有可能的forms名称并在查询中使用它们的情况下,无法为所需的结果查询此结构.无论如何,这将是非常混乱的.就是说,请在我解释如何完成时继续阅读.

Querying this structure for the results you want is not possible without knowing all of the possible forms names beforehand, and using them in the query. It would be very messy at any rate. That said, read on as I explain how it can be done.

这些文档的结构存在问题,这将阻止您进行任何合理的查询分析.就目前而言,您必须知道所有可能的表单名称字段,才能过滤掉任何内容.

There is a problem with the structure of these documents that is going to prevent you doing any reasonable query analysis. As it stands you would have to know all the possible form name fields in order to filter out anything.

您当前的结构具有包含子文档的表格,其中每个键都包含具有单个属性status的另一个子文档.这很难遍历,因为forms元素对于您创建的每个文档都具有任意结构.这意味着下降到要比较集合中每个文档更改的status信息的模式.

Your current structure has forms containing a sub-document, of which each key contains another sub-document with a single property, status. This is difficult to traverse as your forms element has an arbitrary structure for each document you create. That means the pattern to descend to the status information you want to compare changes for every document in your collection.

这就是我所说的路径.要获得任何元素的状态,您必须执行以下操作

Here is what I mean by path. To get at status in any element you have to do the following

表格-> 预操作->状态

表格-> 警报->状态

第二个元素一直在变化.由于这样的命名被认为是明确的,因此无法无法使用通配符.

With the second element changing all the time. There is no way to wildcard something like this as the naming is considered explicit.

这可能已经被认为是一种实现从 forms 序列化数据的简便方法,但是我看到了更灵活的替代方案.您需要的是可以按标准模式遍历的文档结构.这始终是设计中值得考虑的事情.请执行以下操作:

This may have been considered an easy way to implement serializing the data from your forms but I see a more flexible alternative. What you need is a document structure you can traverse in a standard pattern. This is always something worth considering in design. Take the following:

{
    "_id" : "Tvq444454j",
    "name": "Jim",
    "forms": [
        {
             "name": "Jorney",
             "status":"closed"          
        },
        {
            "name": "Women",
            "status":"void"            
        },
        {
            "name": "Child",
            "status":"closed"           
        },
        {
            "name": "Farm",
            "status":"closed"            
        }  
    ]
}

因此更改了文档的结构,以使forms元素成为数组,而不是将状态字段放在名为表单字段"的键下,我们将数组的每个成员都作为子文档包含表单字段" namestatus.因此,标识符和状态都仍然配对在一起,但现在仅表示为子文档.最重要的是,这更改了对这些键的访问路径,就像现在两者一样,我们都可以做到

So the structure of the document is changed to make the forms element an Array, and rather than place the status field under a key that names the "form field" we have each member of the Array as a sub-document cotaining the "form field" name and the status. So both the identifier and the status are still paired together but just represented as a sub-document now. This most importantly changes the access path to these keys, as now for both the field name and it's status we can do

表格-> 状态

表格-> 名称

的意思是,您可以查询以查找form中所有字段的名称或form中所有status字段的名称,甚至可以找到所有带有某个name字段和某个status.这比原始结构可以做的要好得多.

What this means is that you can query to find the names of all the fields in the form or all the status fields in the form, or even all the documents with a certain name field and certain status. That is much better than what could be done with the original structure.

现在在您的特定情况下,您只想所有其中所有不是void字段的文档.现在,无法在单个查询中执行此操作,因为没有运算符以这种方式比较数组中的所有元素并查看它们是否相同.但是您可以采用两种方法:

Now in your particular case, you want to get only the documents where all the fields are not void. Now there is no way in a single query to do this as there is no operator to compare all the elements in a an array in this way and see if they are the same. But there are two approaches you can take:

第一个,可能不是最有效的方法是查询所有文档,这些文档包含forms中的元素,该元素的status为"void".使用生成的文档ID,您可以发出另一个查询,该查询返回具有指定ID的文档.

The first and probably not as efficient is to query for all documents that contain an element in forms that has a status of "void". With the resulting document Id's you can issue another query that returns the documents that do not have the Id's that were specified.

db.forms.find({ "forms.status": "void" },{ _id: 1})

db.forms.find({ _id: $not: { $in: [<Object1>,<Object2>,<Object3>,... ] } })

鉴于结果的大小,这可能是不可能的,并且通常不是一个好主意,因为排除运算符$not基本上是强制对该集合进行全扫描,因此您不能使用索引

Given result size this may not be possible and is generally not a good idea as the exclusion operator $not is basically forcing a full scan of the collection, so you couldn't use an index.

另一种方法是使用聚合管道,如下所示:

Another approach is to use the aggregation pipeline as follows:

db.forms.aggregate([
    { "$unwind": "$forms" },
    { "$group": { "_id": "$_id", "status": { "$addToSet": "$forms.status" }}},
    { "$unwind": "$status" },
    { "$sort": { "_id": 1, "status": -1 }},
    { "$group": { "_id": "$_id", "status": { "$first": "$status"}}},
    { "$match":{ "status": "closed" }}
])

当然,这只会返回匹配文档的_id,但是您可以使用$ in发出查询并返回整个匹配的文档.这比以前使用的排除运算符要好,现在我们可以使用索引来避免完整集合扫描.

Of course that will only return the _id for the documents that match, but you can issue a query with $in and return the whole matching documents. This is better than the exclusion operator used before, and now we can use an index to avoid full collection scans.

作为最终方法,出于最佳性能的考虑,您可以再次更改文档,以便在顶层保留表单中是否有任何字段的状态"为无效"或关闭".因此,在顶层,只有当所有项目都被关闭"并且"void"(如果有东西是空的)时,该值才会被关闭,等等.

As a final approach and for the best performance consideration, you could change the document again so that at the top level you keep the "status" of whether any field in the forms in "void" or "closed". So at the top level the value would be closed only if all the items were "closed" and "void" if something were void, and so on.

最后一个将意味着进一步的程序更改,并且对forms项的所有更改也都需要更新此字段,以保持状态".但是,这是查找所需文档的最有效方法,可能值得考虑.

That final one would mean a further programmatic change, and all changes to the forms items would need to update this field as well to maintain the "status". It is however the most efficient way of finding the documents you need and may be worth consideration.

编辑:

除了将文档更改为主要状态外,修订后的结构上最快的查询表实际上是:

Aside from changing the document to have a master status, fastest query form on the revised structure is actually:

db.forms.find({ "forms": { "$not": { "$elemMatch": { "status": "void" } } } })

这篇关于mongodb聚合框架按嵌套文档匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆