mongodb聚合框架匹配嵌套文档 [英] mongodb aggregation framework match by nested documents

查看:18
本文介绍了mongodb聚合框架匹配嵌套文档的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下文件清单:

{
    "_id" : "Tvq579754r",
    "name": "Tom",
    "forms": {
           "PreOp":{
             "status":"closed"          
           },

           "Alert":{
             "status":"closed"          
           },

           "City":{
              "status":"closed"         
           },

          "Country":{
             "status":"closed"          
          } 
    }
},
....
{
    "_id" : "Tvq444454j",
    "name": "Jim",
    "forms": {
          "Jorney":{
             "status":"closed"          
           },

          "Women":{
             "status":"void"            
          },

         "Child":{
            "status":"closed"           
         },

         "Farm":{
           "status":"closed"            
         }  
     }
}

我想通过它们的状态"字段(forms.name_of_form.status")过滤它们.我需要获取所有不具有 'forms.name_of_form.status' 等于 'void' 的文档.

I want to filter them by their 'status' field('forms.name_of_form.status'). I need fetch all documents which don't have 'forms.name_of_form.status' equal 'void'.

预期结果是(没有无效表单状态的文档):

Expected result is (document without voided form status):

{
    "_id" : "Tvq579754r",
    "name": "Tom",
    "forms": {
           "PreOp":{
             "status":"closed"          
           },

           "Alert":{
             "status":"closed"          
           },

           "City":{
              "status":"closed"         
           },

          "Country":{
             "status":"closed"          
          } 
    }
}

推荐答案

如果事先不知道所有可能的 forms 名称并在询问.无论如何,这将是非常混乱的.话虽如此,请继续阅读,我将解释它是如何完成的.

Querying this structure for the results you want is not possible without knowing all of the possible forms names beforehand, and using them in the query. It would be very messy at any rate. That said, read on as I explain how it can be done.

这些文档的结构存在问题,会妨碍您进行任何合理的查询分析.就目前而言,您必须知道所有可能的表单名称字段才能过滤掉任何内容.

There is a problem with the structure of these documents that is going to prevent you doing any reasonable query analysis. As it stands you would have to know all the possible form name fields in order to filter out anything.

您当前的结构具有包含子文档的表单,其中每个键都包含另一个具有单个属性 status 的子文档.这很难遍历,因为您的 forms 元素对于您创建的每个文档都有任意结构.这意味着模式下降status信息你想比较你的集合中每个文档的变化.

Your current structure has forms containing a sub-document, of which each key contains another sub-document with a single property, status. This is difficult to traverse as your forms element has an arbitrary structure for each document you create. That means the pattern to descend to the status information you want to compare changes for every document in your collection.

这就是我所说的路径.要获得任何元素的状态,您必须执行以下操作

Here is what I mean by path. To get at status in any element you have to do the following

表格->PreOp ->状态

Forms -> PreOp -> status

表格->警报 ->状态

Forms -> Alert -> status

第二个元素一直在变化.没有办法通配符这样的东西,因为命名被认为是明确的.

With the second element changing all the time. There is no way to wildcard something like this as the naming is considered explicit.

这可能被认为是一种从您的表单序列化数据的简单方法,但我看到了一种更灵活的替代方法.您需要的是可以以标准模式遍历的文档结构.这始终是设计中值得考虑的事情.采取以下措施:

This may have been considered an easy way to implement serializing the data from your forms but I see a more flexible alternative. What you need is a document structure you can traverse in a standard pattern. This is always something worth considering in design. Take the following:

{
    "_id" : "Tvq444454j",
    "name": "Jim",
    "forms": [
        {
             "name": "Jorney",
             "status":"closed"          
        },
        {
            "name": "Women",
            "status":"void"            
        },
        {
            "name": "Child",
            "status":"closed"           
        },
        {
            "name": "Farm",
            "status":"closed"            
        }  
    ]
}

因此文档的结构被更改为使 forms 元素成为一个数组,而不是将状态字段放在命名表单字段"的键下.我们将数组的每个成员作为包含表单字段"的子文档.namestatus.所以标识符和状态仍然配对在一起,但现在只是表示为一个子文档.这最重要的是改变了这些键的访问路径,就像现在对于 both 字段名称和我们可以做的状态

So the structure of the document is changed to make the forms element an Array, and rather than place the status field under a key that names the "form field" we have each member of the Array as a sub-document cotaining the "form field" name and the status. So both the identifier and the status are still paired together but just represented as a sub-document now. This most importantly changes the access path to these keys, as now for both the field name and it's status we can do

表格->状态

表格->姓名

this的意思是可以查询到form中所有字段的名称或者status中所有字段的名称form,甚至是所有具有特定name字段和特定status的文档.这比使用原始结构可以实现的效果好多.

What this means is that you can query to find the names of all the fields in the form or all the status fields in the form, or even all the documents with a certain name field and certain status. That is much better than what could be done with the original structure.

现在,在您的特定情况下,您希望获取所有字段不是void的文档.现在没有办法在单个查询中执行此操作,因为没有运算符可以以这种方式比较数组中的所有元素并查看它们是否相同.但是您可以采取两种方法:

Now in your particular case, you want to get only the documents where all the fields are not void. Now there is no way in a single query to do this as there is no operator to compare all the elements in a an array in this way and see if they are the same. But there are two approaches you can take:

第一个可能效率不高的方法是查询 所有 文档,这些文档包含 formsstatus 为 " 的元素.作废".使用生成的文档 ID,您可以发出另一个查询,返回具有指定 ID 的文档.

The first and probably not as efficient is to query for all documents that contain an element in forms that has a status of "void". With the resulting document Id's you can issue another query that returns the documents that do not have the Id's that were specified.

db.forms.find({ "forms.status": "void" },{ _id: 1})

db.forms.find({ _id: $not: { $in: [<Object1>,<Object2>,<Object3>,... ] } })

考虑到结果大小,这可能是不可能的,并且通常不是一个好主意,因为排除运算符 $not 基本上是强制对集合进行全扫描,所以你无法使用索引.

Given result size this may not be possible and is generally not a good idea as the exclusion operator $not is basically forcing a full scan of the collection, so you couldn't use an index.

另一种方法是使用聚合管道,如下所示:

Another approach is to use the aggregation pipeline as follows:

db.forms.aggregate([
    { "$unwind": "$forms" },
    { "$group": { "_id": "$_id", "status": { "$addToSet": "$forms.status" }}},
    { "$unwind": "$status" },
    { "$sort": { "_id": 1, "status": -1 }},
    { "$group": { "_id": "$_id", "status": { "$first": "$status"}}},
    { "$match":{ "status": "closed" }}
])

当然,这只会返回匹配文档的 _id,但您可以使用 $in 发出查询并返回整个匹配文档.这比以前使用的排除运算符要好,现在我们可以使用索引来避免全集合扫描.

Of course that will only return the _id for the documents that match, but you can issue a query with $in and return the whole matching documents. This is better than the exclusion operator used before, and now we can use an index to avoid full collection scans.

作为最后一种方法,出于最佳性能考虑,您可以再次更改文档,以便在顶层保持状态".无效"中的表格中是否有任何字段?或关闭".所以在顶层,只有当所有项目都关闭"时,价值才会被关闭.和无效"如果某些东西是无效的,等等.

As a final approach and for the best performance consideration, you could change the document again so that at the top level you keep the "status" of whether any field in the forms in "void" or "closed". So at the top level the value would be closed only if all the items were "closed" and "void" if something were void, and so on.

最后一个将意味着进一步的程序更改,并且对 forms 项的所有更改也需要更新此字段以保持状态".然而,这是查找所需文件的最有效方式,可能值得考虑.

That final one would mean a further programmatic change, and all changes to the forms items would need to update this field as well to maintain the "status". It is however the most efficient way of finding the documents you need and may be worth consideration.

编辑:

除了把文档改成master状态之外,修改后的结构上最快的查询形式其实是:

Aside from changing the document to have a master status, fastest query form on the revised structure is actually:

db.forms.find({ "forms": { "$not": { "$elemMatch": { "status": "void" } } } })

这篇关于mongodb聚合框架匹配嵌套文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆