Filter results by the Last Array Entry Field Value


Problem description

Given this document structure (irrelevant fields omitted for brevity):

[
    {
        "_id" : 0,
        "partn" : [ 
            {
                "date" : ISODate("2015-07-28T00:59:14.963Z"),
                "is_partner" : true
            }, 
            {
                "date" : ISODate("2015-07-28T01:00:32.771Z"),
                "is_partner" : false
            }, 
            {
                "date" : ISODate("2015-07-28T01:15:29.916Z"),
                "is_partner" : true
            }, 
            {
                "date" : ISODate("2015-08-05T13:48:07.035Z"),
                "is_partner" : false
            }, 
            {
                "date" : ISODate("2015-08-05T13:50:56.482Z"),
                "is_partner" : true
            }
        ]
    },
    {
        "_id" : 149,
        "partn" : [ 
            {
                "date" : ISODate("2015-07-30T12:42:18.894Z"),
                "is_partner" : true
            }, 
            {
                "date" : ISODate("2015-07-31T00:01:51.176Z"),
                "is_partner" : false
            }
        ]
    }
]

I need to filter documents where the last (most recent) partn.is_partner is true. Is this the best way to do it?

db.somedb
    .aggregate([ 
        // pre-filter only the docs with at least one is_partner === true, is it efficient/needed?
        {$match: {partn: { $elemMatch: { is_partner: true } } } },
        {$unwind: '$partn'},
        // do I need to sort by _id too, here?
        {$sort: {_id: 1, 'partn.date': 1} },
        // then group back fetching the last one by _id
        {$group : {
           _id : '$_id',
           partn: {$last: '$partn'},
        }},
        // and return only those with is_partner === true
        {$match: {'partn.is_partner': true } },
    ])

I get what I need but, not being a very experienced MongoDB developer, something feels like overhead in that aggregation. I thought about just fetching the last entry of each .partn array, but the collection must be exported/imported sometimes and, if I remember right, the sort order can change when that happens. Aggregating and sorting by date would guard against that.

Is this the best (most efficient) way to do it? If not, why?

Thanks. (Btw, this is MongoDB 2.6)

Solution

Mileage may vary on this, and it may well turn out that the process you are following is "currently" the "most suited" one. But we can probably do something more efficient.

What you could do now

Provided your arrays are already "sorted" by using the $sort modifier with $push, you can probably do this:

db.somedb.find(
  { 
    "partn.is_partner": true,
    "$where": function() {
      return this.partn.slice(-1)[0].is_partner == true;
    }
  },
  { "partn": { "$slice": -1 } }
)

So as long as partn.is_partner is "indexed", this is still pretty efficient, because that initial query condition can be met using an index. The part that cannot use an index is the $where clause, which relies on JavaScript evaluation.

All that the $where part does is "slice" the last element from the array and test the value of its is_partner property to see if it is true. The document is returned only if that condition is also met.

There is also the $slice projection operator. It does the same thing in returning only the last element of the array. Documents that do not match have already been filtered out, so the projection simply shows the last element, whose is_partner is true.

Combined with the index as mentioned, this should be pretty quick, given that the documents have already been selected by the indexed condition and the JavaScript condition only filters what remains. Note that without another field matched by a standard query condition, a $where clause cannot use an index. So always try to use it "sparingly", with other query conditions in place.
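To make the $where logic concrete outside the database, here is a minimal sketch in plain JavaScript. It assumes documents shaped like the samples in the question, uses `new Date(...)` in place of the shell's `ISODate(...)`, and the helper name `lastEntryIsPartner` is hypothetical. It also adds a guard for a missing or empty partn array, which the one-line function above would throw on if it were ever evaluated against such a document.

```javascript
// Sample documents modelled on the question (new Date stands in for ISODate).
const docs = [
  {
    _id: 0,
    partn: [
      { date: new Date("2015-07-28T00:59:14.963Z"), is_partner: true },
      { date: new Date("2015-08-05T13:48:07.035Z"), is_partner: false },
      { date: new Date("2015-08-05T13:50:56.482Z"), is_partner: true }
    ]
  },
  {
    _id: 149,
    partn: [
      { date: new Date("2015-07-30T12:42:18.894Z"), is_partner: true },
      { date: new Date("2015-07-31T00:01:51.176Z"), is_partner: false }
    ]
  }
];

// Hypothetical helper: the same test as the $where function, plus an empty-array guard.
function lastEntryIsPartner(doc) {
  const last = (doc.partn || []).slice(-1)[0];
  return last !== undefined && last.is_partner === true;
}

// Mimic the query (filter) and the { partn: { $slice: -1 } } projection (map).
const matched = docs
  .filter(lastEntryIsPartner)
  .map(doc => ({ _id: doc._id, partn: doc.partn.slice(-1) }));

console.log(JSON.stringify(matched));
```

Only the document with `_id` 0 survives the filter, and its partn array is reduced to the single most recent entry.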

What you can do in the future

Next up: although not available at the time of writing, the $slice operator for the aggregation framework should arrive in the near future. It is currently in the development branch, but here is a peek at how it works:

db.somedb.aggregate([
  { "$match": { "partn.is_partner": true } },
  { "$redact": {
    "$cond": {
      "if": { 
        "$anyElementTrue": {
          "$map": {
            "input": { "$slice": ["$partn",-1] },
            "as": "el",
            "in": "$$el.is_partner"
          }
        }
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$project": {
      "partn": { "$slice": [ "$partn",-1 ] }
  }}
])

Combining $slice with a $redact stage allows the documents to be filtered with a logical condition that tests the document itself. In this case $slice produces a single-element array that is sent to $map in order to extract just the single is_partner value (still as an array). As this is a single-element array at best, the next test is $anyElementTrue, which turns it into a single boolean result suitable for $cond.

Based on that result, the $redact stage decides whether to $$KEEP or $$PRUNE the document from the results. Later we use $slice again in the $project stage to return only the last element of the array after the filtering.

That works out to be pretty much exactly what the JavaScript version does, except that it uses only natively coded operators and therefore should be a bit faster than the JavaScript alternative.
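The chain of operators can be traced step by step in plain JavaScript. This is only a sketch: `sliceLast`, `anyElementTrue` and `redactDecision` are hypothetical stand-ins written for illustration, not MongoDB APIs, and `Boolean` is a rough approximation of the server's truthiness rules that is adequate for real boolean fields like is_partner.

```javascript
// Stand-in for { $slice: [arr, -1] }: an empty array or a one-element array.
const sliceLast = arr => arr.slice(-1);

// Stand-in for $anyElementTrue: true if at least one element is truthy.
const anyElementTrue = arr => arr.some(Boolean);

// Mirrors the $redact / $cond expression from the pipeline above.
function redactDecision(doc) {
  const lastAsArray = sliceLast(doc.partn);            // $slice
  const flags = lastAsArray.map(el => el.is_partner);  // $map with "as": "el"
  return anyElementTrue(flags) ? "KEEP" : "PRUNE";     // $cond -> $$KEEP / $$PRUNE
}

const keepDoc = { _id: 0, partn: [{ is_partner: false }, { is_partner: true }] };
const pruneDoc = { _id: 149, partn: [{ is_partner: true }, { is_partner: false }] };

console.log(redactDecision(keepDoc));  // KEEP
console.log(redactDecision(pruneDoc)); // PRUNE
```

Because `$anyElementTrue` of an empty array is false, a document with an empty partn array would be pruned rather than cause an error, which this sketch reproduces.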

Both forms return your first document as expected, with partn reduced to its last element by $slice:

{
    "_id" : 0,
    "partn" : [
            {
                    "date" : ISODate("2015-08-05T13:50:56.482Z"),
                    "is_partner" : true
            }
    ]
}


The big catch with both forms is that your array must already be sorted by date so that the latest entry is last. Without that, you need the aggregation framework to $sort the array, just as you are doing now.

That is not really efficient, which is why you should "pre-sort" your array and maintain the order on each update.

As a handy trick, this will actually re-order all the array elements in all collection documents in one simple statement:

db.somedb.update(
    {},
    { "$push": { 
        "partn": { "$each": [], "$sort": { "date": 1 } }
    }},
    { "multi": true }
)

So even if you are not "pushing" a new element into an array and just updating a property, you can always apply that basic construct to keep the array ordered how you want it.

Worth considering as it should make things much faster.
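To see why the stored order matters, here is a small plain-JavaScript sketch of what the "empty $each plus $sort" update does to one array. The document below is hypothetical data, `new Date(...)` stands in for `ISODate(...)`, and the in-memory sort is only an approximation of the server-side modifier.

```javascript
// An array whose stored order no longer matches the dates (hypothetical data).
const unsortedDoc = {
  _id: 1,
  partn: [
    { date: new Date("2015-08-05T13:50:56.482Z"), is_partner: true },
    { date: new Date("2015-07-28T00:59:14.963Z"), is_partner: false }
  ]
};

// Before sorting, the "last" element is actually the oldest entry.
const lastBefore = unsortedDoc.partn.slice(-1)[0].is_partner;

// Rough equivalent of { $push: { partn: { $each: [], $sort: { date: 1 } } } }:
// nothing is added, the array is only re-ordered by ascending date.
const sortedDoc = {
  ...unsortedDoc,
  partn: [...unsortedDoc.partn].sort((a, b) => a.date - b.date)
};

// After sorting, the last element is the most recent entry.
const lastAfter = sortedDoc.partn.slice(-1)[0].is_partner;

console.log(lastBefore, lastAfter); // false true
```

The "last element" test gives the wrong answer on the unsorted array and the right one once the array is ordered by ascending date, which is the situation the export/import concern in the question describes.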
