How to use $regex inside $or as an Aggregation Expression

Question

I have a query which allows the user to filter by some string field using a format that looks like: "Where description of the latest inspection is any of: foo or bar". This works great with the following query:

db.getCollection('permits').find({
  '$expr': {
    '$let': {
      vars: {
        latestInspection: {
          '$arrayElemAt': ['$inspections', {
            '$indexOfArray': ['$inspections.inspectionDate', {
              '$max': '$inspections.inspectionDate'
            }]
          }]
        }
      },
      in: {
        '$in': ['$$latestInspection.description', ['Fire inspection on property', 'Health inspection']]
      }
    }
  }
})

What I want is for the user to be able to use wildcards which I turn into regular expressions: "Where description of the latest inspection is any of: Health inspection or Found a * at the property".

The regex I get, don't need help with that. The problem I'm facing is, apparently the aggregation $in operator does not support matching by regular expressions. So I thought I'd build this using $or since the docs don't say I can't use regex. This was my best attempt:

db.getCollection('permits').find({
  '$expr': {
    '$let': {
      vars: {
        latestInspection: {
          '$arrayElemAt': ['$inspections', {
            '$indexOfArray': ['$inspections.inspectionDate', {
              '$max': '$inspections.inspectionDate'
            }]
          }]
        }
      },
      in: {
        '$or': [{
          '$$latestInspection.description': {
            '$regex': /^Found a .* at the property$/
          }
        }, {
          '$$latestInspection.description': 'Health inspection'
        }]
      }
    }
  }
})

Except I get the error:

"Unrecognized expression '$$latestInspection.description'"

I'm thinking I can't use $$latestInspection.description as an object key but I'm not sure (my knowledge here is limited) and I can't figure out another way to do what I want. So you see I wasn't even able to get far enough to see if I can use $regex in $or. I appreciate all the help I can get.

Recommended answer

Everything inside $expr is an aggregation expression. The documentation may not explicitly say that you cannot use a regular expression there, but the lack of any named operator and the JIRA issue SERVER-11947 certainly do. So if you need a regular expression, you really have no option other than using $where instead:

db.getCollection('permits').find({
  "$where": function() {
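    // Sort newest first and take the description of the latest inspection
    var description = this.inspections
      .sort((a, b) => b.inspectionDate.valueOf() - a.inspectionDate.valueOf())
      .shift().description;

    // Wildcard pattern, or the exact string as stored in the question's data
    return /^Found a .* at the property$/.test(description) ||
      description === "Health inspection";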
  }
})

You can still use $expr and aggregation expressions for an exact match, or just keep the comparison within the $where anyway. But at this time the only regular expression support MongoDB offers is $regex within a "query" expression.

If you did actually "require" an aggregation pipeline expression that precludes you from using $where, then the only current valid approach is to first "project" the field separately from the array and then $match with the regular query expression:

db.getCollection('permits').aggregate([
  { "$addFields": {
     "lastDescription": {
       "$arrayElemAt": [
         "$inspections.description",
         { "$indexOfArray": [
           "$inspections.inspectionDate",
           { "$max": "$inspections.inspectionDate" }
         ]}
       ]
     }
  }},
  { "$match": {
    "lastDescription": {
      "$in": [/^Found a .* at the property$/, "Health inspection"]
    }
  }}
])
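
To make the two stages easier to follow, here is a rough plain-JavaScript illustration of what they compute, run against made-up sample documents rather than a real collection. The helper names (lastDescription, matchesAny) and the sample data are assumptions for this sketch only, and the exact string 'Health inspection' follows the casing used in the question.

```javascript
// Plain-JavaScript illustration only; none of these helpers are MongoDB APIs.

// Mirrors the $addFields stage: description of the inspection with the maximum date.
function lastDescription(doc) {
  if (!doc.inspections || doc.inspections.length === 0) return undefined;
  const dates = doc.inspections.map(i => i.inspectionDate.valueOf());
  // $max over the dates, then $indexOfArray finds the first element with that date
  const idx = dates.indexOf(Math.max(...dates));
  // $arrayElemAt on "$inspections.description"
  return doc.inspections[idx].description;
}

// Mirrors a query $in that mixes regular expressions and exact strings.
function matchesAny(value, patterns) {
  if (typeof value !== 'string') return false;
  return patterns.some(p => (p instanceof RegExp ? p.test(value) : p === value));
}

// Hypothetical sample documents
const samplePermits = [
  { _id: 1, inspections: [
    { inspectionDate: new Date('2018-01-01'), description: 'Health inspection' },
    { inspectionDate: new Date('2018-03-01'), description: 'Found a leak at the property' }
  ]},
  { _id: 2, inspections: [
    { inspectionDate: new Date('2018-02-01'), description: 'Found a leak at the property' },
    { inspectionDate: new Date('2018-04-01'), description: 'Fire inspection on property' }
  ]},
  { _id: 3, inspections: [
    { inspectionDate: new Date('2018-05-01'), description: 'Health inspection' }
  ]}
];

const patterns = [/^Found a .* at the property$/, 'Health inspection'];

const matchedIds = samplePermits
  .filter(doc => matchesAny(lastDescription(doc), patterns))
  .map(doc => doc._id);

console.log(matchedIds);
```

Permit 2 is left out because only its older inspection matches the wildcard; the filter looks at the latest inspection alone.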

Which leads us to the fact that you appear to be looking for the item in the array with the maximum date value. The JavaScript syntax should be making it clear that the correct approach here is instead to $sort the array on "update". In that way the "first" item in the array can be the "latest". And this is something you can do with a regular query.

To maintain the order, ensure new items are added to the array with $push and $sort like this:

db.getCollection('permits').updateOne(
  { "_id": _idOfDocument },
  {
    "$push": {
      "inspections": {
        "$each": [{ /* Detail of inspection object */ }],
        "$sort": { "inspectionDate": -1 }
      }
    }
  }
)
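
As a rough picture of what that update does to the stored array, the following plain-JavaScript sketch appends the new items and then re-sorts the whole array by inspectionDate, newest first. pushSorted is a hypothetical helper for illustration, not a MongoDB API, and the sample inspections are made up.

```javascript
// Illustration of the effect of $push with $each and $sort: { inspectionDate: -1 }.
// pushSorted is a hypothetical helper; it returns a new array and leaves the input untouched.
function pushSorted(inspections, newItems) {
  return inspections
    .concat(newItems) // $each appends every item
    .sort((a, b) => b.inspectionDate.valueOf() - a.inspectionDate.valueOf()); // $sort descending
}

// Hypothetical stored array, not yet in "latest first" order
const stored = [
  { inspectionDate: new Date('2018-01-01'), description: 'Health inspection' },
  { inspectionDate: new Date('2018-03-01'), description: 'Fire inspection on property' }
];

// Pushing one new inspection re-sorts everything
const afterPush = pushSorted(stored, [
  { inspectionDate: new Date('2018-02-01'), description: 'Found a leak at the property' }
]);

// Passing no new items simply re-sorts what is already there
const resorted = pushSorted(stored, []);

console.log(afterPush.map(i => i.description));
console.log(resorted.map(i => i.description));
```

The second call shows why an empty list of new items is still useful: nothing is added, but the existing elements end up in order.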

In fact with an empty array argument to $each an updateMany() will update all your existing documents:

db.getCollection('permits').updateMany(
  { },
  {
    "$push": {
      "inspections": {
        "$each": [],
        "$sort": { "inspectionDate": -1 }
      }
    }
  }
)

These really only should be necessary when you in fact "alter" the date stored during updates, and those updates are best issued with bulkWrite() to effectively do "both" the update and the "sort" of the array:

db.getCollection('permits').bulkWrite([
  { "updateOne": {
    "filter": { "_id": _idOfDocument, "inspections._id": identifierForArrayElement },
    "update": {
      "$set": { "inspections.$.inspectionDate": new Date() }
    }
  }},
  { "updateOne": {
    "filter": { "_id": _idOfDocument },
    "update": {
      "$push": { "inspections": { "$each": [], "$sort": { "inspectionDate": -1 } } }
    }
  }}
])

However if you did not ever actually "alter" the date, then it probably makes more sense to simply use the $position modifier and "pre-pend" to the array instead of "appending", and avoiding any overhead of a $sort:

db.getCollection('permits').updateOne(
  { "_id": _idOfDocument },
  { 
    "$push": { 
      "inspections": {
        "$each": [{ /* Detail of inspection object */ }],
        "$position": 0
      }
    }
  }
)

With the array permanently sorted or at least constructed so the "latest" date is actually always the "first" entry, then you can simply use a regular query expression:

db.getCollection('permits').find({
  "inspections.0.description": {
    "$in": [/^Found a .* at the property$/, "Health inspection"]
  }
  }
})
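
Since the filter values come from user input with wildcards, the query document above could be assembled along these lines. This is only a sketch: escapeRegExp, toCondition and buildLatestDescriptionQuery are hypothetical helper names, and it assumes that * is the only wildcard character and that patterns without it should match exactly.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Turn one user pattern into a value suitable for a query $in:
// an exact string when there is no wildcard, otherwise an anchored RegExp.
function toCondition(pattern) {
  if (!pattern.includes('*')) return pattern;
  const source = pattern.split('*').map(escapeRegExp).join('.*');
  return new RegExp('^' + source + '$');
}

// Build the query document, assuming the array is stored "latest first".
function buildLatestDescriptionQuery(userPatterns) {
  return {
    'inspections.0.description': { '$in': userPatterns.map(toCondition) }
  };
}

const query = buildLatestDescriptionQuery([
  'Health inspection',
  'Found a * at the property'
]);

console.log(query);
// In the shell this would then be passed on as:
// db.getCollection('permits').find(query)
```

Anchoring the generated expression with ^ keeps the option of index use open, and escaping the literal parts stops characters such as . or ( in user input from being treated as regex syntax.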

So the lesson here is don't try and force calculated expressions upon your logic where you really don't need to. There should be no compelling reason why you cannot order the array content as "stored" to have the "latest date first", and even if you thought you needed the array in any other order then you probably should weigh up which usage case is more important.

Once reordered, you can even take advantage of an index to some extent, as long as the regular expressions are either anchored to the beginning of the string or at least something else in the query expression does an exact match.

In the event you feel you really cannot reorder the array, the $where query is your only present option until the JIRA issue is resolved. That is hopefully for the 4.1 release as currently targeted, but it is more than likely 6 months to a year away at best estimate.
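
For servers where that work has landed, the original $expr query could be written roughly as follows. This sketch assumes a MongoDB version that provides the $regexMatch aggregation operator (4.2 and later, as far as I am aware); buildExprQuery is a hypothetical helper, and the query has not been run against the collection from the question.

```javascript
// Sketch only: assumes the server supports the $regexMatch aggregation operator.
// buildExprQuery is a hypothetical helper, not part of any driver or shell.
function buildExprQuery(regexes, exactValues) {
  const description = '$$latestInspection.description';

  // One $regexMatch clause per regular expression
  const regexClauses = regexes.map(re => ({
    '$regexMatch': { input: description, regex: re }
  }));

  // Exact matches keep using the aggregation $in from the question
  const exactClause = { '$in': [description, exactValues] };

  return {
    '$expr': {
      '$let': {
        vars: {
          latestInspection: {
            '$arrayElemAt': ['$inspections', {
              '$indexOfArray': ['$inspections.inspectionDate', {
                '$max': '$inspections.inspectionDate'
              }]
            }]
          }
        },
        in: { '$or': regexClauses.concat([exactClause]) }
      }
    }
  };
}

const exprQuery = buildExprQuery(
  [/^Found a .* at the property$/],
  ['Health inspection']
);

// In the shell: db.getCollection('permits').find(exprQuery)
```

Note that an $expr like this is still evaluated per document, so the stored "latest first" ordering with a regular query remains the better fit where an index matters.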
