Is there a workaround to allow using a regex in the MongoDB aggregation pipeline?


Question

I'm trying to create a pipeline which will count how many documents match some conditions. I can't see any way to use a regular expression in the conditions though. Here's a simplified version of my pipeline with annotations:

db.Collection.aggregate([
    // Pipeline before the issue
    {'$group': {
        '_id': {
            'field': '$my_field', // Included for completeness
        },
        'first_count': {'$sum': {                    // We're going to count the number
            '$cond': [                               // of documents that have 'foo' in 
                {'$eq': ['$field_foo', 'foo']}, 1, 0 // $field_foo.
            ] 
        }},                                       

        'second_count': {'$sum': {                       // Here, I want to count the
            '$cond': [                                   // Number of documents where
                {'$regex': ['$field_bar', regex]}, 1, 0  // the value of 'bar' matches
            ]                                            // the regex 
        }},                                          
    }},
    // Additional operations
])

I know the syntax is wrong, but I hope this conveys what I'm trying to do. Is there any way to perform this match in the $cond operation? Or, alternatively, I'm also open to the possibility of doing the match somewhere earlier in the pipeline and storing the result in the documents so that I only have to match on a boolean at this point.

Recommended answer

This question seems to come up often without a solution. I know of two possible solutions.

Solution 1: using mapReduce. mapReduce is the general form of aggregation; it lets the user do anything imaginable and programmable.

The following is a mongo shell solution using mapReduce. Consider the following 'st' collection.

> db.st.find()
{ "_id" : ObjectId("51d6d23b945770d6de5883f1"), "foo" : "foo1", "bar" : "bar1" }
{ "_id" : ObjectId("51d6d249945770d6de5883f2"), "foo" : "foo2", "bar" : "bar2" }
{ "_id" : ObjectId("51d6d25d945770d6de5883f3"), "foo" : "foo2", "bar" : "bar22" }
{ "_id" : ObjectId("51d6d28b945770d6de5883f4"), "foo" : "foo2", "bar" : "bar3" }
{ "_id" : ObjectId("51d6daf6945770d6de5883f5"), "foo" : "foo3", "bar" : "bar3" }
{ "_id" : ObjectId("51d6db03945770d6de5883f6"), "foo" : "foo4", "bar" : "bar24" }

We want to group by foo and, for each foo, count the number of documents as well as the number of documents whose bar contains the substring 'bar2'. That is:

foo1: nbdoc=1, n_match = 0
foo2: nbdoc=3, n_match = 2
foo3: nbdoc=1, n_match = 0
foo4: nbdoc=1, n_match = 1

To do that, define the following map function:

var mapFunction = function() {
  var key = this.foo;
  var nb_match_bar2 = 0;
  if( this.bar.match(/bar2/g) ){
    nb_match_bar2 = 1;
  }
  var value = {
    count: 1,
    nb_match: nb_match_bar2
  };

  emit( key, value );
};

And the following reduce function:

var reduceFunction = function(key, values) {
  var reducedObject = {
    count: 0,
    nb_match: 0
  };
  values.forEach(function(value) {
    reducedObject.count += value.count;
    reducedObject.nb_match += value.nb_match;
  });
  return reducedObject;
};

Run mapReduce and store the result in the collection map_reduce_result:

db.st.mapReduce(mapFunction, reduceFunction, {out:'map_reduce_result'})
{
  "result" : "map_reduce_result",
  "timeMillis" : 7,
  "counts" : {
    "input" : 6,
    "emit" : 6,
    "reduce" : 1,
    "output" : 4
  },
  "ok" : 1,
}

Finally, we can query the collection map_reduce_result. Voilà, the solution:

> db.map_reduce_result.find()
{ "_id" : "foo1", "value" : { "count" : 1, "nb_match" : 0 } }
{ "_id" : "foo2", "value" : { "count" : 3, "nb_match" : 2 } }
{ "_id" : "foo3", "value" : { "count" : 1, "nb_match" : 0 } }
{ "_id" : "foo4", "value" : { "count" : 1, "nb_match" : 1 } }
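To see how the map and reduce functions combine without a running server, here is a minimal sketch in plain JavaScript that emulates the emit, group, and reduce cycle on the six sample documents. The names `docs`, `groups`, and `results` and the stand-in `emit` are assumptions made for illustration; this is not how MongoDB implements mapReduce, only a way to check the logic by hand.

```javascript
// Sample documents mirroring the 'st' collection (the _id fields are omitted).
const docs = [
  { foo: "foo1", bar: "bar1" },
  { foo: "foo2", bar: "bar2" },
  { foo: "foo2", bar: "bar22" },
  { foo: "foo2", bar: "bar3" },
  { foo: "foo3", bar: "bar3" },
  { foo: "foo4", bar: "bar24" },
];

// Hypothetical stand-in for the server-side emit(): collects values per key.
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// Same logic as the map function: one count per document,
// plus 1 in nb_match when bar contains the substring 'bar2'.
function mapFunction() {
  const matched = /bar2/.test(this.bar) ? 1 : 0;
  emit(this.foo, { count: 1, nb_match: matched });
}

// Same logic as the reduce function: sum both counters.
function reduceFunction(key, values) {
  const reduced = { count: 0, nb_match: 0 };
  values.forEach((v) => {
    reduced.count += v.count;
    reduced.nb_match += v.nb_match;
  });
  return reduced;
}

// Map phase: call the map function with `this` bound to each document.
for (const doc of docs) mapFunction.call(doc);

// Reduce phase: in this sketch, keys with a single emitted value skip reduce.
const results = [];
for (const [key, values] of groups) {
  const value = values.length === 1 ? values[0] : reduceFunction(key, values);
  results.push({ _id: key, value: value });
}

console.log(results);
```

Because the reduce function only sums fields and returns an object of the same shape as the emitted values, it gives the same totals however the values for a key are grouped.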

Solution 2: using two separate aggregations and merging the results. I won't give details for this solution, as any mongo user can easily do it.

Step 1: do the aggregation, ignoring the part that requires the regex to sum.
Step 2: do a second aggregation, grouping on the same key as in step 1. Stage 1 of the pipeline: match the regular expression. Stage 2: group on the same key as in the first step and count the number of documents in each group with {$sum: 1}.
Step 3: merge the results of steps 1 and 2: for each key that appears in both results, add the new field; if the key is not present in the second result, set the new field to 0.

Voilà! Another solution.
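The three steps of solution 2 might look roughly like the sketch below, written against the 'st' sample collection. The names `totalPipeline`, `matchPipeline`, `mergeCounts`, `nbdoc`, and `n_match` are assumptions for illustration. The two result arrays are written out by hand from the six sample documents rather than fetched from a server, so only the merge step actually runs here.

```javascript
// Step 1 pipeline: per-key totals, ignoring the regex condition.
const totalPipeline = [
  { $group: { _id: "$foo", nbdoc: { $sum: 1 } } },
];

// Step 2 pipeline: keep only documents whose bar matches, then count per key.
const matchPipeline = [
  { $match: { bar: { $regex: "bar2" } } },
  { $group: { _id: "$foo", n_match: { $sum: 1 } } },
];

// In the shell these would be run with something like
// db.st.aggregate(totalPipeline).toArray(); the arrays below are what the
// sample data should produce (the order of groups is not guaranteed).
const totals = [
  { _id: "foo1", nbdoc: 1 },
  { _id: "foo2", nbdoc: 3 },
  { _id: "foo3", nbdoc: 1 },
  { _id: "foo4", nbdoc: 1 },
];
const matches = [
  { _id: "foo2", n_match: 2 },
  { _id: "foo4", n_match: 1 },
];

// Step 3: merge on _id; keys missing from the second result get 0.
function mergeCounts(totals, matches) {
  const matchByKey = new Map(matches.map((m) => [m._id, m.n_match]));
  return totals.map((t) => ({
    _id: t._id,
    nbdoc: t.nbdoc,
    n_match: matchByKey.has(t._id) ? matchByKey.get(t._id) : 0,
  }));
}

const merged = mergeCounts(totals, matches);
console.log(merged);
```

The merge iterates over the step 1 result because it contains every key, while the step 2 result only contains keys with at least one match.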
