MongoDB query based on number of fields in a record

Question

I haven't been very good at Googling for this answer.

I have around 115 different fields that might be in each record. Collection is the output of a mapreduce on an amazingly large dataset.

It looks something like this:

{_id:'number1', value:{'a':1, 'b':2, 'f':5}},
{_id:'number2', value:{'e':2, 'f':114, 'h':12}},
{_id:'number3', value:{'i':2, 'j':22, 'z':12, 'za':111, 'zb':114}}

Any ideas of how I might find records with 5 fields populated?

Recommended answer

It's still not a nice query to run, but there is a slightly more modern way to do it via $objectToArray and $redact:

db.collection.aggregate([
  { "$redact": {
    "$cond": {
      "if": {
        "$eq": [
          { "$size": { "$objectToArray": "$value" } },
          3
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }}
])

Where $objectToArray basically coerces the object into an array form, much like a combination of Object.keys() and .map() would in JavaScript.
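
As a rough illustration in plain JavaScript (client-side only, not what the server actually runs), the shape that $objectToArray produces can be mimicked as follows. The helper name objectToArray is hypothetical; the k and v field names follow what the operator emits.

```javascript
// Hypothetical helper mimicking the shape $objectToArray produces:
// one { k, v } pair per field of the input object.
function objectToArray(obj) {
  return Object.keys(obj).map((key) => ({ k: key, v: obj[key] }));
}

// The "value" sub-document of the first sample record in the question.
const pairs = objectToArray({ a: 1, b: 2, f: 5 });

// The array length is the field count that $size then measures.
console.log(pairs.length);
```

Once the object is an array, counting its fields is just a matter of taking the array length, which is what the $size expression does in the pipeline above.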

It's still not a fantastic idea since it does require scanning the whole collection, but at least the aggregation framework operations use "native code" as opposed to JavaScript interpretation as is the case using $where.

So it's still generally advisable to change data structure and use a natural array as well as stored "size" properties where possible in order to make the most effective query operations.

Yes it is possible to do but not in the nicest way. The reason for this is that you are essentially using a $where operator query which uses JavaScript evaluation to match the contents. Not the most efficient way as this can never use an index and needs to test all the documents:

db.collection.find({ "$where": "return Object.keys(this.value).length == 3" })

This looks for documents with exactly three fields, so only two of your listed documents would be returned:

{ "_id" : "number1", "value" : { "a" : 1, "b" : 2, "f" : 5 } }
{ "_id" : "number2", "value" : { "e" : 2, "f" : 114, "h" : 12 } }

Or for "five" fields or more you can do much the same:

db.collection.find({ "$where": "return Object.keys(this.value).length >= 5" })

So the arguments to that operator are effectively JavaScript statements that are evaluated on the server to return where true.
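
To make the per-document evaluation concrete, here is a minimal plain JavaScript sketch that imitates it on the sample documents from the question. The whereFilter function is a hypothetical stand-in and not part of MongoDB; it only assumes that the predicate is called with `this` bound to each document in turn.

```javascript
// Sample documents copied from the question.
const docs = [
  { _id: "number1", value: { a: 1, b: 2, f: 5 } },
  { _id: "number2", value: { e: 2, f: 114, h: 12 } },
  { _id: "number3", value: { i: 2, j: 22, z: 12, za: 111, zb: 114 } },
];

// Hypothetical stand-in for the server-side evaluation: the predicate is
// called once per document with `this` bound to that document, and only
// documents for which it returns a truthy value are kept.
function whereFilter(documents, predicate) {
  return documents.filter((doc) => Boolean(predicate.call(doc)));
}

const fiveOrMore = whereFilter(docs, function () {
  return Object.keys(this.value).length >= 5;
});

console.log(fiveOrMore.map((doc) => doc._id));
```

Every document has to be passed through the predicate, which is why this style of query cannot benefit from an index.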

A more efficient way is to store the "count" of the elements in the document itself. In this way you can "index" this field and the queries are much more efficient as each document in the collection selected by other conditions does not need to be scanned to determine the length:

{_id:'number1', value:{'a':1, 'b':2, 'f':5}, count: 3},
{_id:'number2', value:{'e':2, 'f':114, 'h':12}, count: 3},
{_id:'number3', value:{'i':2, 'j':22, 'z':12, 'za':111, 'zb':114}, count: 5}

Then to get the documents with "five" elements you only need the simple query:

db.collection.find({ "count": 5 })
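
One way the stored count might be produced is to compute it in application code just before each write. The sketch below is plain JavaScript and the withCount helper is hypothetical; it assumes the fields live under a `value` key as in the question.

```javascript
// Hypothetical helper: returns a copy of the document with a `count` field
// holding the number of keys in `value`, ready to be written to the collection.
function withCount(doc) {
  return { ...doc, count: Object.keys(doc.value).length };
}

const prepared = withCount({
  _id: "number3",
  value: { i: 2, j: 22, z: 12, za: 111, zb: 114 },
});

console.log(prepared.count);
```

The count has to be recomputed whenever fields are added to or removed from `value`, otherwise it drifts out of sync. An index on the field, for example one created with db.collection.createIndex({ "count": 1 }), is what lets the query above avoid a full scan.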

That is generally the optimal form. Another point is that the generic "Object" structure you might be comfortable with from general practice is not something that MongoDB handles well. The problem is traversal of the elements in the object; MongoDB is much happier when you use an array, for example in this form:

{
    '_id': 'number1', 
    'values':[
        { 'key': 'a', 'value': 1 },
        { 'key': 'b', 'value': 2 }, 
        { 'key': 'f', 'value': 5 }
    ],
},
{
    '_id': 'number2', 
    'values':[
        { 'key': 'e', 'value': 2 }, 
        { 'key': 'f', 'value': 114 }, 
        { 'key': 'h', 'value': 12 }
    ],
},
{
    '_id':'number3', 
    'values': [
        { 'key': 'i', 'value': 2 }, 
        { 'key': 'j', 'value': 22 }, 
        { 'key': 'z', 'value': 12 }, 
        { 'key': 'za', 'value': 111 },
        { 'key': 'zb', 'value': 114 }
    ]
}

So if you actually switch to an "array" format like that, then you can match on an exact array length with the query form of the $size operator:

db.collection.find({ "values": { "$size": 5 } })
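
Converting existing records into the array layout could look roughly like the plain JavaScript sketch below. The toKeyValueArray helper is hypothetical, and it assumes the original documents keep their fields under `value` as shown in the question.

```javascript
// Hypothetical helper: turns { _id, value: { a: 1, ... } } into
// { _id, values: [ { key: "a", value: 1 }, ... ] }.
function toKeyValueArray(doc) {
  return {
    _id: doc._id,
    values: Object.entries(doc.value).map(([key, value]) => ({ key, value })),
  };
}

const converted = toKeyValueArray({ _id: "number1", value: { a: 1, b: 2, f: 5 } });

// The array length now carries the field count that $size matches on.
console.log(converted.values.length);
```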

That operator only works for an exact array length, since that is all it is designed to do. What you cannot do, as is documented, is an inequality match. For that you need the aggregation framework for MongoDB, which is a better alternative to JavaScript and mapReduce operations:

db.collection.aggregate([
    // Project a size of the array
    { "$project": {
        "values": 1,
        "size": { "$size": "$values" }
    }},
    // Match on that size
    { "$match": { "size": { "$gte": 5 } } },
    // Project just the same fields 
    { "$project": {
        "values": 1
    }}
])
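
For readers who want to check the logic of the three stages without a server, the following plain JavaScript sketch imitates them on in-memory arrays. It is only an analogy: the atLeast function is hypothetical, and the real pipeline runs inside MongoDB rather than through map and filter.

```javascript
// Hypothetical in-memory imitation of the pipeline stages above.
function atLeast(documents, minSize) {
  return documents
    // like the first $project: keep values and add a computed size
    .map((doc) => ({ _id: doc._id, values: doc.values, size: doc.values.length }))
    // like $match: keep documents whose size is >= minSize
    .filter((doc) => doc.size >= minSize)
    // like the final $project: drop the helper field again
    .map((doc) => ({ _id: doc._id, values: doc.values }));
}

const sample = [
  { _id: "number1", values: [{ key: "a", value: 1 }, { key: "b", value: 2 }, { key: "f", value: 5 }] },
  {
    _id: "number3",
    values: [
      { key: "i", value: 2 },
      { key: "j", value: 22 },
      { key: "z", value: 12 },
      { key: "za", value: 111 },
      { key: "zb", value: 114 },
    ],
  },
];

console.log(atLeast(sample, 5).map((doc) => doc._id));
```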

So those are the alternatives. There is a "native" method available through the aggregation framework when the data is stored as an array. It is fairly arguable that JavaScript evaluation is also "native" to MongoDB, just not implemented in native code.
