Mongo Query Nested Field Values with two-level unknown parent keys


Question

We use MongoDB to store time-series sensor data, with a schema similar to the one described in https://www.mongodb.com/blog/post/schema-design-for-time-series-data-in-mongodb

We do get good performance on queries by time period. A note on our schema design: "v" is the parent key of the sensor readings, and the time is broken down into nested documents by minute and second. We use "m" (minute) as the sub-parent key, then "s" (second) as the sub-key under each minute. The sensor readings sit at the "s" level, with field1, field2, ..., field10 as the sensor data values.

Now we are trying to implement some data analysis features and want to query the data by sensor reading values. Is there an efficient way to query the data without using nested for loops in the query?

For example:

  1. Items that have sensor reading: "field1" > 2
  2. Items that have sensor reading: "field1" > 2 and "field3" > 5

Thanks a million.

The records look like the examples below.

{
   "_id": ObjectId("5a5dd49f74bbaefd1ac89fc8"),
   "c_id": "1017",
   "c_id_s": NumberInt(1017),
   "c_t": NumberInt(1516096800),
   "type": "hour",
   "v": {
     "m1": {
       "s54": {
         "field1": 7.373158,
         "entry_id": NumberInt(4635),
         "field3": 0.19,
         "field2": NumberInt(88) 
      } 
    },
     "m31": {
       "s54": {
         "field1": 5.981918,
         "entry_id": NumberInt(4637),
         "field3": 0.04 
      },
       "s55": {
         "field2": NumberInt(89),
         "entry_id": NumberInt(4639),
         "field5": NumberInt(-67) 
      } 
    } 
  },
   "entry_id": NumberInt(4639) 
}, 
{
   "_id": ObjectId("5a5dd1a174bbaefd1ac89fc1"),
   "c_id": "1024",
   "c_id_s": NumberInt(1024),
   "c_t": NumberInt(1516096800),
   "type": "hour",
   "v": {
     "m3": {
       "s22": {
         "field3": 210.479996,
         "entry_id": NumberInt(30297) 
      },
       "s23": {
         "field1": 3.271534,
         "entry_id": NumberInt(30300),
         "field8": 7.1875,
         "field2": NumberInt(94) 
      } 
    },
     "m8": {
       "s23": {
         "field3": 150.639999,
         "entry_id": NumberInt(30304),
         "field1": 2.948425,
         "field8": 7.125,
         "field2": NumberInt(94) 
      } 
    },
     "m13": {
       "s23": {
         "field3": 99.799995,
         "entry_id": NumberInt(30308),
         "field1": 2.849621,
         "field8": 7.0625,
         "field2": NumberInt(95) 
      } 
    },
     "m18": {
       "s23": {
         "field3": 59.099998,
         "entry_id": NumberInt(30312),
         "field1": 2.681393,
         "field8": 6.9375,
         "field2": NumberInt(95) 
      } 
    },
     "m19": {
       "s8": {
         "field5": NumberInt(-87),
         "entry_id": NumberInt(30313) 
      } 
    } 
  },
   "entry_id": NumberInt(30313) 
}

Recommended answer

Map-reduce lets you process named keys, but aggregation is the way to go for efficient queries.

For the aggregation framework, you have to model the data as an array of embedded documents.

I've provided two options. You can test them against your dataset and see which one works better for you.

Something like

"v":[
  {
    "minute":1,
    "seconds":[
      {
        "second":54,
        "data":{
         "field1":7.373158,
         "entry_id":4635,
         "field3":0.19,
         "field2":88
       }
      }
    ]
  },
  {
    "minute":2,
    "seconds":...
  }
]

Now you can easily query for items that have sensor reading: "field1" > 2.

db.col.aggregate(
  [{"$match":{"v.seconds.data.field1":{"$gt":2}}},
   {"$unwind":"$v"}, 
   {"$match":{"v.seconds.data.field1":{"$gt":2}}},
   {"$unwind":"$v.seconds"}, 
   {"$match":{"v.seconds.data.field1":{"$gt":2}}},
   {"$project":{"data":"$v.seconds.data"}}]
)
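
Existing documents would need to be reshaped before a query like this can run. The following is a minimal sketch of that conversion, from the named-key layout in the question ("m1" -> "s54" -> reading) to the array layout of the first structure. The function name `toMinuteArray` and the key patterns `m<number>` / `s<number>` are assumptions for illustration, not part of the original answer.

```javascript
// Sketch: convert the question's named-key layout into the first structure,
// an array of { minute, seconds: [{ second, data }] } entries.
// toMinuteArray and the key regexes are illustrative assumptions.
function toMinuteArray(v) {
  return Object.entries(v || {})
    .filter(([key]) => /^m\d+$/.test(key)) // keep only "m<number>" keys
    .map(([minuteKey, secondsObj]) => ({
      minute: Number(minuteKey.slice(1)),
      seconds: Object.entries(secondsObj || {})
        .filter(([key]) => /^s\d+$/.test(key)) // keep only "s<number>" keys
        .map(([secondKey, data]) => ({
          second: Number(secondKey.slice(1)),
          data: data,
        }))
        .sort((a, b) => a.second - b.second),
    }))
    .sort((a, b) => a.minute - b.minute);
}

// Sample taken from the first record in the question (plain numbers here).
const sample = {
  m31: {
    s55: { field2: 89, entry_id: 4639, field5: -67 },
    s54: { field1: 5.981918, entry_id: 4637, field3: 0.04 },
  },
  m1: { s54: { field1: 7.373158, entry_id: 4635, field3: 0.19, field2: 88 } },
};
const converted = toMinuteArray(sample);
console.log(JSON.stringify(converted, null, 2));
```

In the mongo shell, a function like this could be applied per document (for example inside `db.col.find().forEach(...)` with an `updateOne` and `$set`), ideally on a copy of the collection first.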

Alternatively, you can split the documents by minute. Something like

"v":[
  {
    "second":1,
    "data":{
       "field1":7.373158,
       "entry_id":4635,
       "field3":0.19,
       "field2":88
     }
  },
  {
     "second":2,
     "data":...
  }
]

You can now query like this (with an index on v.data.field1):

db.col.aggregate(
  [{"$match":{"v.data.field1":{"$gt":2}}},
   {"$unwind":"$v"}, 
   {"$match":{"v.data.field1":{"$gt":2}}},
   {"$project":{"data":"$v.data"}}]
)
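
Splitting by minute also means reshaping the stored data. The sketch below turns one hourly document from the question into one document per minute, each with "v" as a flat array of { second, data } entries. The function name `splitByMinute`, the added `minute` field, and the `"minute"` value for `type` are hypothetical; the original answer does not say how the minute should be recorded.

```javascript
// Sketch: split one hourly document into one document per minute.
// The "minute" field and type "minute" are hypothetical additions.
function splitByMinute(doc) {
  const { _id, v, ...rest } = doc; // drop _id so each new document gets its own
  return Object.entries(v || {}).map(([minuteKey, secondsObj]) => ({
    ...rest,
    type: "minute",
    minute: Number(minuteKey.slice(1)),
    v: Object.entries(secondsObj || {})
      .map(([secondKey, data]) => ({ second: Number(secondKey.slice(1)), data }))
      .sort((a, b) => a.second - b.second),
  }));
}

// Trimmed version of the first record in the question (plain values here).
const hourly = {
  _id: "5a5dd49f74bbaefd1ac89fc8",
  c_id: "1017",
  c_t: 1516096800,
  type: "hour",
  v: {
    m1: { s54: { field1: 7.373158, entry_id: 4635, field3: 0.19, field2: 88 } },
    m31: {
      s54: { field1: 5.981918, entry_id: 4637, field3: 0.04 },
      s55: { field2: 89, entry_id: 4639, field5: -67 },
    },
  },
};
const minuteDocs = splitByMinute(hourly);
console.log(JSON.stringify(minuteDocs, null, 2));
```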

You can query items that have sensor reading: "field1" > 2 and "field3" > 5

Using the first structure

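db.col.aggregate(
  // The readings live under "data", so the conditions use "data.field1" / "data.field3".
  [{"$match":{"v":{"$elemMatch":{"seconds":{"$elemMatch":{"data.field1":{"$gt":2},"data.field3":{"$gt":5}}}}}}},
   {"$unwind":"$v"},
   {"$match":{"v.seconds":{"$elemMatch":{"data.field1":{"$gt":2},"data.field3":{"$gt":5}}}}},
   {"$unwind":"$v.seconds"},
   // Drop the non-matching seconds that share a minute with a matching one.
   {"$match":{"v.seconds.data.field1":{"$gt":2},"v.seconds.data.field3":{"$gt":5}}},
   {"$project":{"data":"$v.seconds.data"}}]
)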

Using the second structure

db.col.aggregate(
  // $elemMatch is applied to the array "v", so both conditions must hold for the same element.
  [{"$match":{"v":{"$elemMatch":{"data.field1":{"$gt":2},"data.field3":{"$gt":5}}}}},
   {"$unwind":"$v"},
   {"$match":{"v.data.field1":{"$gt":2},"v.data.field3":{"$gt":5}}},
   {"$project":{"data":"$v.data"}}]
)

Mongo Update 3.6


Use $match with $expr, which accepts aggregation expressions.

$gt > 0 - aggregation expression that checks whether the number of seconds matching the criteria, summed over all minutes, is greater than 0

$objectToArray converts the named keys into key-value pairs; $filter then selects the seconds that meet the input criteria, and $size outputs the number of matching second records.

db.testcol.aggregate(
{"$match":{
  "$expr":{
    "$gt":[
      {"$sum":{
        "$map":{
          "input":{"$objectToArray":"$v"},
          "as":"secondsofminute",
          "in":{
            "$size":{
              "$filter":{
                "input":{"$objectToArray":"$$secondsofminute.v"},
                "as":"seconds",
                "cond":{"$gt":["$$seconds.v.field2",2]}
              }
            }
          }
        }
      }},
    0]
  }
}})
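
The same stage can be generalized to several conditions at once, such as the "field1" > 2 and "field3" > 5 case from the question. The sketch below builds the stage as a plain JavaScript object; `buildReadingMatch` and its `thresholds` argument are hypothetical helpers, and combining the comparisons with `$and` inside the `$filter` condition is an assumption about how the pattern extends, not something the original answer shows.

```javascript
// Sketch: build a $match/$expr stage for the original named-key layout that
// keeps documents where at least one second satisfies ALL given thresholds.
// buildReadingMatch and "thresholds" are hypothetical, for illustration only.
function buildReadingMatch(thresholds) {
  // One { $gt: [<field path>, <minimum>] } comparison per requested field.
  const conditions = Object.entries(thresholds).map(([field, min]) => ({
    $gt: ["$$seconds.v." + field, min],
  }));
  return {
    $match: {
      $expr: {
        $gt: [
          {
            $sum: {
              $map: {
                input: { $objectToArray: "$v" }, // minutes as k/v pairs
                as: "secondsofminute",
                in: {
                  $size: {
                    $filter: {
                      input: { $objectToArray: "$$secondsofminute.v" }, // seconds as k/v pairs
                      as: "seconds",
                      cond: { $and: conditions },
                    },
                  },
                },
              },
            },
          },
          0,
        ],
      },
    },
  };
}

const stage = buildReadingMatch({ field1: 2, field3: 5 });
console.log(JSON.stringify(stage, null, 2));
```

In the mongo shell the result could be passed as `db.testcol.aggregate([stage])`. Like the stage above, it only selects whole documents; it does not extract the individual matching readings.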

Mongo Update 3.4 - Replace $expr with $redact

db.col.aggregate(
 {"$redact":{
  "$cond":{
    "if":{
      "$gt":[
        {"$sum":{
          "$map":{
            "input":{"$objectToArray":"$v"},
            "as":"secondsofminute",
            "in":{
              "$size":{
                "$filter":{
                  "input":{"$objectToArray":"$$secondsofminute.v"},
                  "as":"seconds",
                  "cond":{"$gt":["$$seconds.v.field2",2]}
                }
              }
            }
          }
        }},
        0]
    },
   "then":"$$KEEP",
   "else":"$$PRUNE"
  }
}})
