Compute first order derivative with MongoDB aggregation framework

Question

Is it possible to calculate a first order derivative using the aggregation framework?

For example, I have the data:

{time_series : [10,20,40,70,110]}

I'm trying to obtain an output like:

{derivative : [10,20,30,40]}

Recommended answer

db.collection.aggregate(
    [
      {
        "$addFields": {
          "indexes": {
            "$range": [
              0,
              {
                "$size": "$time_series"
              }
            ]
          },
          "reversedSeries": {
            "$reverseArray": "$time_series"
          }
        }
      },
      {
        "$project": {
          "derivatives": {
            "$reverseArray": {
              "$slice": [
                {
                  "$map": {
                    "input": {
                      "$zip": {
                        "inputs": [
                          "$reversedSeries",
                          "$indexes"
                        ]
                      }
                    },
                    "in": {
                      "$subtract": [
                        {
                          "$arrayElemAt": [
                            "$$this",
                            0
                          ]
                        },
                        {
                          "$arrayElemAt": [
                            "$reversedSeries",
                            {
                              "$add": [
                                {
                                  "$arrayElemAt": [
                                    "$$this",
                                    1
                                  ]
                                },
                                1
                              ]
                            }
                          ]
                        }
                      ]
                    }
                  }
                },
                {
                  "$subtract": [
                    {
                      "$size": "$time_series"
                    },
                    1
                  ]
                }
              ]
            }
          },
          "time_series": 1
        }
      }
    ]
)

We can use the pipeline above in version 3.4+ to do this. In the pipeline, we use the $addFields stage to add an array of the indexes of the "time_series" elements to the document. We also reverse the time series array and add it to the document. These two fields are built with the $range and $reverseArray operators respectively.

We reverse the array here because, for increasing data like this, the element at position p in the original array is always smaller than the element at position p+1, which means that [p] - [p+1] < 0, and we do not want to use $multiply here (see the pipeline for version 3.2). In the reversed array the same subtraction gives positive values.

Next we $zip the reversed time series with the indexes array and use the $map operator to apply a $subtract expression to each element of the resulting array.

We then $slice the result to discard the null/None value from the array and reverse the result again to restore the original order.
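The steps of the 3.4+ pipeline can be traced in plain Python. This is only an emulation of what each operator does to the example array, not MongoDB code. The function name `derivative_34` and the helper `elem_at` are made up for illustration.

```python
def derivative_34(time_series):
    """Plain-Python walk-through of the 3.4+ pipeline (an emulation only)."""
    # $addFields: "indexes" via $range, "reversedSeries" via $reverseArray
    indexes = list(range(len(time_series)))
    reversed_series = time_series[::-1]

    def elem_at(array, i):
        # Stand-in for $arrayElemAt with the non-negative indexes used here:
        # nothing is returned past the end of the array
        return array[i] if i < len(array) else None

    # $zip + $map: subtract the next reversed element from the current one
    mapped = []
    for value, index in zip(reversed_series, indexes):
        nxt = elem_at(reversed_series, index + 1)
        # Stand-in for $subtract: a null operand gives null
        mapped.append(None if nxt is None else value - nxt)

    # $slice keeps the first n-1 entries, which drops the trailing None
    sliced = mapped[:len(time_series) - 1]
    # The outer $reverseArray restores the original order
    return sliced[::-1]


print(derivative_34([10, 20, 40, 70, 110]))  # [10, 20, 30, 40]
```

Before the slice, the intermediate list is [40, 30, 20, 10, None], which is why the last entry has to be discarded.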

In 3.2 we can use the $unwind operator to unwind our array and include the index of each element in the array, by specifying a document as the operand instead of the traditional "path" prefixed by $.

Next in the pipeline, we need to $group our documents and use the $push accumulator operator to return an array of sub-documents that looks like this:

{
    "_id" : ObjectId("57c11ddbe860bd0b5df6bc64"),
    "time_series" : [
        { "value" : 10, "index" : NumberLong(0) },
        { "value" : 20, "index" : NumberLong(1) },
        { "value" : 40, "index" : NumberLong(2) },
        { "value" : 70, "index" : NumberLong(3) },
        { "value" : 110, "index" : NumberLong(4) }
    ]
}
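To make the shape of that intermediate document concrete, here is a rough Python emulation of $unwind with includeArrayIndex followed by $group with $push. The helper names `unwind_with_index` and `group_push` and the placeholder `_id` value are assumptions for illustration. Real MongoDB stores the index as a NumberLong.

```python
from collections import defaultdict


def unwind_with_index(doc, field="time_series", index_field="index"):
    # Stand-in for $unwind with includeArrayIndex: one document per element
    for i, value in enumerate(doc[field]):
        yield {**doc, field: value, index_field: i}


def group_push(docs):
    # Stand-in for $group on _id with $push of {value, index}
    grouped = defaultdict(list)
    for d in docs:
        grouped[d["_id"]].append({"value": d["time_series"], "index": d["index"]})
    return [{"_id": key, "time_series": items} for key, items in grouped.items()]


doc = {"_id": 1, "time_series": [10, 20, 40, 70, 110]}  # _id is a placeholder
unwound = list(unwind_with_index(doc))
regrouped = group_push(unwound)
print(regrouped[0]["time_series"][:2])
```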


Finally comes the $project stage. In this stage, we use the $map operator to apply a series of expressions to each element of the array newly computed in the $group stage.

Here is what is going on inside the "in" expression of the $map (think of $map as a for loop):

For each sub-document, we use the $let variable operator to bind the next element in the array to a variable and read its "value" field. We then subtract that value from the "value" field of the current element.

Since the next element in the array is the element at the current index plus one, all we need is the help of the $arrayElemAt operator and a simple $add of the current element's index and 1.

The $subtract expression returns a negative value, so we need to multiply the value by -1 using the $multiply operator.

We also need to $filter the resulting array because its last element is null (None in Python). The reason is that when the current element is the last element, $subtract returns null because the index of the next element equals the size of the array, so there is no next element to read.

db.collection.aggregate([
  {
    "$unwind": {
      "path": "$time_series",
      "includeArrayIndex": "index"
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "time_series": {
        "$push": {
          "value": "$time_series",
          "index": "$index"
        }
      }
    }
  },
  {
    "$project": {
      "time_series": {
        "$filter": {
          "input": {
            "$map": {
              "input": "$time_series",
              "as": "el",
              "in": {
                "$multiply": [
                  {
                    "$subtract": [
                      "$$el.value",
                      {
                        "$let": {
                          "vars": {
                            "nextElement": {
                              "$arrayElemAt": [
                                "$time_series",
                                {
                                  "$add": [
                                    "$$el.index",
                                    1
                                  ]
                                }
                              ]
                            }
                          },
                          "in": "$$nextElement.value"
                        }
                      }
                    ]
                  },
                  -1
                ]
              }
            }
          },
          "as": "item",
          "cond": {
            "$gte": [
              "$$item",
              0
            ]
          }
        }
      }
    }
  }
])
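The $project stage above can be mirrored in plain Python to see what $map, $multiply and $filter each contribute. This is a sketch of the logic only, and `project_derivative` is a hypothetical name. The input is the list of {"value", "index"} sub-documents produced by the $group stage.

```python
def project_derivative(time_series):
    """Emulate the $project stage on a list of {"value", "index"} sub-documents."""
    mapped = []
    for el in time_series:
        next_index = el["index"] + 1
        # $let / $arrayElemAt: the next element, or None past the end
        nxt = time_series[next_index] if next_index < len(time_series) else None
        if nxt is None:
            mapped.append(None)  # $subtract with a missing operand gives null
        else:
            mapped.append((el["value"] - nxt["value"]) * -1)  # $multiply by -1
    # $filter with $gte 0: drops the trailing None (and any negative difference)
    return [item for item in mapped if item is not None and item >= 0]


sub_docs = [{"value": v, "index": i} for i, v in enumerate([10, 20, 40, 70, 110])]
print(project_derivative(sub_docs))  # [10, 20, 30, 40]
```

One consequence visible in this emulation: because the filter condition is "greater than or equal to 0", a series that decreases at some point would lose those negative differences as well, so this approach appears to suit non-decreasing data like the example.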


Another option, which I think is less efficient, is to perform a map/reduce operation on our collection using the map_reduce method (available in PyMongo 3.x; it was removed in PyMongo 4.0).

>>> import pymongo
>>> from bson.code import Code
>>> client = pymongo.MongoClient()
>>> db = client.test
>>> collection = db.collection
>>> mapper = Code("""
...               function() {
...                 var derivatives = [];
...                 for (var index=1; index<this.time_series.length; index++) {
...                   derivatives.push(this.time_series[index] - this.time_series[index-1]);
...                 }
...                 emit(this._id, derivatives);
...               }
...               """)
>>> reducer = Code("""
...                function(key, value) {}
...                """)
>>> for res in collection.map_reduce(mapper, reducer, out={'inline': 1})['results']:
...     print(res)  # or do something with the document.
... 
{'value': [10.0, 20.0, 30.0, 40.0], '_id': ObjectId('57c11ddbe860bd0b5df6bc64')}


You can also retrieve all the documents and use numpy.diff to return the derivative like this:

import numpy as np


for document in collection.find({}, {'time_series': 1}):
    result = np.diff(document['time_series']) 
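
If NumPy is not available, the same client-side calculation can be sketched with the standard library alone. The function name `first_differences` and the in-memory `documents` list (standing in for the cursor returned by `collection.find`) are assumptions for illustration.

```python
def first_differences(series):
    # Pair each element with its successor and subtract
    return [b - a for a, b in zip(series, series[1:])]


# Stand-in for the documents returned by collection.find({}, {'time_series': 1})
documents = [{"_id": 1, "time_series": [10, 20, 40, 70, 110]}]

derivatives = {doc["_id"]: first_differences(doc["time_series"]) for doc in documents}
print(derivatives)  # {1: [10, 20, 30, 40]}
```

Unlike the 3.2 pipeline, this version keeps negative differences, since nothing is filtered out.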
