MongoDB $unwind and performance


Problem Description


Below is a small example of my data:

[
  {
    "callId": "17dac51e-125e-499e-9064-f20bd3b1a9d8",
    "caller": {
      "firstName": "Test",
      "lastName": "Testing",
      "phoneNumber": "1231231234"
    },
    "inquiries": [
      {
        "inquiryId": "b0d14381-ce75-49aa-a66a-c36ae20b72a8",
        "routeHistory": [
          {
            "assignedUserId": "cfa0ffe9-c77d-4eec-87d7-4430f7772e81",
            "routeDate": "2020-01-01T06:00:00.000Z",
            "status": "routed"
          },
          {
            "assignedUserId": "cfa0ffe9-c77d-4eec-87d7-4430f7772e81",
            "routeDate": "2020-01-03T06:00:00.000Z",
            "status": "routed"
          }
        ]
      },
      {
        "inquiryId": "9d743be9-7613-46d7-8f9b-a04b4b899b56",
        "routeHistory": [
          {
            "assignedUserId": "cfa0ffe9-c77d-4eec-87d7-4430f7772e81",
            "routeDate": "2020-01-01T06:00:00.000Z",
            "status": "ended"
          },
          {
            "assignedUserId": "cfa0ffe9-c77d-4eec-87d7-4430f7772e81",
            "routeDate": "2020-01-03T06:00:00.000Z",
            "status": "ended"
          }
        ]
      }
    ]
  }
]

I am running the following aggregate against many more documents:

db.collection.aggregate([
    {
        $unwind: "$inquiries"
    },
    {
        $match: {
            "inquiries.routeHistory.status": "ended"
        }
    },
    {
        $addFields: {
            "inquiries.routeHistory": {
                $filter: {
                    input: "$inquiries.routeHistory",
                    cond: {
                        $eq: [ { $max: "$inquiries.routeHistory.routeDate" }, "$$this.routeDate" ]
                    }
                }
            }
        }
    },
    {
        $group: {
            _id: "$_id",
            callId: { $first: "$callId" },
            caller: { $first: "$caller" },
            inquiries: { $push: "$inquiries" }
        }
    }
])
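To see what the $addFields stage above is doing, here is a minimal plain-JavaScript sketch (not the author's code; just a client-side simulation using the sample document's field names) of the same per-inquiry reduction: keep only the routeHistory entries whose routeDate equals the maximum routeDate in that inquiry.

```javascript
// Mirror of the $filter + $max stage: keep only the most recent
// routeHistory entries of an inquiry. ISO-8601 date strings compare
// correctly as plain strings, so string comparison is enough here.
function latestRouteHistory(inquiry) {
  const maxDate = inquiry.routeHistory
    .map(h => h.routeDate)
    .reduce((a, b) => (a > b ? a : b));
  return {
    ...inquiry,
    routeHistory: inquiry.routeHistory.filter(h => h.routeDate === maxDate),
  };
}

// Second inquiry from the sample data above
const inquiry = {
  inquiryId: "9d743be9-7613-46d7-8f9b-a04b4b899b56",
  routeHistory: [
    { routeDate: "2020-01-01T06:00:00.000Z", status: "ended" },
    { routeDate: "2020-01-03T06:00:00.000Z", status: "ended" },
  ],
};

const result = latestRouteHistory(inquiry);
console.log(result.routeHistory); // only the 2020-01-03 entry survives
```

Note that because the comparison is `$eq` against the maximum, ties (several entries sharing the latest routeDate) would all be kept, which matches the pipeline's behavior.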

While this does return the expected results, it does not scale. When I run this against a much larger dataset, I'm getting timeouts. I have indexes on the fields I am querying against. Is there a way to optimize my query for better performance?

Important note: I am limited to using only the operators supported by DocumentDB

Solution

It doesn't scale because indexes can only be used when $match (and a handful of other stages; see the aggregation pipeline optimization documentation) occurs at the beginning of the pipeline. In your pipeline, $unwind runs first, so the $match after it cannot use an index and every document in the collection has to be unwound.

Make sure you have this multikey index: {'inquiries.routeHistory.status': 1} (created with db.collection.createIndex({'inquiries.routeHistory.status': 1}))

db.collection.aggregate([
  {
    $match: {
      "inquiries.routeHistory.status": "ended"
    }
  },
  {
    $unwind: "$inquiries"
  },
  {
    $match: {
      "inquiries.routeHistory.status": "ended"
    }
  },
  {
    $addFields: {
      "inquiries.routeHistory": {
        $filter: {
          input: "$inquiries.routeHistory",
          cond: {
            $eq: [
              {
                $max: "$inquiries.routeHistory.routeDate"
              },
              "$$this.routeDate"
            ]
          }
        }
      }
    }
  },
  {
    $group: {
      _id: "$_id",
      callId: {
        $first: "$callId"
      },
      caller: {
        $first: "$caller"
      },
      inquiries: {
        $push: "$inquiries"
      }
    }
  }
])
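The second $match (after $unwind) is still needed: the leading $match selects whole documents in which any inquiry has an "ended" entry, so after unwinding, inquiries containing only "routed" entries would otherwise slip through. A small JavaScript sketch (again a client-side simulation of the sample document above, not server-side code) illustrates this:

```javascript
// Simulate $unwind followed by the second $match on the sample document.
const doc = {
  callId: "17dac51e-125e-499e-9064-f20bd3b1a9d8",
  inquiries: [
    // first inquiry: only "routed" entries
    { inquiryId: "b0d14381-ce75-49aa-a66a-c36ae20b72a8",
      routeHistory: [{ status: "routed" }, { status: "routed" }] },
    // second inquiry: "ended" entries
    { inquiryId: "9d743be9-7613-46d7-8f9b-a04b4b899b56",
      routeHistory: [{ status: "ended" }, { status: "ended" }] },
  ],
};

// $unwind: one pipeline element per inquiry
const unwound = doc.inquiries.map(inq => ({ callId: doc.callId, inquiries: inq }));

// second $match: keep only elements whose own inquiry has an "ended" entry
const matched = unwound.filter(e =>
  e.inquiries.routeHistory.some(h => h.status === "ended")
);

console.log(unwound.length); // 2 elements after $unwind
console.log(matched.length); // 1 element survives the second $match
```

So the first $match narrows the scan via the index, and the second $match does the actual per-inquiry filtering once the array is unwound.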

Note: Unfortunately, DocumentDB doesn't support $map; in that case, the same result could be achieved with two $addFields stages.
