MongoDB aggregate pipeline slow after first match step


Question

I have a MongoDB aggregate pipeline that contains a number of steps (match on indexed fields, add fields, sort, collapse, sort again, page, project results.) If I comment out all of the steps except the first match step, the query executes super fast (.075 seconds), as it's leveraging the proper index. However, if I then try to perform ANY follow up step, even something as simple as getting the results count, the query then starts taking 27 seconds!!!

Here is the query: (Don't get too caught up in the complexity of it, as the indexes are doing their job in executing it quickly...)

db.runCommand({ 
  aggregate: 'ResidentialProperty', 
  allowDiskUse: false, 
  explain: false,
  cursor: {}, 
  pipeline: 
    [
      {
                "$match" : {
                    "$and" : [ 
                        {
                            "CountyPlaceId" : 20006073
                        }, 
                        {
                            "$or" : [ 
                                {
                                    "$and" : [ 
                                        {
                                            "ForSaleGroupId" : {
                                                "$in" : [ 
                                                    2, 
                                                    3
                                                ]
                                            }
                                        }, 
                                        {
                                            "$or" : [ 
                                                {
                                                    "ForSaleGroupId" : {
                                                        "$nin" : [ 
                                                            2, 
                                                            3
                                                        ]
                                                    }
                                                }, 
                                                {
                                                    "ListDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        }, 
                                        {
                                            "$or" : [ 
                                                {
                                                    "ForSaleGroupId" : {
                                                        "$ne" : 3
                                                    }
                                                }, 
                                                {
                                                    "PendingSaleDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        }
                                    ]
                                }, 
                                {
                                    "ForLeaseGroupId" : {
                                        "$in" : [ 
                                            2, 
                                            3
                                        ]
                                    },
                                    "$or" : [ 
                                        {
                                            "ForLeaseGroupId" : {
                                                "$nin" : [ 
                                                    2, 
                                                    3
                                                ]
                                            }
                                        }, 
                                        {
                                            "ListDate" : {
                                                "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                            }
                                        }
                                    ]
                                }, 
                                {
                                    "DistressedGroupId" : {
                                        "$in" : [ 
                                            2, 
                                            3, 
                                            4
                                        ]
                                    },
                                    "$or" : [ 
                                        {
                                            "DistressedGroupId" : 1
                                        }, 
                                        {
                                            "DistressedDate" : {
                                                "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                            }
                                        }
                                    ]
                                }, 
                                {
                                    "$and" : [ 
                                        {
                                            "OffMarketGroupId" : {
                                                "$in" : [ 
                                                    3, 
                                                    8
                                                ]
                                            }
                                        }, 
                                        {
                                            "$or" : [ 
                                                {
                                                    "OffMarketGroupId" : 1
                                                }, 
                                                {
                                                    "OffMarketDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        }, 
                                        {
                                            "$or" : [ 
                                                {
                                                    "OffMarketGroupId" : {
                                                        "$nin" : [ 
                                                            7, 
                                                            8
                                                        ]
                                                    }
                                                }, 
                                                {
                                                    "SoldDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }, 
                                                {
                                                    "OffMarketDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        }
                                    ]
                                }, 
                                {
                                    "$or" : [ 
                                        {
                                            "ForSaleGroupId" : {
                                                "$ne" : 1
                                            }
                                        }, 
                                        {
                                            "OffMarketGroupId" : 6
                                        }
                                    ],
                                    "ChangedListPriceDate" : {
                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                    }
                                }
                            ]
                        }, 
                        {
                            "$or" : [ 
                                {
                                    "ForSaleGroupId" : {
                                        "$ne" : 1
                                    }
                                }, 
                                {
                                    "ForLeaseGroupId" : {
                                        "$ne" : 1
                                    }
                                }, 
                                {
                                    "OffMarketGroupId" : 6
                                }, 
                                {
                                    "IsListingOnly" : true
                                }, 
                                {
                                    "OrgId" : ""
                                }, 
                                {
                                    "OffMarketDate" : {
                                        "$gte" : ISODate("2018-11-23T00:00:00.000Z")
                                    }
                                }
                            ]
                        }, 
                        {
                            "PropertyTypeId" : {
                                "$in" : [ 
                                    1, 
                                    5, 
                                    6
                                ]
                            }
                        }
                    ]
                }
            }, 
      // Other steps omitted, since it's slow regardless...
      { "$count": "Count" }
   ] 
})

Here is what a sample ResidentialProperty document looks like:

{
                "_id" : 294401911,
                "PropertyId" : 86689647,
                "OrgId" : "caclaw-n",
                "OrgSecurableId" : 1,
                "ListingId" : "19443870",
                "Location" : {
                    "type" : "Point",
                    "coordinates" : [ 
                        -117.316207, 
                        33.104623
                    ]
                },
                "CountyPlaceId" : 20006073,
                "CityPlaceId" : 50611194,
                "ZipCodePlaceId" : 70092011,
                "MetropolitanAreaPlaceId" : 10041740,
                "MinorCivilDivisionPlaceId" : 30002074,
                "NeighborhoodPlaceId" : 150813707,
                "MacroNeighborhoodPlaceId" : 160051666,
                "SubNeighborhoodPlaceId" : null,
                "ResidentialNeighborhoodsPlaceId" : 220978234,
                "ForSaleGroupId" : 1,
                "DistressedGroupId" : 1,
                "OffMarketGroupId" : 1,
                "ForLeaseGroupId" : 2,
                "ForSaleDistressedGroupId" : 1,
                "OffMarketDistressedGroupId" : 1,
                "ListDate" : ISODate("2019-03-15T00:00:00.000Z"),
                "PendingSaleDate" : null,
                "OffMarketDate" : null,
                "DistressedDate" : null,
                "SoldDate" : null,
                "ChangedListPriceDate" : null,
                "ListPrice" : null,
                "ListPriceRangeLow" : null,
                "ListPriceRangeHigh" : null,
                "ListPricePerSqFt" : null,
                "ListPricePerLotSizeSqFt" : null,
                "SoldPrice" : 0,
                "SoldPricePerSqFt" : 0.0,
                "SoldPricePerLotSizeSqFt" : 0.0,
                "MonthlyLeaseListPrice" : 6950.0,
                "MonthlyLeaseListPricePerSqFt" : 2.5402,
                "MonthlyLeaseListPricePerLotSizeSqFt" : 2.5402,
                "MonthlyLeaseSoldPrice" : null,
                "MonthlyLeaseSoldPricePerSqFt" : null,
                "MonthlyLeaseSoldPricePerLotSizeSqFt" : null,
                "SoldToListPriceRatio" : 0.0,
                "EstimatedToListPriceRatio" : 0.0,
                "AppPropertyModeId" : 1,
                "PropertyTypeId" : 1,
                "PropertySubTypeId" : null,
                "Bedrooms" : 4,
                "Bathrooms" : 3,
                "LivingAreaInSqFt" : 2736,
                "LotSizeInSqFt" : NumberLong(5073),
                "YearBuilt" : 2004,
                "GarageSpaces" : 2,
                "BuildingSizeInSqFt" : 2736,
                "Units" : 1,
                "Rooms" : null,
                "NetIncome" : null,
                "EstimateTypeId" : 3,
                "EstimatedValue" : 1253740,
                "EstimatedValuePerSqFt" : 458.2383,
                "EstimatedValuePerLotSizeSqFt" : 247.1397,
                "CapRate" : null,
                "Keywords" : [ 
                    "$6,950/month long-term minimum of 30 days. $8,950 June and then $9,950 for July or August. BeautifulWaters End Luxury Home walking distance to the beach. Short or Long term Fully Furnished (1 Month plus) with brand new furnishings & fresh paint & new carpets. Enjoy the beach & golf community lifestyle of Carlsbad, CA in this delightful North County San Diego vacation rental home!  This spacious & comfortable two story single family home sits on a cul-de-sac in the gated community of Waters End. Easy walk to the beach and close proximity to the Carlsbad train station, area restaurants, shopping, golf courses, and San Diego theme park attractions. The community also offers many health and beauty spas, yoga, and meditation centers, nearby world-renowned golf courses (such as Torrey Pines, Aviara, and La Costa Resort and Spa) as well as some of the best cycling in all of San Diego County.", 
                    "San Diego (City) (Sd)", 
                    "R1", 
                    "Single Family"
                ],
                "OwnerName" : "Brookside Land Trust, ; State Trustee Services Llc",
                "TenantNames" : null,
                "Apn" : "214-610-49-00",
                "OpenHouseStartDate" : null,
                "OpenHouseEndDate" : null,
                "ListingPhotoCount" : 25,
                "StatusChangedDate" : ISODate("2019-06-28T00:00:00.000Z"),
                "SortAddress" : "BrooksideCtZZZZZZZZZZ00000000000000000617ZZZZZCarlsbadCA92011",
                "SortOwnerName" : "BrooksideLandTrust,;State",
                "ListingIdAlphaNum" : "19443870",
                "IsListingOnly" : false
            }

The count returns 27,815 results. I don't see this as an indexing issue, since the first match step executes so fast. I also don't see it as a case of hitting the 100 MB in-memory limit per aggregation pipeline stage, as I'm setting allowDiskUse: false and the query still executes without erroring.

Also of interest, another aggregation pipeline query against the same collection filters down to 45,081 records after the first match step, and yet when I execute a count after that it returns in only 3 seconds. So the document structure can't really be blamed for this issue.

So what the heck is going on here? Why is the match filtering so fast and yet any operation after, even something as simple as a count, is so incredibly slow? I've tried enabling explain: true and I don't see anything that stands out there. The match operation shows that it's using the proper index. The count operation doesn't include any additional details in the explain.

Recommended answer

2019 ANSWER

This answer applies to MongoDB 4.2.

After reading the question and the discussion between you guys, I believe the issue is resolved, but optimization is still a common problem for everyone using MongoDB.

I faced the same problem, and here are the tips for query optimization.

Correct me if I'm wrong :)

1. Add indexes on the collection

Indexes play a vital role in running queries quickly: an index is a data structure that stores a small part of the collection's data in a form that is easy to traverse, so MongoDB can execute queries efficiently without scanning every document.

You can create different types of indexes according to your needs. Learn more about indexes in the official MongoDB documentation.
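
As a minimal sketch of the idea, the snippet below defines a compound index on two of the equality-filtered fields from the question's $match and derives the name MongoDB assigns to an index by default (field names and directions joined with underscores), which is the name you will later see in explain output. The choice of fields and their order is an assumption for illustration, not the index the asker actually has, and defaultIndexName is a hypothetical helper.

```javascript
// Hypothetical compound index for the question's collection.
// Key order matters: it is the order in which the fields are indexed.
const indexKeys = { CountyPlaceId: 1, PropertyTypeId: 1 };

// Reproduce MongoDB's default index naming: <field>_<direction>_...
function defaultIndexName(keys) {
  return Object.entries(keys)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName(indexKeys));
// In the mongo shell the index would be created with:
//   db.ResidentialProperty.createIndex(indexKeys)
```
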

2. Pipeline optimization

  • Always use $match before $project: filtering early removes documents that later stages would otherwise have to process.
  • Always remember that $match and $sort can use indexes, but only when they run at the start of the pipeline, before stages such as $project, $unwind, or $group. So try to add an index on the fields you are going to sort or filter documents by.
  • Try to keep $sort ahead of $limit in your query, like $sort + $limit + $skip. A $sort at the start can take advantage of an index, and a $sort followed directly by $limit only has to keep the top documents in memory.
  • If you put $limit before $skip, the skip is applied to the already limited documents, so the limit has to be the number of skipped documents plus the page size.
  • Use $project to pass only the necessary data to the next stage.
  • Always create an index on the foreignField attribute used in a $lookup. Also, as $lookup produces an array, we generally unwind it in the next stage. Put that $unwind immediately after the $lookup: the optimizer then coalesces the two stages into one, which explain output shows as an unwinding field inside the $lookup:

{
  $lookup: {
    from: "Collection",
    as: "resultingArrays",
    localField: "x",
    foreignField: "y"
  }
},
// A directly following $unwind on the "as" field gets merged into the $lookup
{ $unwind: { path: "$resultingArrays", preserveNullAndEmptyArrays: false } }
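
The stage ordering from the list above can be sketched as a small pipeline builder. This is only an illustration under assumptions: buildPagedPipeline and its parameter names are hypothetical, and the filter, sort field, and projected fields are borrowed from the question's sample document. Note that because $limit comes before $skip here, the limit is the number of skipped documents plus the page size.

```javascript
// Build a paged pipeline: filter first, then sort, then cut down to one page.
// pageNumber is 1-based; all parameter names are illustrative.
function buildPagedPipeline(filter, sortSpec, pageNumber, pageSize, fields) {
  const skipCount = (pageNumber - 1) * pageSize;
  return [
    { $match: filter },               // can use an index when it comes first
    { $sort: sortSpec },              // sort before limiting
    { $limit: skipCount + pageSize }, // keep everything up to the end of the page
    { $skip: skipCount },             // then drop the earlier pages
    { $project: fields }              // pass on only the required fields
  ];
}

const pagedPipeline = buildPagedPipeline(
  { CountyPlaceId: 20006073 },
  { ListDate: -1 },
  3,
  10,
  { ListingId: 1, ListDate: 1 }
);
console.log(JSON.stringify(pagedPipeline));
// In the mongo shell: db.ResidentialProperty.aggregate(pagedPipeline)
```

For page 3 with 10 documents per page, the builder limits to 30 documents and then skips the first 20, leaving exactly the third page.
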

Use allowDiskUse in aggregation: with it, aggregation stages that exceed the 100 MB memory limit can write temporary data to the _tmp subdirectory of the dbPath directory instead of failing. It is meant for running large queries. For example:

db.orders.aggregate(
  [
    { $match: { status: "A" } },
    { $group: { _id: "$uid", total: { $sum: 1 } } },
    { $sort: { total: -1 } }
  ],
  { allowDiskUse: true }
)

3. Rebuild indexes

If you create and drop indexes quite often, rebuild your indexes. That helps MongoDB refresh the previously stored query plans in the plan cache, which otherwise keep taking over from the plan you actually need. Believe me, that issue sucks :(

4. Remove unneeded indexes

Too many indexes slow down create, update, and delete operations, because every write also has to update each affected index. So removing the ones you don't need helps a lot.
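
One way to find candidates for removal is the $indexStats aggregation stage, which reports how often each index has been used since statistics were last reset. The sketch below filters such output for indexes with zero recorded operations. The helper name findUnusedIndexes and the sample data are made up for illustration, and the Number() conversion assumes the ops counter converts cleanly from whatever 64-bit integer type your shell or driver returns.

```javascript
// Return the names of indexes with no recorded usage.
// The _id index is skipped because it cannot be dropped.
function findUnusedIndexes(indexStats) {
  return indexStats
    .filter((stat) => stat.name !== "_id_")
    .filter((stat) => Number(stat.accesses.ops) === 0) // assumption: ops converts via Number()
    .map((stat) => stat.name);
}

// Made-up sample shaped like $indexStats output (not real measurements).
const sampleStats = [
  { name: "_id_", accesses: { ops: 0 } },
  { name: "CountyPlaceId_1", accesses: { ops: 1520 } },
  { name: "SortOwnerName_1", accesses: { ops: 0 } }
];

console.log(findUnusedIndexes(sampleStats));
// In the mongo shell the input would come from:
//   db.ResidentialProperty.aggregate([{ $indexStats: {} }]).toArray()
```

Zero usage only means no use since the counters were last reset, so check how long the server has been collecting statistics before dropping anything.
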

5. Limit documents

In a real-world scenario, fetching all of the data in the database does not help: you can't display it all, and the user can't read it all. So instead of fetching the complete data set, fetch it in chunks, which helps both you and the client viewing that data.

And lastly, looking at which execution plan MongoDB selects helps in figuring out the main issue. So explain will help you figure that out.
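
As a rough sketch, the helper below wraps an aggregation in the explain command with executionStats verbosity, which asks the server to actually run the pipeline and report execution statistics (such as how many keys and documents were examined) rather than only the chosen plan. The function name buildExplainCommand is hypothetical, and the pipeline is a shortened stand-in for the one in the question.

```javascript
// Wrap an aggregation in the explain command so the server reports how it
// executed the pipeline instead of returning the documents.
function buildExplainCommand(collectionName, pipeline, verbosity = "executionStats") {
  return {
    explain: { aggregate: collectionName, pipeline: pipeline, cursor: {} },
    verbosity: verbosity
  };
}

const explainCommand = buildExplainCommand("ResidentialProperty", [
  { $match: { CountyPlaceId: 20006073 } }, // shortened stand-in filter
  { $count: "Count" }
]);
console.log(JSON.stringify(explainCommand, null, 2));
// In the mongo shell: db.runCommand(explainCommand)
```

Comparing the statistics for the $match alone against the $match plus $count is one way to see where the extra time goes.
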

Hope this summary will help you guys, feel free to suggest new points if I missed any. I will add them too.
