MongoDB aggregate pipeline slow after first match step

Question

I have a MongoDB aggregation pipeline with a number of stages (match on indexed fields, add fields, sort, collapse, sort again, page, project results). If I comment out every stage except the first $match, the query runs very fast (0.075 seconds) because it uses the proper index. However, as soon as I add ANY follow-up stage, even something as simple as counting the results, the query takes 27 seconds!

Here is the query (don't get too caught up in its complexity; the indexes are doing their job and executing it quickly):

db.runCommand({ 
  aggregate: 'ResidentialProperty', 
  allowDiskUse: false, 
  explain: false,
  cursor: {}, 
  pipeline: 
    [
      {
                "$match" : {
                    "$and" : [ 
                        {
                            "CountyPlaceId" : 20006073
                        }, 
                        {
                            "$or" : [ 
                                {
                                    "$and" : [ 
                                        {
                                            "ForSaleGroupId" : {
                                                "$in" : [ 
                                                    2, 
                                                    3
                                                ]
                                            }
                                        }, 
                                        {
                                            "$or" : [ 
                                                {
                                                    "ForSaleGroupId" : {
                                                        "$nin" : [ 
                                                            2, 
                                                            3
                                                        ]
                                                    }
                                                }, 
                                                {
                                                    "ListDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        }, 
                                        {
                                            "$or" : [ 
                                                {
                                                    "ForSaleGroupId" : {
                                                        "$ne" : 3
                                                    }
                                                }, 
                                                {
                                                    "PendingSaleDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        }
                                    ]
                                }, 
                                {
                                    "ForLeaseGroupId" : {
                                        "$in" : [ 
                                            2, 
                                            3
                                        ]
                                    },
                                    "$or" : [ 
                                        {
                                            "ForLeaseGroupId" : {
                                                "$nin" : [ 
                                                    2, 
                                                    3
                                                ]
                                            }
                                        }, 
                                        {
                                            "ListDate" : {
                                                "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                            }
                                        }
                                    ]
                                }, 
                                {
                                    "DistressedGroupId" : {
                                        "$in" : [ 
                                            2, 
                                            3, 
                                            4
                                        ]
                                    },
                                    "$or" : [ 
                                        {
                                            "DistressedGroupId" : 1
                                        }, 
                                        {
                                            "DistressedDate" : {
                                                "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                            }
                                        }
                                    ]
                                }, 
                                {
                                    "$and" : [ 
                                        {
                                            "OffMarketGroupId" : {
                                                "$in" : [ 
                                                    3, 
                                                    8
                                                ]
                                            }
                                        }, 
                                        {
                                            "$or" : [ 
                                                {
                                                    "OffMarketGroupId" : 1
                                                }, 
                                                {
                                                    "OffMarketDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        }, 
                                        {
                                            "$or" : [ 
                                                {
                                                    "OffMarketGroupId" : {
                                                        "$nin" : [ 
                                                            7, 
                                                            8
                                                        ]
                                                    }
                                                }, 
                                                {
                                                    "SoldDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }, 
                                                {
                                                    "OffMarketDate" : {
                                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        }
                                    ]
                                }, 
                                {
                                    "$or" : [ 
                                        {
                                            "ForSaleGroupId" : {
                                                "$ne" : 1
                                            }
                                        }, 
                                        {
                                            "OffMarketGroupId" : 6
                                        }
                                    ],
                                    "ChangedListPriceDate" : {
                                        "$gte" : ISODate("2019-02-21T00:00:00.000Z")
                                    }
                                }
                            ]
                        }, 
                        {
                            "$or" : [ 
                                {
                                    "ForSaleGroupId" : {
                                        "$ne" : 1
                                    }
                                }, 
                                {
                                    "ForLeaseGroupId" : {
                                        "$ne" : 1
                                    }
                                }, 
                                {
                                    "OffMarketGroupId" : 6
                                }, 
                                {
                                    "IsListingOnly" : true
                                }, 
                                {
                                    "OrgId" : ""
                                }, 
                                {
                                    "OffMarketDate" : {
                                        "$gte" : ISODate("2018-11-23T00:00:00.000Z")
                                    }
                                }
                            ]
                        }, 
                        {
                            "PropertyTypeId" : {
                                "$in" : [ 
                                    1, 
                                    5, 
                                    6
                                ]
                            }
                        }
                    ]
                }
            }, 
      // Other steps omitted, since it's slow regardless...
      { "$count": "Count" }
   ] 
})

Here is what a sample ResidentialProperty document looks like:

{
                "_id" : 294401911,
                "PropertyId" : 86689647,
                "OrgId" : "caclaw-n",
                "OrgSecurableId" : 1,
                "ListingId" : "19443870",
                "Location" : {
                    "type" : "Point",
                    "coordinates" : [ 
                        -117.316207, 
                        33.104623
                    ]
                },
                "CountyPlaceId" : 20006073,
                "CityPlaceId" : 50611194,
                "ZipCodePlaceId" : 70092011,
                "MetropolitanAreaPlaceId" : 10041740,
                "MinorCivilDivisionPlaceId" : 30002074,
                "NeighborhoodPlaceId" : 150813707,
                "MacroNeighborhoodPlaceId" : 160051666,
                "SubNeighborhoodPlaceId" : null,
                "ResidentialNeighborhoodsPlaceId" : 220978234,
                "ForSaleGroupId" : 1,
                "DistressedGroupId" : 1,
                "OffMarketGroupId" : 1,
                "ForLeaseGroupId" : 2,
                "ForSaleDistressedGroupId" : 1,
                "OffMarketDistressedGroupId" : 1,
                "ListDate" : ISODate("2019-03-15T00:00:00.000Z"),
                "PendingSaleDate" : null,
                "OffMarketDate" : null,
                "DistressedDate" : null,
                "SoldDate" : null,
                "ChangedListPriceDate" : null,
                "ListPrice" : null,
                "ListPriceRangeLow" : null,
                "ListPriceRangeHigh" : null,
                "ListPricePerSqFt" : null,
                "ListPricePerLotSizeSqFt" : null,
                "SoldPrice" : 0,
                "SoldPricePerSqFt" : 0.0,
                "SoldPricePerLotSizeSqFt" : 0.0,
                "MonthlyLeaseListPrice" : 6950.0,
                "MonthlyLeaseListPricePerSqFt" : 2.5402,
                "MonthlyLeaseListPricePerLotSizeSqFt" : 2.5402,
                "MonthlyLeaseSoldPrice" : null,
                "MonthlyLeaseSoldPricePerSqFt" : null,
                "MonthlyLeaseSoldPricePerLotSizeSqFt" : null,
                "SoldToListPriceRatio" : 0.0,
                "EstimatedToListPriceRatio" : 0.0,
                "AppPropertyModeId" : 1,
                "PropertyTypeId" : 1,
                "PropertySubTypeId" : null,
                "Bedrooms" : 4,
                "Bathrooms" : 3,
                "LivingAreaInSqFt" : 2736,
                "LotSizeInSqFt" : NumberLong(5073),
                "YearBuilt" : 2004,
                "GarageSpaces" : 2,
                "BuildingSizeInSqFt" : 2736,
                "Units" : 1,
                "Rooms" : null,
                "NetIncome" : null,
                "EstimateTypeId" : 3,
                "EstimatedValue" : 1253740,
                "EstimatedValuePerSqFt" : 458.2383,
                "EstimatedValuePerLotSizeSqFt" : 247.1397,
                "CapRate" : null,
                "Keywords" : [ 
                    "$6,950/month long-term minimum of 30 days. $8,950 June and then $9,950 for July or August. BeautifulWaters End Luxury Home walking distance to the beach. Short or Long term Fully Furnished (1 Month plus) with brand new furnishings & fresh paint & new carpets. Enjoy the beach & golf community lifestyle of Carlsbad, CA in this delightful North County San Diego vacation rental home!  This spacious & comfortable two story single family home sits on a cul-de-sac in the gated community of Waters End. Easy walk to the beach and close proximity to the Carlsbad train station, area restaurants, shopping, golf courses, and San Diego theme park attractions. The community also offers many health and beauty spas, yoga, and meditation centers, nearby world-renowned golf courses (such as Torrey Pines, Aviara, and La Costa Resort and Spa) as well as some of the best cycling in all of San Diego County.", 
                    "San Diego (City) (Sd)", 
                    "R1", 
                    "Single Family"
                ],
                "OwnerName" : "Brookside Land Trust, ; State Trustee Services Llc",
                "TenantNames" : null,
                "Apn" : "214-610-49-00",
                "OpenHouseStartDate" : null,
                "OpenHouseEndDate" : null,
                "ListingPhotoCount" : 25,
                "StatusChangedDate" : ISODate("2019-06-28T00:00:00.000Z"),
                "SortAddress" : "BrooksideCtZZZZZZZZZZ00000000000000000617ZZZZZCarlsbadCA92011",
                "SortOwnerName" : "BrooksideLandTrust,;State",
                "ListingIdAlphaNum" : "19443870",
                "IsListingOnly" : false
            }

The count returns 27,815 results. I don't think this is an indexing issue, since the first match stage executes so fast. I also don't think it is hitting the 100 MB in-memory limit per aggregation pipeline stage, because I'm setting allowDiskUse: false and the query still completes without an error.

Also of interest: another aggregation pipeline against the same collection filters down to 45,081 documents after the first match stage, yet a count after that returns in only 3 seconds. So the document structure can't really be blamed for this.

So what is going on here? Why is the match filtering so fast while any operation after it, even a simple count, is so slow? I've tried enabling explain: true and nothing stands out. The match stage shows that it's using the proper index, and the explain output includes no additional details for the count stage.

Recommended answer

2019 answer

This answer applies to MongoDB 4.2.

After reading the question and the discussion around it, I believe the issue is resolved, but query optimization remains a common problem for everyone using MongoDB.

I faced the same problem, and here are my tips for query optimization.

Correct me if I'm wrong :)

1. Add indexes on the collection

Indexes play a vital role in running queries quickly. An index is a data structure that stores part of the collection's data (the indexed field values) in a form that is easy to traverse, so MongoDB can find matching documents without scanning the whole collection.

You can create different types of indexes according to your needs. Learn more about indexes in the official MongoDB documentation.
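
As a rough illustration of how a compound index serves a query, the sketch below applies the index prefix rule: a query can use the leading fields of a compound index that it filters on, but not fields that come after a gap. The index key pattern and the helper name usableIndexPrefix are assumptions made for this example (the question does not show its real indexes), and the rule is deliberately simplified.

```javascript
// Hypothetical compound index for the ResidentialProperty collection.
// In the mongo shell it would be created with:
//   db.ResidentialProperty.createIndex(indexKeys)
const indexKeys = { CountyPlaceId: 1, PropertyTypeId: 1, ListDate: -1 };

// Simplified prefix rule: walk the index fields in order and stop at the
// first one the query does not filter on. The fields collected so far are
// the part of the index the query can use.
function usableIndexPrefix(keys, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of Object.keys(keys)) {
    if (!wanted.has(field)) break;
    prefix.push(field);
  }
  return prefix;
}

// A query on the two leading fields can use both of them.
const good = usableIndexPrefix(indexKeys, ["CountyPlaceId", "PropertyTypeId"]);
// A query that skips the leading field cannot use this index as a prefix.
const bad = usableIndexPrefix(indexKeys, ["PropertyTypeId"]);
console.log(good, bad);
```

An empty result suggests the query needs a different index, or that the index fields should be reordered.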

2. Pipeline optimization

  • Always use $match before $project, so that documents are filtered out as early as possible and later stages have less to process.
  • Remember that $match and $sort can use indexes, but generally only when they run at the start of the pipeline. So add indexes on the fields you filter or sort on, and keep those stages first.
  • Use $sort before $limit. When $limit directly follows $sort, MongoDB only has to keep the top n documents in memory while sorting, and the $sort can take advantage of an index.
  • For paging, $limit keeps the first n documents and $skip discards the first m. If you place $limit before $skip, the limit must be m + n (skip plus page size); if you write $skip followed by $limit, the optimizer makes this rewrite for you.
  • Use $project to pass only the necessary fields to the next stage.
  • Always create an index on the foreignField of a $lookup. Also, since $lookup produces an array, we generally unwind it in a later stage. Place the $unwind immediately after the $lookup so the optimizer can merge the two stages (explain output shows this as an unwinding field inside $lookup):

{ $lookup: {
    from: "Collection",
    localField: "x",
    foreignField: "y",
    as: "resultingArrays"
} },
{ $unwind: { path: "$resultingArrays", preserveNullAndEmptyArrays: false } }
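
To make the stage ordering concrete, here is a minimal sketch of a paged pipeline: filter first, then sort, then page, then project. The helper name buildPagePipeline, the sort fields, and the projected fields are assumptions for illustration; the filter reuses field names from the question. The paging uses $skip followed by $limit, which the optimizer rewrites into a $limit of skip + pageSize followed by the $skip.

```javascript
// Hypothetical helper: builds a pipeline in the order
// $match -> $sort -> $skip -> $limit -> $project.
function buildPagePipeline(filter, sortSpec, page, pageSize, fields) {
  const projection = {};
  for (const field of fields) {
    projection[field] = 1; // include only the fields the caller needs
  }
  return [
    { $match: filter },               // filter early so an index can be used
    { $sort: sortSpec },              // sort before paging
    { $skip: (page - 1) * pageSize }, // discard the earlier pages
    { $limit: pageSize },             // keep one page
    { $project: projection },         // trim the documents last
  ];
}

// Page 3 with 25 documents per page; _id is added to the sort as a
// tie-breaker so the page boundaries stay stable.
const pipeline = buildPagePipeline(
  { CountyPlaceId: 20006073, PropertyTypeId: { $in: [1, 5, 6] } },
  { ListDate: -1, _id: 1 },
  3,
  25,
  ["ListingId", "ListDate", "ListPrice"]
);
// In the mongo shell: db.ResidentialProperty.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```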

Use allowDiskUse for aggregations that process a lot of data. Each pipeline stage is limited to 100 MB of RAM; with allowDiskUse: true, a stage that exceeds the limit can write temporary data to the _tmp subdirectory of the dbPath directory instead of failing. For example:

db.orders.aggregate(
  [
    { $match: { status: "A" } },
    { $group: { _id: "$uid", total: { $sum: 1 } } },
    { $sort: { total: -1 } }
  ],
  { allowDiskUse: true }
)

3. Rebuild indexes

If you create and drop indexes quite often, rebuild your indexes. Doing so makes MongoDB discard the query plans previously stored in the plan cache, which can otherwise keep overriding the plan you actually want. Believe me, that issue sucks :(
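
If the stale cached plan is the real concern, a lighter alternative to rebuilding indexes (not part of the original tip) is to clear the plan cache for the collection directly with the planCacheClear command. The sketch below assumes a db handle that exposes runCommand, as the mongo shell's db object does; the helper name clearPlanCache is made up for this example.

```javascript
// Hypothetical helper: asks the server to drop all cached query plans for
// one collection. `db` is assumed to expose runCommand(), like the db
// object in the mongo shell.
function clearPlanCache(db, collectionName) {
  return db.runCommand({ planCacheClear: collectionName });
}

// In the mongo shell: clearPlanCache(db, "ResidentialProperty")
// The shell helper db.ResidentialProperty.getPlanCache().clear() does the same.
```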

4. Remove unneeded indexes

Too many indexes slow down create, update, and delete operations, because every write must also update each affected index. Dropping the indexes you don't need helps a lot.

5. Limit documents

In a real-world scenario, fetching all of the data in the database rarely helps: you can't display it all, and the user can't read it all. So instead of fetching everything, fetch the data in chunks (pages), which helps both you and the client viewing that data.

Lastly, looking at which execution plan MongoDB selected helps in figuring out the main issue, and explain will show you that.
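
As a sketch of how to get more out of explain than the default query-planner output, the helper below wraps an aggregation in the explain command with verbosity "executionStats", which runs the pipeline and reports execution statistics. The helper name buildExplainCommand is an assumption, the sample pipeline is a shortened version of the one in the question, and how much detail you get for aggregations may depend on your server version.

```javascript
// Hypothetical helper: wraps an aggregate command in the explain command.
// "executionStats" asks the server to run the pipeline and report
// statistics, instead of only describing the chosen plan.
function buildExplainCommand(collectionName, pipeline) {
  return {
    explain: { aggregate: collectionName, pipeline: pipeline, cursor: {} },
    verbosity: "executionStats",
  };
}

const explainCmd = buildExplainCommand("ResidentialProperty", [
  { $match: { CountyPlaceId: 20006073 } },
  { $count: "Count" },
]);
// In the mongo shell: db.runCommand(explainCmd)
console.log(JSON.stringify(explainCmd));
```

Comparing the number of documents and index keys examined against the number of documents returned usually shows whether the time goes into the index scan or into fetching documents.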

Hope this summary helps. Feel free to suggest new points if I missed any, and I will add them too.
