查找聚合性能不佳 [英] Poor lookup aggregation performance
问题描述
我有两个收藏夹
帖子:
{
"_Id": "1",
"_PostTypeId": "1",
"_AcceptedAnswerId": "192",
"_CreationDate": "2012-02-08T20:02:48.790",
"_Score": "10",
...
"_OwnerUserId": "6",
...
},
...
和用户:
{
"_Id": "1",
"_Reputation": "101",
"_CreationDate": "2012-02-08T19:45:13.447",
"_DisplayName": "Geoff Dalgas",
...
"_AccountId": "2"
},
...
,我想找到撰写5到15个帖子的用户. 这是我的查询的样子:
and I want to find users who write between 5 and 15 posts. This is how my query looks like:
db.posts.aggregate([
{
$lookup: {
from: "users",
localField: "_OwnerUserId",
foreignField: "_AccountId",
as: "X"
}
},
{
$group: {
_id: "$X._AccountId",
posts: { $sum: 1 }
}
},
{
$match : {posts: {$gte: 5, $lte: 15}}
},
{
$sort: {posts: -1 }
},
{
$project : {posts: 1}
}
])
,它的运行速度很慢.对于6k用户和10k帖子,花了40秒钟以上才能获得响应,而在关系数据库中,我很快就获得了响应. 哪里出问题了?我刚开始使用mongodb,很可能我弄乱了这个查询.
and it works terrible slow. For 6k users and 10k posts it tooks over 40 seconds to get response while in relational database I get response in a split second. Where's the problem? I'm just getting started with mongodb and it's quite possible that I messed up this query.
推荐答案
来自 https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
foreignField在from中的文档中指定字段 收藏. $ lookup在foreignField上执行平等匹配,以 输入文档中的localField.如果文档来自 集合不包含foreignField,$ lookup对待 值作为null进行匹配.
foreignField Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.
这将与其他任何查询一样执行.
This will be performed the same as any other query.
如果您在_AccountId字段上没有索引,它将对10,000个帖子中的每一个进行全表扫描查询.大部分时间将花费在该表扫描中.
If you don't have an index on the field _AccountId, it will do a full tablescan query for each one of the 10,000 posts. The bulk of the time will be spent in that tablescan.
db.users.ensureIndex("_AccountId", 1)
加快了处理速度,因此它可以完成10,000次索引命中,而不是10,000次表扫描.
speeds up the process so it's doing 10,000 index hits instead of 10,000 table scans.
这篇关于查找聚合性能不佳的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!