查找聚合性能不佳 [英] Poor lookup aggregation performance

查看:61
本文介绍了查找聚合性能不佳的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个收藏夹

帖子:

{
    "_Id": "1",
    "_PostTypeId": "1",
    "_AcceptedAnswerId": "192",
    "_CreationDate": "2012-02-08T20:02:48.790",
    "_Score": "10",
    ...
    "_OwnerUserId": "6",
    ...
},
...

和用户:

{
    "_Id": "1",
    "_Reputation": "101",
    "_CreationDate": "2012-02-08T19:45:13.447",
    "_DisplayName": "Geoff Dalgas",
    ...
    "_AccountId": "2"
},
...

,我想找到撰写5到15个帖子的用户. 这是我的查询的样子:

and I want to find users who write between 5 and 15 posts. This is how my query looks like:

db.posts.aggregate([
    {
        $lookup: {
            from: "users", 
            localField: "_OwnerUserId",
            foreignField: "_AccountId", 
            as: "X"
        }
    },  
    {
        $group: {
            _id: "$X._AccountId", 
            posts: { $sum: 1 }
        }
    },   
    {
        $match : {posts: {$gte: 5, $lte: 15}}
    },  
    {
        $sort: {posts: -1 }
    },
    {
        $project : {posts: 1}
    }
])

,它的运行速度很慢.对于6k用户和10k帖子,花了40秒钟以上才能获得响应,而在关系数据库中,我很快就获得了响应. 哪里出问题了?我刚开始使用mongodb,很可能我弄乱了这个查询.

and it works terrible slow. For 6k users and 10k posts it tooks over 40 seconds to get response while in relational database I get response in a split second. Where's the problem? I'm just getting started with mongodb and it's quite possible that I messed up this query.

推荐答案

来自 https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/

foreignField在from中的文档中指定字段 收藏. $ lookup在foreignField上执行平等匹配,以 输入文档中的localField.如果文档来自 集合不包含foreignField,$ lookup对待 值作为null进行匹配.

foreignField Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.

这将与其他任何查询一样执行.

This will be performed the same as any other query.

如果您在_AccountId字段上没有索引,它将对10,000个帖子中的每一个进行全表扫描查询.大部分时间将花费在该表扫描中.

If you don't have an index on the field _AccountId, it will do a full tablescan query for each one of the 10,000 posts. The bulk of the time will be spent in that tablescan.

db.users.ensureIndex("_AccountId", 1) 

加快了处理速度,因此它可以完成10,000次索引命中,而不是10,000次表扫描.

speeds up the process so it's doing 10,000 index hits instead of 10,000 table scans.

这篇关于查找聚合性能不佳的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆