mongodb聚合$group阶段需要很长时间 [英] Mongodb aggregate $group stage takes a long time

查看：53 发布时间：2021/6/30 19:24:57 node.js mongodb aggregation-framework query-performance

本文介绍了mongodb聚合$group阶段需要很长时间的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在练习如何使用 MongoDB 聚合，但它们似乎需要很长时间(运行时间).

I'm practicing how to use MongoDB aggregation, but they seem to take a really long time (running time).

问题似乎在我使用 $group 时都会发生.所有其他查询都运行良好.

The problem seems to happen whenever I use $group. All other queries run just fine.

我有一些 1.3 百万个虚拟文档需要执行两个基本操作:获取计数IP 地址和唯一 IP 地址.

I have some 1.3 million dummy documents that need to perform two basic operations: get a count of the IP addresses and unique IP addresses.

我的架构看起来像这样:

{
    "_id":"5da51af103eb566faee6b8b4",
    "ip_address":"...",
    "country":"CL",
    "browser":{
        "user_agent":...",
    }
}

运行一个基本的 $group 查询平均需要 12 秒，这太慢了.

Running a basic $group query takes about 12s on average, which is much too slow.

我做了一些研究，有人建议在 ip_addresses 上创建一个索引.这似乎减慢了速度，因为查询现在需要 13-15 秒.

I did a little research, and someone suggested creating an index on ip_addresses. That seems to have slowed it down because queries now take 13-15s.

我使用 MongoDB，我正在运行的查询如下所示:

I use MongoDB and the query I'm running looks like this:

    visitorsModel.aggregate([
        {
            '$group': {
                '_id': '$ip_address',
                'count': {
                    '$sum': 1
                }
            }
        }
    ]).allowDiskUse(true)
        .exec(function (err, docs) {
            if (err) throw err;

            return res.send({
                uniqueCount: docs.length
            })
        })

感谢任何帮助.

编辑:我忘了说，有人说这可能是硬件问题?如果有帮助，我正在 i5、8GB RAM 的核心笔记本电脑上运行查询.

Edit: I forgot to mention, someone suggested it might be a hardware issue? I'm running the query on a core i5, 8GB RAM laptop if it helps.

编辑 2:查询计划:

{
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                },
                "fields" : {
                    "ip_address" : 1,
                    "_id" : 0
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "metrics.visitors",
                    "indexFilterSet" : false,
                    "parsedQuery" : {

                    },
                    "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "direction" : "forward"
                    },
                    "rejectedPlans" : [ ]
                },
                "executionStats" : {
                    "executionSuccess" : true,
                    "nReturned" : 1387324,
                    "executionTimeMillis" : 7671,
                    "totalKeysExamined" : 0,
                    "totalDocsExamined" : 1387324,
                    "executionStages" : {
                        "stage" : "COLLSCAN",
                        "nReturned" : 1387324,
                        "executionTimeMillisEstimate" : 9,
                        "works" : 1387326,
                        "advanced" : 1387324,
                        "needTime" : 1,
                        "needYield" : 0,
                        "saveState" : 10930,
                        "restoreState" : 10930,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "direction" : "forward",
                        "docsExamined" : 1387324
                    }
                }
            }
        },
        {
            "$group" : {
                "_id" : "$ip_address",
                "count" : {
                    "$sum" : {
                        "$const" : 1
                    }
                }
            }
        }
    ],
    "ok" : 1
}

推荐答案

这是关于使用 $group 聚合阶段的一些信息，如果它使用索引，它的局限性以及可以尝试克服的问题这些.

This is some info about using $group aggregation stage, if it uses indexes, and its limitations and what can be tried to overcome these.

1.$group 阶段不使用索引:Mongodb 聚合:$group 是否使用索引?

2.$group 运算符和内存:

$group 阶段的 RAM 限制为 100 兆字节.默认情况下，如果阶段超过此限制，$group 返回错误.允许处理大型数据集时，将 allowDiskUse 选项设置为 true.此标志允许 $group 操作写入临时文件.

The $group stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $group returns an error. To allow for the handling of large datasets, set the allowDiskUse option to true. This flag enables $group operations to write to temporary files.

参见 MongoDb 文档 $group Operator 和内存

3.使用 $group 和 Count 的示例:

一个名为 cities 的集合:

{ "_id" : 1, "city" : "Bangalore", "country" : "India" }
{ "_id" : 2, "city" : "New York", "country" : "United States" }
{ "_id" : 3, "city" : "Canberra", "country" : "Australia" }
{ "_id" : 4, "city" : "Hyderabad", "country" : "India" }
{ "_id" : 5, "city" : "Chicago", "country" : "United States" }
{ "_id" : 6, "city" : "Amritsar", "country" : "India" }
{ "_id" : 7, "city" : "Ankara", "country" : "Turkey" }
{ "_id" : 8, "city" : "Sydney", "country" : "Australia" }
{ "_id" : 9, "city" : "Srinagar", "country" : "India" }
{ "_id" : 10, "city" : "San Francisco", "country" : "United States" }

查询按国家统计城市的集合:

Query the collection to count the cities by each country:

db.cities.aggregate( [
    { $group: { _id: "$country", cityCount: { $sum: 1 } } },
    { $project: { country: "$_id", _id: 0, cityCount: 1 } }
] )

结果:

{ "cityCount" : 3, "country" : "United States" }
{ "cityCount" : 1, "country" : "Turkey" }
{ "cityCount" : 2, "country" : "Australia" }
{ "cityCount" : 4, "country" : "India" }

4.使用 allowDiskUse 选项:

db.cities.aggregate( [
    { $group: { _id: "$country", cityCount: { $sum: 1 } } },
    { $project: { country: "$_id", _id: 0, cityCount: 1 } }
],  { allowDiskUse : true } )

注意，在这种情况下，它对查询性能或输出没有影响.这只是为了显示用法.

Note, in this case it makes no difference in query performance or output. This is to show the usage only.

<强>5.一些可尝试的选项(建议):

您可以尝试一些方法以获得一些结果(仅用于试验目的):

You can try a few things to get some result (for trial purposes only):

使用 $limit 阶段并限制处理的文档数量和看看结果如何.例如，您可以尝试 { $limit: 1000 }.注意这个阶段需要在 $group 阶段之前.
您还可以在 $group 之前使用 $match、$project 阶段stage 来控制输入的 shape 和 size.这可能返回结果(而不是错误).

Use $limit stage and restrict the number of documents processed and see what is the result. For example, you can try { $limit: 1000 }. Note this stage needs to come before the $group stage.
You can also use the $match, $project stages before the $group stage to control the shape and size of the input. This may return a result (instead of an error).

<小时>

Distinct 和 Count 的注意事项:

使用相同的 cities 集合 - 要获得独特的国家和数量，您可以尝试使用聚合阶段 $count 和 $group如以下两个查询.


Using the same cities collection - to get unique countries and a count of them you can try using the aggregate stage $count along with $group as in the following two queries.
独特:
db.cities.aggregate( [
   { $match: { country: { $exists: true } } },
   { $group: { _id: "$country" } },
   { $project: { country: "$_id", _id: 0 } }
] )

结果:
{ "country" : "United States" }
{ "country" : "Turkey" }
{ "country" : "India" }
{ "country" : "Australia" }

要将上述结果作为具有唯一值数组的单个文档，请使用 $addToSet 运算符:
To get the above result as a single document with an array of unique values, use the $addToSetoperator:
db.cities.aggregate( [
   { $match: { country: { $exists: true } } },
   { $group: { _id: null, uniqueCountries: { $addToSet:  "$country" } } },
   { $project: { _id: 0 } },
] )

结果:{ "uniqueCountries" : [ "United States", "Turkey", "India", "Australia" ] }
计数:
db.cities.aggregate( [
   { $match: { country: { $exists: true } } },
   { $group: { _id: "$country" } },
   { $project: { country: "$_id", _id: 0 } },
   { $count: "uniqueCountryCount" }
] )

结果:{ "uniqueCountryCount" : 4 }
在上述查询中，$match 阶段用于过滤任何具有不存在或空 country 字段的文档.$project 阶段重塑结果文档.
In the above queries the $match stage is used to filter any documents with non-existing or null countryfield. The $project stage reshapes the result document(s).
MongoDB 查询语言:
请注意，当使用 MongoDB 查询语言 命令时，这两个查询获得了相似的结果:db.collection.distinct("country") 和 db.cities.distinct("country").length(注意 distinct 返回一个数组).
Note the two queries get similar results when using the MongoDB query language commands: db.collection.distinct("country") and db.cities.distinct("country").length (note the distinct returns an array).

                        这篇关于mongodb聚合$group阶段需要很长时间的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

mongodb聚合$group阶段需要很长时间 [英] Mongodb aggregate $group stage takes a long time

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

mongodb聚合$group阶段需要很长时间 [英] Mongodb aggregate $group stage takes a long time

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭