如何使用MongoDB有效地分页结果 [英] How to efficiently page batches of results with MongoDB

查看:92
本文介绍了如何使用MongoDB有效地分页结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在对我的MongoDB集合使用以下查询,这需要一个多小时才能完成.

I am using the below query on my MongoDB collection which is taking more than an hour to complete.

db.collection.find({language:"hi"}).sort({_id:-1}).skip(5000).limit(1)

我正在尝试获取5000批结果,以便在语言字段中以"hi"为值的文档以升序或降序处理.因此,我正在使用此查询,其中每次通过增加"skip"值来跳过已处理的文档.

I am trying to to get the results in a batch of 5000 to process in either ascending or descending order for documents with "hi" as a value in language field. So i am using this query in which i am skipping the processed documents every time by incrementing the "skip" value.

此收藏集中的文档数刚好超过2000万. 已经在语言"字段上创建了索引. 我正在使用的MongoDB版本是2.6.7

The document count in this collection is just above 20 million. An index on the field "language" is already created. MongoDB Version i am using is 2.6.7

此查询是否有更合适的索引,可以更快地获得结果?

Is there a more appropriate index for this query which can get the result faster?

推荐答案

为了以所需的方式有效地分页"结果,最好使用范围查询"并保留最后处理的值

In order to efficiently "page" through results in the way that you want, it is better to use a "range query" and keep the last value you processed.

您希望此处的排序键"是_id,这样使事情变得简单:

You desired "sort key" here is _id, so that makes things simple:

首先,您希望您的索引按正确的顺序完成,该顺序通过 .createIndex() (不是不推荐使用的方法):

First you want your index in the correct order which is done with .createIndex() which is not the deprecated method:

db.collection.createIndex({ "language": 1, "_id": -1 })

然后,您要从一开始就进行一些简单的处理:

Then you want to do some simple processing, from the start:

var lastId = null;

var cursor = db.collection.find({language:"hi"});
cursor.sort({_id:-1}).limit(5000).forEach(funtion(doc) {
    // do something with your document. But always set the next line
    lastId = doc._id;
})

那是第一批.现在,当您转到下一个时:

That's the first batch. Now when you move on to the next one:

var cursor = db.collection.find({ "language":"hi", "_id": { "$lt": lastId });
cursor.sort({_id:-1}).limit(5000).forEach(funtion(doc) {
    // do something with your document. But always set the next line
    lastId = doc._id;
})

因此,在进行选择时始终考虑lastId值.您将其存储在每个批次之间,然后从最后一个批次继续.

So that the lastId value is always considered when making the selection. You store this between each batch, and continue on from the last one.

这比使用.skip()进行处理要高效得多,无论使用什么索引,"c3"都将仍然"需要跳过"集合中的所有数据,直至达到跳过点.

That is much more efficient than processing with .skip(), which regardless of the index will "still" need to "skip" through all data in the collection up to the skip point.

在此处使用$lt运算符可以过滤"所有已处理的结果,因此可以更快地进行处理.

Using the $lt operator here "filters" all the results you already processed, so you can move along much more quickly.

这篇关于如何使用MongoDB有效地分页结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆