Mongo with Java - find query with batchSize

Question

I am executing a find query in MongoDB using Java on a collection, with batchSize set to 500. My collection has 10,000 records, but with batchSize set I only get records 1-500. How do I get the next set of records?

DBCursor cursor = collection.find(query).batchSize(batchSize);
while(cursor.hasNext()) {
    // write to file.
    DBObject obj = cursor.next();
    objectIdList.add(obj.get("_id"));
}

Recommended answer

The DBCursor allows you to iterate over the set of documents which match the query passed into the find() method. It lazily fetches these documents from the underlying database in chunks of batchSize.

So, with the default batch size (101 documents for the first batch, IIRC) it will return the first 101 documents to your client and then, as your client code iterates beyond the 101st document, it will (behind the scenes) fetch the next batch, and so on until whichever of the following occurs first:

  • all documents relevant to your query have been returned, i.e. the cursor is exhausted
  • your client stops iterating

The same applies when you set an explicit batchSize. In your case, with batchSize=500, the DBCursor returned by find() holds at most 500 documents at a time; if more than 500 documents match your query, then as you iterate beyond the 500th document the MongoDB Java driver will (behind the scenes) fetch the next batch.
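
The lazy fetching described above can be modelled in plain Java. The following is a minimal sketch under stated assumptions, not the driver's implementation: the class BatchCursorSketch, its fetches() counter, and the fakeCollection helper are hypothetical names, and an in-memory list stands in for the server.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified, hypothetical model of a lazily batching cursor. This is NOT
// the MongoDB driver's code; the "server" is just an in-memory list.
final class BatchCursorSketch implements Iterator<Integer> {
    private final List<Integer> server;   // stands in for the matching documents
    private final int batchSize;          // documents fetched per round trip
    private final Deque<Integer> buffer = new ArrayDeque<>(); // current batch
    private int position = 0;             // next "server-side" index to fetch
    private int fetches = 0;              // number of simulated round trips

    BatchCursorSketch(List<Integer> server, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.server = server;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only once the current one has been consumed.
        if (buffer.isEmpty() && position < server.size()) {
            int end = Math.min(position + batchSize, server.size());
            buffer.addAll(server.subList(position, end));
            position = end;
            fetches++;
        }
        return !buffer.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }

    int fetches() {
        return fetches;
    }

    // Builds a fake collection of n "documents" (the integers 0..n-1).
    static List<Integer> fakeCollection(int n) {
        List<Integer> docs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            docs.add(i);
        }
        return docs;
    }

    public static void main(String[] args) {
        BatchCursorSketch cursor =
                new BatchCursorSketch(fakeCollection(10_000), 500);
        int seen = 0;
        while (cursor.hasNext()) {
            cursor.next();
            seen++;
        }
        System.out.println(seen + " documents in " + cursor.fetches() + " fetches");
    }
}
```

Run as written, it prints "10000 documents in 20 fetches": iterating until hasNext() returns false reaches every matching document, one batch at a time.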

You said ...

My collection has 10,000 records but with batchsize set i get only 1-500 records

... if you only get 500 documents, then either you stopped iterating after 500 or only 500 documents matched your query.

You can see how many documents match your query by using the count() method. For example:

int count = collection.find(query).count();

You can also grab all of the documents relevant to your query in one go, without iterating the DBCursor yourself, like this ...

List<DBObject> obj = collection.find(query).toArray();

... though of course this might have implications for your application's heap, since it results in every document which meets your criteria being stored on-heap in your client (rather than the more memory-friendly approach of reading them in batches via the DBCursor).
