In Elasticsearch scan-and-scroll, is there a way to control both the batch size and limit the number of documents in the search?
Problem description
Using the Elasticsearch scan-and-scroll feature, is it possible to control both the size of the batches returned, as well as the limit on the number of matches?
Although we specified a size of 1,000, we get back many more documents. When scanning, the size is applied to each shard, so you will get back a maximum of size * number_of_primary_shards documents in each batch.
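To make the quoted rule concrete, here is a small simulation. It is a model of the documented behavior, not Elasticsearch code: on every scroll round, each primary shard contributes up to size of its remaining matches. The function name simulate_scan_batches and the shard counts are hypothetical.

```python
from typing import Iterator, List


def simulate_scan_batches(shard_doc_counts: List[int], size: int) -> Iterator[int]:
    """Yield the number of documents in each scroll batch.

    Hypothetical model of the documented rule: on every scroll round,
    each primary shard returns up to `size` of its remaining matches.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    remaining = list(shard_doc_counts)
    while any(remaining):
        batch = 0
        for i, left in enumerate(remaining):
            take = min(size, left)  # per-shard cap, not a global one
            remaining[i] -= take
            batch += take
        yield batch


# Hypothetical index: 5 primary shards with 2,500 matches each, size=1000.
batches = list(simulate_scan_batches([2500] * 5, 1000))
print(batches)  # [5000, 5000, 2500]
```

No batch can exceed size * number_of_primary_shards (here 1000 * 5), and the last batch is smaller once shards run out of matches.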
This seems to indicate that the size parameter is used differently in scan-and-scroll than in a query-then-fetch search (where it limits the number of matches), and that there is no "separate knob" that can be specified.
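Without a separate knob, one workaround is to cap the total on the client side. This is a swapped-in technique, not something the question or answer prescribes: keep scrolling, trim the final batch, and stop requesting batches once the limit is reached. The sketch is independent of any Elasticsearch client; cap_scroll is a hypothetical helper, and batches stands for whatever iterable of hit lists your scroll loop produces.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def cap_scroll(batches: Iterable[List[T]], limit: int) -> Iterator[List[T]]:
    """Yield scroll batches until `limit` documents have been produced.

    The final batch is trimmed, and no further batch is pulled from
    `batches` once the limit is reached (each pull would be another
    scroll round trip in a real client).
    """
    if limit <= 0:
        return
    seen = 0
    for batch in batches:
        chunk = batch[: limit - seen]  # keep only what still fits
        if chunk:
            seen += len(chunk)
            yield chunk
        if seen >= limit:
            return  # stop before requesting another batch


# Hypothetical batches standing in for scroll responses.
fake_batches = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13]]
print(list(cap_scroll(iter(fake_batches), 7)))  # [[1, 2, 3, 4, 5], [6, 7]]
```

When stopping early against a real cluster, you would also want to clear the scroll context rather than let it wait for its timeout.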
Update
One use case is:
- I have many indices (2 shards each).
- They are organized by day, for good reasons that I can't change.
Scan-and-scroll seems like a good choice, but perhaps there's a better way to do this?
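In this use case, a scan that spans many daily indices multiplies the number of primary shards, and with it the maximum batch size. A rough way to choose a per-shard size for a target batch is sketched below. It assumes every searched index has 2 primary shards, as stated above; the 30-day window and the helper names per_shard_size and worst_case_batch are hypothetical.

```python
def per_shard_size(target_batch: int, num_indices: int, shards_per_index: int = 2) -> int:
    """Largest per-shard size whose worst-case batch stays <= target_batch.

    Falls back to 1 when there are more primary shards than the target,
    in which case the worst-case batch exceeds the target.
    """
    primary_shards = num_indices * shards_per_index
    return max(1, target_batch // primary_shards)


def worst_case_batch(size: int, num_indices: int, shards_per_index: int = 2) -> int:
    # size applies per primary shard, so the cap scales with the shard count.
    return size * num_indices * shards_per_index


# Hypothetical: scanning 30 daily indices (60 primary shards), aiming for ~1000 docs per batch.
size = per_shard_size(1000, num_indices=30)
print(size, worst_case_batch(size, num_indices=30))  # 16 960
```

This only bounds the batch from above; batches shrink as individual shards run out of matches.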
Recommended answer
size is used differently in scan and scroll. It does limit the number of documents returned with each scroll, but you get size * num_of_primary_shards back.
In general you are correct, but you could limit the hits returned using a limit filter (or a limit query in 2.0). That seems a little odd, though; if limiting it in this way is the desired behavior, I'd make sure scan and scroll is the best approach.
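For reference, a request body using the limit filter might be built as below. This assumes the Elasticsearch 1.x filtered-query syntax (the 2.0 limit query mentioned above is not reproduced here), and the helper names are hypothetical. The limit filter caps documents per shard, so the overall ceiling on hits still scales with the number of primary shards, just as size does.

```python
import json


def limited_scan_body(query: dict, per_shard_limit: int) -> dict:
    """Wrap `query` in an ES 1.x filtered query with a limit filter."""
    return {
        "query": {
            "filtered": {
                "query": query,
                "filter": {"limit": {"value": per_shard_limit}},
            }
        }
    }


def max_total_hits(per_shard_limit: int, primary_shards: int) -> int:
    # The limit filter applies per shard, so the total cap is a product.
    return per_shard_limit * primary_shards


body = limited_scan_body({"match_all": {}}, 100)
print(json.dumps(body, indent=2))
print(max_total_hits(100, 60))  # 6000
```

Because the cap is per shard, an exact global limit still requires counting on the client side.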