Elasticsearch - Scroll behavior


Question


I have come across at least two ways to fetch results in batches:

1. Scroll API
2. Pagination - from/size parameters


What is the fundamental difference? I am assuming #1 allows you to scroll over the records, while #2 allows you to fetch a batch of records at a time. If I just use different from/size parameters to drive pagination, is there a chance that the same record will be returned in different batches?

Answer


Using from/size is the default and easiest way to paginate results. By default, it only works as long as from + size stays at or below 10,000 (the index.max_result_window setting). You can increase that limit, but it is not advisable to go too far, because deep pagination degrades the performance of your cluster.
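On the duplicates question: from/size keeps no state between requests, so each page re-runs the query against the index as it is at that moment. The sketch below (plain Python, no cluster needed) builds from/size request bodies, rejects pages that would exceed the default 10,000-hit window, and simulates what happens when a newer document is indexed between two page requests. The helper names and the ts/id fields are hypothetical; simulate_page is only an in-memory stand-in for a search sorted by timestamp, newest first.

```python
from __future__ import annotations

from typing import Any

# Default value of the index.max_result_window index setting.
MAX_RESULT_WINDOW = 10_000


def from_size_body(page: int, size: int, query: dict[str, Any],
                   max_result_window: int = MAX_RESULT_WINDOW) -> dict[str, Any]:
    """Build a search request body for a zero-based page using from/size."""
    start = page * size
    if start + size > max_result_window:
        # Elasticsearch rejects requests where from + size exceeds the window.
        raise ValueError(
            f"from + size = {start + size} exceeds max_result_window={max_result_window}"
        )
    return {"from": start, "size": size, "query": query}


def simulate_page(docs: list[dict[str, Any]], page: int, size: int) -> list[str]:
    """In-memory stand-in for a from/size search sorted by ts, newest first."""
    ordered = sorted(docs, key=lambda d: d["ts"], reverse=True)
    start = page * size
    return [d["id"] for d in ordered[start:start + size]]


docs = [{"id": f"doc{i}", "ts": i} for i in range(6)]
page_1 = simulate_page(docs, page=0, size=3)  # doc5, doc4, doc3
docs.append({"id": "doc6", "ts": 6})          # a new document is indexed between requests
page_2 = simulate_page(docs, page=1, size=3)  # doc3, doc2, doc1: doc3 is returned twice
```

Because every document shifts down by one position after the insert, the last hit of page 1 becomes the first hit of page 2; deletions between requests can likewise cause hits to be skipped.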


The scroll API lets you paginate over all of your data. It works by creating a search context (i.e. a snapshot of the data at the time you start scrolling); you then get a cursor (a scroll ID) to page through all of the data. When you are done, you can clear the search context (otherwise it expires after its keep-alive period). The search context has an associated cost (it is stateful and therefore requires memory), so this style of pagination is not suited to real-time pagination; it is meant more for batch-like processing.
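A minimal sketch of the scroll workflow, assuming the official elasticsearch-py 8.x client with its keyword-argument style; scroll_all is a hypothetical helper, not part of the client. The first search opens the context, each scroll call fetches the next batch and renews the keep-alive, and the context is cleared at the end. The sort on _doc is there because scroll requests are optimized for that order when you do not care about ordering.

```python
from __future__ import annotations

from typing import Any, Iterator


def scroll_all(client: Any, index: str, query: dict[str, Any],
               page_size: int = 1000, keep_alive: str = "2m") -> Iterator[dict[str, Any]]:
    """Yield every hit matching `query` using the scroll API.

    `client` is assumed to be an elasticsearch-py 8.x Elasticsearch instance.
    """
    # The first search opens the scroll context (a snapshot of the index).
    resp = client.search(index=index, query=query, size=page_size,
                         scroll=keep_alive, sort=["_doc"])
    scroll_id = resp["_scroll_id"]
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # the snapshot is exhausted
            yield from hits
            # Fetch the next batch; this also extends the context's keep-alive.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the search context instead of waiting for keep_alive to expire.
        client.clear_scroll(scroll_id=scroll_id)
```

For example, with es an Elasticsearch client and "my-index" a placeholder index name, iterating over scroll_all(es, "my-index", {"match_all": {}}) visits every matching document once, as of the moment the scroll was opened.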


There is another way of scrolling over all the data without the additional cost of creating a dedicated search context every time, called search_after. In this approach, you sort your data and then use the sort values of the last hit on each page as a lightweight cursor for the next page. It has some drawbacks: for instance, if you are constantly indexing new data, you risk missing new documents that would have appeared on a page you have already fetched.
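A search_after loop can be sketched the same way, again assuming the elasticsearch-py 8.x client; search_after_all is a hypothetical helper. The sort values of the last hit on each page become the cursor for the next request. The sort should end with a field that is unique per document (in the usage example, a hypothetical event_id keyword field) so that documents with identical sort values are neither skipped nor repeated.

```python
from __future__ import annotations

from typing import Any, Iterator


def search_after_all(client: Any, index: str, query: dict[str, Any],
                     sort: list[Any], page_size: int = 1000) -> Iterator[dict[str, Any]]:
    """Yield every hit matching `query`, paging with search_after.

    `client` is assumed to be an elasticsearch-py 8.x Elasticsearch instance.
    `sort` must end with a per-document unique field (a tiebreaker).
    """
    search_after = None
    while True:
        kwargs: dict[str, Any] = {"index": index, "query": query,
                                  "size": page_size, "sort": sort}
        if search_after is not None:
            kwargs["search_after"] = search_after
        hits = client.search(**kwargs)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit act as the cursor for the next page.
        search_after = hits[-1]["sort"]
```

For example: search_after_all(es, "my-index", {"match_all": {}}, sort=[{"timestamp": "asc"}, {"event_id": "asc"}]), where the index and both field names are placeholders. No state is kept on the cluster between calls, which is what makes the cursor lightweight, but it also means each page sees the index as it is at that moment.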


In 7.10, there is going to be yet another way of paginating data, called Point in Time search (PIT). Here the idea is again to create a context, so that you can return hits as quickly as possible, and aggregations a bit later, in two separate calls.
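Point in time is typically combined with search_after to keep a consistent view of the data across pages. A sketch under the same assumptions (elasticsearch-py 8.x client; pit_search_all and the field names are hypothetical): open a PIT, pass its id in each search instead of an index name, always reuse the most recent pit_id returned by a search, and close the PIT when done.

```python
from __future__ import annotations

from typing import Any, Iterator


def pit_search_all(client: Any, index: str, query: dict[str, Any],
                   sort: list[Any], page_size: int = 1000,
                   keep_alive: str = "1m") -> Iterator[dict[str, Any]]:
    """Yield every hit matching `query` from a point-in-time view of `index`.

    `client` is assumed to be an elasticsearch-py 8.x Elasticsearch instance.
    """
    pit_id = client.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    try:
        search_after = None
        while True:
            kwargs: dict[str, Any] = {
                # No index here: the PIT already determines which indices are searched.
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                "query": query,
                "size": page_size,
                "sort": sort,
            }
            if search_after is not None:
                kwargs["search_after"] = search_after
            resp = client.search(**kwargs)
            # The PIT id may change between requests; always use the latest one.
            pit_id = resp["pit_id"]
            hits = resp["hits"]["hits"]
            if not hits:
                return
            yield from hits
            search_after = hits[-1]["sort"]
    finally:
        # Release the resources held by the point in time.
        client.close_point_in_time(id=pit_id)
```

Compared with the plain search_after loop, documents indexed after the PIT was opened are not visible to later pages, so results stay consistent for the whole iteration.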

Update


7.10 was released on November 11th, 2020, and Point in Time searches are now available too.
