Elasticsearch - Scroll behavior

Question

I have come across at least two possible ways to fetch results in batches:

  1. Scroll API

  2. Pagination - from, size parameters

What is the fundamental difference? I am assuming #1 allows you to scroll over the records, while #2 allows you to fetch a batch of records at a time. If I just use different from/size parameters to drive pagination, is there a chance that the same record will be returned in different batches?

Recommended answer

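Using from/size is the default and easiest way to paginate results. By default, it only works as long as from + size does not exceed 10,000 (the index.max_result_window setting). You can increase that limit, but going too far is not advised, because deep pagination decreases the performance of your cluster: every shard has to collect and sort from + size hits for each page request.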
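
The following is a minimal in-memory sketch in Python, not the Elasticsearch client: search_from_size, MAX_RESULT_WINDOW and the toy documents are hypothetical names used only for illustration. It shows the result-window check and also bears on the duplicate question. Because from/size keeps no state between requests, each page re-runs the query against the index as it is at that moment, so a document indexed between two page requests can shift the ranking and make the same record appear on two consecutive pages.

```python
# Toy, in-memory stand-in for an index; this is NOT the Elasticsearch client.
MAX_RESULT_WINDOW = 10_000  # mirrors the default of index.max_result_window


def search_from_size(docs, from_, size, max_window=MAX_RESULT_WINDOW):
    """Re-run the 'query' (score descending, id as tiebreaker) and slice one page."""
    if from_ + size > max_window:
        # Illustrative message; the real cluster returns its own error.
        raise ValueError(f"from + size = {from_ + size} exceeds {max_window}")
    ranked = sorted(docs, key=lambda d: (-d["score"], d["id"]))
    return ranked[from_:from_ + size]


docs = [{"id": i, "score": 10 - i} for i in range(6)]  # ids 0..5, scores 10..5

page1 = search_from_size(docs, from_=0, size=3)
# A higher-scoring document is indexed between the two page requests.
docs.append({"id": 99, "score": 11})
page2 = search_from_size(docs, from_=3, size=3)

print([d["id"] for d in page1])  # [0, 1, 2]
print([d["id"] for d in page2])  # [2, 3, 4] -> id 2 is returned twice
```

The new document pushes everything down one slot, so the second page starts one position "earlier" than intended and record 2 is served again.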

The scroll API allows you to paginate over all your data. It works by creating a search context (i.e. a snapshot of the data at the time you start scrolling), and you then get a cursor (the scroll ID) to paginate over all of it. When done, you can close the search context. The search context has an associated cost (it requires state, hence memory), so this way of paginating is not suited to real-time pagination; it is better suited to batch-like processing.
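
To make the snapshot behaviour concrete, here is a toy simulation, assuming a hypothetical ToyScrollIndex class that stands in for the cluster (it is not a client API). start_scroll freezes the current results, scroll returns successive pages from that frozen copy, and clear_scroll releases it. In Elasticsearch itself, the equivalent steps are a search request with a scroll keep-alive (for example POST /<index>/_search?scroll=1m), follow-up POST /_search/scroll requests carrying the returned scroll ID, and a DELETE /_search/scroll request when finished.

```python
import itertools


class ToyScrollIndex:
    """Hypothetical in-memory index with scroll-like search contexts.

    Illustration only: the real scroll API is a set of HTTP requests
    exchanging a scroll ID, not a Python class.
    """

    def __init__(self, docs):
        self.docs = list(docs)
        self._contexts = {}  # scroll_id -> [frozen snapshot, next position]
        self._ids = itertools.count(1)

    def start_scroll(self, size):
        """Freeze the current results and return (scroll_id, first page)."""
        scroll_id = f"ctx-{next(self._ids)}"
        self._contexts[scroll_id] = [sorted(self.docs), 0]
        return scroll_id, self.scroll(scroll_id, size)

    def scroll(self, scroll_id, size):
        """Return the next page from the frozen snapshot (empty when exhausted)."""
        snapshot, pos = self._contexts[scroll_id]
        page = snapshot[pos:pos + size]
        self._contexts[scroll_id][1] = pos + len(page)
        return page

    def clear_scroll(self, scroll_id):
        """Drop the context so the memory it holds can be reclaimed."""
        del self._contexts[scroll_id]


index = ToyScrollIndex(range(7))
scroll_id, page = index.start_scroll(size=3)
seen = []
while page:
    seen.extend(page)
    index.docs.append(100 + len(seen))  # a write that happens mid-scroll
    page = index.scroll(scroll_id, size=3)
index.clear_scroll(scroll_id)
print(seen)  # [0, 1, 2, 3, 4, 5, 6]
```

Documents written after the scroll started are not returned by it, because every page is served from the frozen snapshot; the snapshot is also the state that costs memory until it is cleared.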

There is another way of scrolling over all the data without the additional cost of creating a dedicated search context every time; it is called search_after. In this flavor, the idea is to sort your data and then use the sort values of the last hit as a lightweight cursor for the next page. It has some drawbacks: for instance, if you are constantly indexing new data, you run the risk of missing new documents that would have appeared on a previous "page".
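
Here is a sketch of the search_after idea in the same toy style. search_after_page is a hypothetical helper, not a client call: it sorts by (timestamp, id), with the unique id acting as a tiebreaker so documents with equal timestamps are neither skipped nor repeated, and it resumes strictly after the sort values of the last hit. In a real request these correspond to the sort and search_after fields of the search body. A document that arrives mid-iteration and sorts before the cursor is never returned, which is the drawback described in the paragraph above.

```python
from bisect import bisect_right


def search_after_page(docs, size, after=None):
    """Hypothetical helper: up to `size` docs sorted by (timestamp, id) after `after`.

    `after` stands for the sort values of the last hit of the previous page;
    the unique `id` is the tiebreaker, so no two docs share a sort key.
    """
    ranked = sorted(docs, key=lambda d: (d["timestamp"], d["id"]))
    keys = [(d["timestamp"], d["id"]) for d in ranked]
    start = 0 if after is None else bisect_right(keys, after)
    return ranked[start:start + size]


docs = [{"id": i, "timestamp": t} for i, t in enumerate([10, 20, 20, 30, 40])]
seen, cursor = [], None
while True:
    page = search_after_page(docs, size=2, after=cursor)
    if not page:
        break
    seen.extend(d["id"] for d in page)
    cursor = (page[-1]["timestamp"], page[-1]["id"])  # sort values of last hit
    if len(seen) == 2:
        # A late write whose sort key falls before the current cursor.
        docs.append({"id": 5, "timestamp": 15})
print(seen)  # [0, 1, 2, 3, 4] -> id 5 is never returned
```

No server-side state is kept between pages; the cursor is just the tuple carried by the client.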

In 7.10, there is going to be yet another way of paginating data, called Point in Time (PIT) search. Here the idea is again to create a context, so that you can return hits as rapidly as possible in one call and aggregations (a bit later) in a second, distinct call.
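
A PIT is commonly combined with search_after: the PIT pins a consistent view of the data and the sort values move the cursor through it. The toy class below is hypothetical (ToyPitIndex and its methods are invented for illustration, not client methods) and sketches the two-call pattern from the paragraph above: hits are fetched first, an aggregation is computed later, and both are served from the same view even though a document is written in between. In Elasticsearch itself, a PIT is opened with POST /<index>/_pit?keep_alive=1m, its ID is passed in the pit section of each search request, and it is closed with DELETE /_pit.

```python
from bisect import bisect_right
from collections import Counter
import itertools


class ToyPitIndex:
    """Hypothetical in-memory index with point-in-time (PIT) views.

    Not the Elasticsearch API: it only mimics the idea that a PIT pins one
    consistent view that several independent search calls can share.
    """

    def __init__(self, docs):
        self.docs = list(docs)
        self._pits = {}  # pit_id -> frozen copy of the docs
        self._ids = itertools.count(1)

    def open_pit(self):
        pit_id = f"pit-{next(self._ids)}"
        self._pits[pit_id] = list(self.docs)
        return pit_id

    def search(self, pit_id, size, after=None):
        """Hits from the PIT view, paged search_after-style on (ts, id)."""
        view = sorted(self._pits[pit_id], key=lambda d: (d["ts"], d["id"]))
        keys = [(d["ts"], d["id"]) for d in view]
        start = 0 if after is None else bisect_right(keys, after)
        return view[start:start + size]

    def aggregate(self, pit_id, field):
        """A separate call: terms-style counts over the same PIT view."""
        return Counter(d[field] for d in self._pits[pit_id])

    def close_pit(self, pit_id):
        del self._pits[pit_id]


index = ToyPitIndex([{"id": i, "ts": i * 10, "tag": "ab"[i % 2]} for i in range(5)])
pit = index.open_pit()
hits = index.search(pit, size=3)                    # call 1: hits, fast
index.docs.append({"id": 9, "ts": 5, "tag": "a"})  # concurrent write
page2 = index.search(pit, size=3, after=(hits[-1]["ts"], hits[-1]["id"]))
counts = index.aggregate(pit, "tag")                # call 2: aggregations, later
index.close_pit(pit)
print([h["id"] for h in hits])   # [0, 1, 2]
print([h["id"] for h in page2])  # [3, 4]
print(dict(counts))              # {'a': 3, 'b': 2}
```

The document written at ts=5 would sort first, yet neither the second page nor the counts include it, because both calls read the same frozen view.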

Update

7.10 was released on November 11, 2020, and Point in Time searches are now available too.