Elasticsearch Scroll

Question

I am a little confused by the scroll functionality in Elasticsearch. Is it possible to call the search API every time the user scrolls through the result set? From the documentation:

"search_type" => "scan",    // use search_type=scan
"scroll" => "30s",          // how long between scroll requests. should be small!
"size" => 50,               // how many results *per shard* you want back

Does that mean it will run the search every 30 seconds and return all the sets of results until there are no records left?

For example, my ES query returns 500 records in total, and I am getting the data from ES as two sets of 250 records each. Is there any way I can display the first set of 250 records first, and then the second set of 250 records when the user scrolls? Please suggest.

Recommended answer

What you are looking for is pagination.

You can achieve this by querying for a fixed size and setting the from parameter. Since you want to display results in batches of 250, set size = 250 and increase from by 250 with each consecutive query.

GET /_search?size=250                # first 250 results
GET /_search?size=250&from=250       # next 250 results
GET /_search?size=250&from=500       # next 250 results
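For the front end in the question, the from/size approach can be wrapped in a small helper that is called once per scroll event. This is a minimal sketch under stated assumptions: `fetch_page`, `page_offsets` and `fake_search` are hypothetical names, and `fake_search` is an in-memory stand-in for a real call such as `es.search(index="your_index", from_=..., size=...)` in the elasticsearch-py client (which spells the parameter `from_` because `from` is a Python keyword). The helper also checks the `from + size` limit that Elasticsearch enforces through the `index.max_result_window` setting, 10,000 by default.

```python
from typing import Any, Callable, Dict, List

# Elasticsearch rejects requests where from + size exceeds
# index.max_result_window, which defaults to 10,000.
MAX_RESULT_WINDOW = 10_000

# Hypothetical signature: search(from_, size) -> Elasticsearch-style response
SearchFn = Callable[[int, int], Dict[str, Any]]


def page_offsets(total: int, page_size: int) -> List[int]:
    """Return the `from` value of every page needed to cover `total` hits."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(0, total, page_size))


def fetch_page(search: SearchFn, page: int, page_size: int = 250) -> List[Dict[str, Any]]:
    """Fetch one 0-based page of hits using from/size pagination."""
    from_ = page * page_size
    if from_ + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default max_result_window")
    response = search(from_, page_size)
    return response["hits"]["hits"]


# Hypothetical in-memory stand-in for es.search(index=..., from_=..., size=...)
DOCS = [{"_id": str(i)} for i in range(500)]


def fake_search(from_: int, size: int) -> Dict[str, Any]:
    return {"hits": {"hits": DOCS[from_:from_ + size]}}


if __name__ == "__main__":
    print(page_offsets(500, 250))              # [0, 250]
    first = fetch_page(fake_search, 0)         # shown initially
    second = fetch_page(fake_search, 1)        # fetched when the user scrolls
    print(first[0]["_id"], second[-1]["_id"])  # 0 499
    print(len(fetch_page(fake_search, 2)))     # 0: from=500 is past the end
```

With 500 matching documents only the offsets 0 and 250 return data; a request with from=500 comes back empty, so the front end can stop asking once a page has fewer hits than the page size.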

By contrast, scan & scroll lets you retrieve a large set of results with a single search and is intended for operations such as re-indexing data into a new index. It is not recommended for displaying search results in real time.

Briefly, scan & scroll works as follows: the scan request runs the supplied query against the index and returns a scroll_id. That scroll_id is passed to the next scroll request, which returns the next batch of results.

Consider the following example:

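from elasticsearch import Elasticsearch

# Assumes a cluster at localhost:9200; adjust for your setup
es = Elasticsearch('http://localhost:9200')

# Initialize the scroll.
# search_type='scan' was removed in Elasticsearch 5.0; sorting on _doc is its
# efficient replacement. doc_type is dropped because mapping types were
# removed in Elasticsearch 8.0. Index names must be lowercase.
page = es.search(
    index='your_index',
    scroll='2m',
    size=1000,
    body={
        'query': {'match_all': {}},  # Your query's body
        'sort': ['_doc'],
    },
)
sid = page['_scroll_id']
hits = page['hits']['hits']  # the initial search already returns the first batch

# Start scrolling
while hits:
    print('scroll size:', len(hits))
    # Do something with the obtained batch
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # An empty batch means every hit has been read
    hits = page['hits']['hits']

# Release the search context instead of waiting for it to expire
es.clear_scroll(scroll_id=sid)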

In the above example, the following happens:

  • The scroll is initialized. The initial search returns the first batch of results along with a scroll_id.
  • Each subsequent scroll request sends the most recently received scroll_id and gets back the next batch of results.
  • The scroll time is how long the search context is kept alive between requests; each scroll request sets a new expiry time. If the next scroll request is not sent within that time, the search context is lost and no further results are returned. This is why scroll should not be used to display results to users in real time, especially for indexes with a huge number of documents.
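To make the keep-alive rule concrete, here is a toy, in-memory model of the protocol described in the bullets above. It is an illustration only, not how Elasticsearch is implemented, and every name in it (`ToyScrollServer`, `collect`, `ScrollContextExpired`) is hypothetical. Time is passed in explicitly so that expiry is deterministic. The toy hands out a new id for every batch; real Elasticsearch may or may not change the `_scroll_id` between requests, which is why a client should always send the most recently received one.

```python
import itertools
from typing import Any, Dict, List, Tuple


class ScrollContextExpired(Exception):
    """Raised when a scroll_id is used after its keep-alive has run out."""


class ToyScrollServer:
    """Toy, in-memory model of scroll semantics; not Elasticsearch's internals.

    Time is passed in explicitly as `now` (in seconds) so expiry is deterministic.
    """

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self._docs = docs
        self._ids = itertools.count()
        # scroll_id -> (next position, batch size, expiry time)
        self._contexts: Dict[str, Tuple[int, int, float]] = {}

    def _batch(self, pos: int, size: int, keep_alive: float, now: float) -> Dict[str, Any]:
        # Register a context for the *next* batch and hand back its id
        sid = f"scroll-{next(self._ids)}"
        self._contexts[sid] = (pos + size, size, now + keep_alive)
        return {"_scroll_id": sid, "hits": {"hits": self._docs[pos:pos + size]}}

    def search(self, size: int, keep_alive: float, now: float) -> Dict[str, Any]:
        # The initial search already returns the first batch plus a scroll_id
        return self._batch(0, size, keep_alive, now)

    def scroll(self, scroll_id: str, keep_alive: float, now: float) -> Dict[str, Any]:
        context = self._contexts.pop(scroll_id, None)
        if context is None or now > context[2]:
            raise ScrollContextExpired(scroll_id)
        pos, size, _ = context
        # Each scroll request starts a fresh keep-alive window
        return self._batch(pos, size, keep_alive, now)


def collect(server: ToyScrollServer, size: int, keep_alive: float, delay: float) -> List[int]:
    """Scroll to the end, sending each request `delay` seconds after the previous one."""
    now = 0.0
    page = server.search(size, keep_alive, now)
    batch_sizes: List[int] = []
    while page["hits"]["hits"]:
        batch_sizes.append(len(page["hits"]["hits"]))
        now += delay
        # Always pass the most recently received scroll_id
        page = server.scroll(page["_scroll_id"], keep_alive, now)
    return batch_sizes


if __name__ == "__main__":
    docs = [{"_id": str(i)} for i in range(500)]
    print(collect(ToyScrollServer(docs), 250, keep_alive=120, delay=5))  # [250, 250]
    try:
        collect(ToyScrollServer(docs), 250, keep_alive=120, delay=200)
    except ScrollContextExpired as exc:
        print("context expired:", exc)
```

When every request arrives within the keep-alive window, all 500 documents are drained in two batches of 250; when the gap between requests exceeds the window, the expiry error is raised. A user pausing on a page is exactly that second case, which is the failure mode that makes scroll a poor fit for user-driven paging.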

这篇关于弹性搜索滚动的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆