Elasticsearch Scroll

Question

I am a little confused by Elasticsearch's scroll functionality. Is it possible to call the search API each time the user scrolls through the result set? From the documentation:

"search_type" => "scan",    // use search_type=scan
"scroll" => "30s",          // how long between scroll requests. should be small!
"size" => 50,               // how many results *per shard* you want back

Does that mean it will perform a search every 30 seconds and keep returning sets of results until there are no more records?

For example, my Elasticsearch query returns 500 records in total, and I receive them as two sets of 250 records each. Is there a way to display the first set of 250 records and then show the second set of 250 when the user scrolls? Please suggest.

Recommended answer

What you are looking for is pagination.

You can achieve this by querying with a fixed size and setting the from parameter. Since you want to display results in batches of 250, set size = 250 and increase from by 250 with each consecutive query.

GET /_search?size=250                     ---- return first 250 results
GET /_search?size=250&from=250            ---- next 250 results 
GET /_search?size=250&from=500            ---- next 250 results
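
The offset arithmetic behind those requests can be sketched in Python. This is a minimal illustration rather than part of the original answer: the helper name page_params and the zero-based page numbering are assumptions made for the example.

```python
def page_params(page_number, page_size=250):
    """Return from/size query parameters for a zero-based page number.

    page_params is a hypothetical helper, not an Elasticsearch API.
    """
    if page_number < 0:
        raise ValueError("page_number must be >= 0")
    if page_size <= 0:
        raise ValueError("page_size must be > 0")
    # Each page starts where the previous one ended
    return {"from": page_number * page_size, "size": page_size}


# Parameters for the first three pages, matching the three GET requests above
for page_number in range(3):
    print(page_params(page_number))
```

When the user scrolls to the end of the current batch, the UI would request the next page number and pass the resulting parameters to the search API. Recent Elasticsearch versions cap from + size at the index.max_result_window setting (10,000 by default), so this approach suits result sets that stay below that limit.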

By contrast, scan & scroll lets you retrieve a large set of results from a single search and is intended for operations such as re-indexing data into a new index. Using it to display search results in real time is not recommended.

Briefly, scan & scroll runs the query supplied with the scan request against the index and returns a scroll_id. You then pass this scroll_id to the next scroll request to get the next batch of results.

Consider the following example:

from elasticsearch import Elasticsearch

# Assumes a cluster reachable at the client's default address
es = Elasticsearch()

# Initialize the scroll
# Note: search_type='scan' and doc_type apply to old Elasticsearch releases
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    }
)
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned by the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page

In the example above, the following happens:

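  • The scroll is initialized. The initial request returns a scroll_id; with search_type = scan it also reports the total hit count but no documents, so the first batch of results arrives with the first scroll request.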
  • For each subsequent scroll request, the updated scroll_id (received from the previous scroll request) is sent and the next batch of results is returned.
  • The scroll time is how long the search context is kept alive. It is not a polling interval: no search is repeated every 30 seconds. If the next scroll request is not sent within that time, the search context is lost and no further results are returned. This is why scroll should not be used to display real-time results for indexes with a huge number of documents.
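
The steps above can also be wrapped in a generator so that callers simply iterate over batches. This is a hedged sketch, not the original answer's code: iter_scroll_batches is a hypothetical helper, and it assumes a client exposing search, scroll and clear_scroll the way the elasticsearch-py client does, on a version where the initial search (without search_type = scan) already returns the first batch.

```python
def iter_scroll_batches(client, index, query, keep_alive="2m", batch_size=1000):
    """Yield successive lists of hits until the scroll is exhausted.

    Hypothetical helper; `client` is assumed to behave like an
    elasticsearch-py client (search, scroll, clear_scroll).
    """
    # Initial request: opens the search context and returns the first batch
    page = client.search(index=index, body=query,
                         scroll=keep_alive, size=batch_size)
    scroll_id = page["_scroll_id"]
    try:
        while page["hits"]["hits"]:
            yield page["hits"]["hits"]
            # Each request renews the keep-alive and may return a new scroll ID
            page = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page["_scroll_id"]
    finally:
        # Release the search context instead of waiting for it to time out
        client.clear_scroll(scroll_id=scroll_id)


# Usage (assumes a reachable cluster and an index named 'yourIndex'):
# from elasticsearch import Elasticsearch
# for batch in iter_scroll_batches(Elasticsearch(), "yourIndex", {"query": {"match_all": {}}}):
#     print(len(batch))
```

Clearing the scroll in the finally block frees the search context even when the caller stops iterating early or an error occurs, rather than leaving it open until the keep-alive expires.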
