Elasticsearch performance-related queries for large volumes of data


Problem description

I have been working on a fairly large-scale production system in which I have indexed a large volume of data into Elasticsearch. I then need to search it with specific queries, and while doing so I have run into some performance-related questions.

Please consider this a follow-up to this question.

  1. I return the nested data using inner hits, and the documentation says that using _source is not the best solution when there is a large set of nested objects to return. How can we overcome this? Can we use doc value fields? If so, how?

I read that the inner hits size defaults to 3 and that we can set it to a maximum of 100. If we need to return all the results, how can we fetch the data without affecting performance?

Recommended answer

Regarding size

You can set size as large as you want, as long as from + size does not exceed the default limit of 10,000, known as index.max_result_window, as described in the index modules documentation. Although you can change this limit dynamically, the same page advises against it, because search requests take heap memory and time proportional to from + size, and it points to better alternatives such as scroll or search_after.
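One of the better alternatives to raising the limit is search_after. As a minimal sketch, the Python helpers below only build request bodies (no client library is assumed; send them with whatever client you use). The field names created_at and doc_id are hypothetical. search_after needs a sort that totally orders the hits, so the last sort key should be a unique field. The Elasticsearch documentation also recommends pairing search_after with a point in time (PIT) for a consistent view across pages; that is omitted here.

```python
import json

# Default of index.max_result_window: from + size may not exceed this.
MAX_RESULT_WINDOW = 10_000


def from_size_body(query, from_, size):
    """Classic from/size paging, valid only while from + size <= MAX_RESULT_WINDOW."""
    if from_ + size > MAX_RESULT_WINDOW:
        raise ValueError(
            f"from + size = {from_ + size} exceeds index.max_result_window "
            f"({MAX_RESULT_WINDOW}); page with search_after instead"
        )
    return {"query": query, "from": from_, "size": size}


def search_after_body(query, size, sort, last_sort_values=None):
    """Build one page of a search_after request.

    last_sort_values is the "sort" array of the last hit on the previous page;
    leave it as None for the first page.
    """
    body = {"query": query, "size": size, "sort": sort}
    if last_sort_values is not None:
        body["search_after"] = list(last_sort_values)
    return body


# Hypothetical sort: a timestamp plus a unique keyword field as tie-breaker.
SORT = [{"created_at": "asc"}, {"doc_id": "asc"}]

first_page = search_after_body({"match_all": {}}, 1000, SORT)
# The values below stand in for hits[-1]["sort"] taken from the first response.
next_page = search_after_body({"match_all": {}}, 1000, SORT, [1700000000000, "a17"])
print(json.dumps(next_page, indent=2))
```

Unlike from/size, each search_after request only asks every shard for the next size hits after the given sort values, so the per-request cost does not grow as you page deeper.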

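More importantly, here you need to define size on inner_hits, which is even more expensive; that is the whole reason ES limits it to 3 by default, whereas the default size of a regular query is 10. The upper bound for the inner_hits from + size is the index setting index.max_inner_result_window, which defaults to 100.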
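To make the cost of inner hits concrete, here is a sketch that builds a nested query with an explicit inner_hits size and rejects values above the default index.max_inner_result_window of 100, which Elasticsearch would also reject unless that index setting is raised. The nested path comments and the field comments.text are hypothetical. worst_case_nested_objects is plain arithmetic: every top-level hit can carry up to inner_size nested objects, so the number of nested objects extracted grows with the product of the two sizes.

```python
import json

# Default of index.max_inner_result_window: the largest inner_hits from + size allowed.
MAX_INNER_RESULT_WINDOW = 100


def nested_query_with_inner_hits(path, inner_query, inner_size=3, outer_size=10):
    """Search body for a nested query that also returns the matching nested objects."""
    if inner_size > MAX_INNER_RESULT_WINDOW:
        raise ValueError(
            f"inner_hits size {inner_size} exceeds index.max_inner_result_window "
            f"({MAX_INNER_RESULT_WINDOW})"
        )
    return {
        "size": outer_size,  # top-level documents (default 10)
        "query": {
            "nested": {
                "path": path,
                "query": inner_query,
                "inner_hits": {"size": inner_size},  # nested hits per document (default 3)
            }
        },
    }


def worst_case_nested_objects(outer_size, inner_size):
    """Upper bound on nested objects returned: each top-level hit may carry inner_size of them."""
    return outer_size * inner_size


body = nested_query_with_inner_hits(
    "comments", {"match": {"comments.text": "words"}}, inner_size=50, outer_size=20
)
print(json.dumps(body, indent=2))
print(worst_case_nested_objects(20, 50))  # 1000
```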

Regarding doc_values

Instead of fetching values from _source, you can read them from doc values, as long as you use fields on which doc values are enabled by default, such as keyword fields. Text fields do not support doc values (their text-field counterpart, fielddata, is disabled by default), so for those you would first have to change the mapping, which has the following drawbacks:

  1. You need to change the index mapping and reindex all content.
  2. It takes more space in your index.
  3. It is very costly on text fields, which is why it is disabled by default; the official documentation has more information.
  4. You already have this information in _source, and it is better to use that for performance reasons.
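The official inner hits documentation describes the doc-values route for nested inner hits: disable _source in the inner hit and list the wanted fields under docvalue_fields, so the root document's _source does not have to be parsed for every nested hit. Below is a sketch of that request together with a hypothetical mapping in which the text field gets a keyword sub-field (text.raw) so its value can be read from doc values; adding such a sub-field is the mapping change and reindex described in the first drawback. All field names are assumptions, and doc_value_capable is a simplified, hypothetical helper that treats every non-text type as having doc values. The values come back in each inner hit's fields section rather than in _source.

```python
import json

# Hypothetical mapping: a nested "comments" object whose text field also has a
# keyword sub-field, because text fields themselves have no doc values.
MAPPING = {
    "mappings": {
        "properties": {
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "keyword"},  # doc values on by default
                    "text": {
                        "type": "text",
                        "fields": {"raw": {"type": "keyword"}},
                    },
                },
            }
        }
    }
}


def doc_value_capable(properties, prefix=""):
    """Return full field paths that can be read via docvalue_fields.

    Simplification: any non-text type is assumed to have doc values unless
    the mapping sets doc_values to False.
    """
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:  # object or nested field: recurse into children
            paths.extend(doc_value_capable(spec["properties"], path + "."))
            continue
        if spec.get("type") != "text" and spec.get("doc_values", True):
            paths.append(path)
        for sub, sub_spec in spec.get("fields", {}).items():  # multi-fields
            if sub_spec.get("type") != "text" and sub_spec.get("doc_values", True):
                paths.append(f"{path}.{sub}")
    return paths


def nested_query_docvalues(path, inner_query, fields, inner_size=3):
    """Nested query whose inner hits read values from doc values instead of _source."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": inner_query,
                "inner_hits": {
                    "size": inner_size,
                    "_source": False,  # skip extracting nested objects from the root _source
                    "docvalue_fields": list(fields),
                },
            }
        }
    }


fields = doc_value_capable(MAPPING["mappings"]["properties"])
print(fields)  # ['comments.author', 'comments.text.raw']
body = nested_query_docvalues("comments", {"match": {"comments.text": "words"}}, fields)
print(json.dumps(body, indent=2))
```

Text fields are still searched through comments.text; only the returned values come from the keyword sub-field.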
