Elasticsearch improve query performance

Question

I'm trying to improve query performance. It takes an average of about 3 seconds for simple queries which don't even touch a nested document, and it's sometimes longer.

curl "http://searchbox:9200/global/user/_search?n=0&sort=influence:asc&q=user.name:Bill%20Smith"

Even without the sort it takes seconds. Here are the details of the cluster:

1.4 TB index size.
210M documents that aren't nested (about 10 KB each).
500M documents in total (nested documents are small: 2-5 fields).
About 128 segments per node.
3 nodes, m2.4xlarge (-Xmx set to 40g, machine memory is 60g).
3 shards.
Index is on Amazon EBS volumes.
Replication 0 (have tried replication 2 with only a little improvement).

I don't see any noticeable spikes in CPU/memory etc. Any ideas how this could be improved?

Solution

Garry's points about heap space are true, but it's probably not heap space that's the issue here.

With your current configuration, you'll have less than 60 GB of page cache available (3 nodes, each with about 20 GB left over after the 40 GB heap) for a 1.4 TB index. With less than 4.2% of your index in page cache, there's a high probability that most of your searches will need to hit disk.
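
The arithmetic behind that estimate can be sketched in a few lines of Python. The figures are the ones given in the question. The function name is only illustrative, and treating all RAM outside the JVM heap as page cache is an optimistic assumption, so the result is an upper bound.

```python
def page_cache_fraction(nodes, ram_gb, heap_gb, index_gb):
    """Upper bound on the share of the index that fits in OS page cache.

    Assumes (optimistically) that all RAM not given to the JVM heap
    is available to the operating system as page cache.
    """
    cache_gb = nodes * (ram_gb - heap_gb)
    return cache_gb / index_gb


# Figures from the question: 3 nodes, 60 GB RAM, 40 GB heap, 1.4 TB index.
fraction = page_cache_fraction(nodes=3, ram_gb=60, heap_gb=40, index_gb=1.4 * 1024)
print(f"At best {fraction:.1%} of the index fits in page cache")
```

In practice the operating system and other processes take some of that memory too, so the real share is likely lower.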

You probably want to add more memory to your cluster, and you'll want to think carefully about the number of shards as well. Just sticking to the default can cause skewed distribution. If you had five shards in this case, you'd have two machines with 40% of the data each, and a third with just 20%. In either case, you'll always be waiting for the slowest machine or disk when doing distributed searches. This article on Elasticsearch in Production goes a bit more in depth on determining the right amount of memory.
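
To see where the 40/40/20 split comes from, here is a minimal sketch. It assumes equal-sized shards that the cluster spreads as evenly as it can across nodes, and the helper name is hypothetical.

```python
def shard_share_per_node(num_shards, num_nodes):
    """Share of the data held by each node.

    Assumes all shards are the same size and are spread as evenly
    as possible, so some nodes hold one shard more than others.
    """
    base, extra = divmod(num_shards, num_nodes)
    counts = [base + 1 if i < extra else base for i in range(num_nodes)]
    return [count / num_shards for count in counts]


# Five shards on three nodes: two nodes hold two shards, one holds a single shard.
print(shard_share_per_node(5, 3))
# Three shards on three nodes: every node holds the same share.
print(shard_share_per_node(3, 3))
```

A shard count that is a multiple of the node count keeps the shares equal, so no single machine becomes the slowest link.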

For this exact search example, you can probably use filters, though. You're sorting, thus ignoring the score calculated by the query. With a filter, it'll be cached after the first run, and subsequent searches will be quick.
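
A request body along these lines is one way to express the search with a filter. This is a sketch under assumptions rather than a tested configuration. The field names come from the curl command in the question. The `term` filter matches the exact stored value, so it assumes `user.name` is indexed without analysis. The `filtered` query is the form older Elasticsearch releases used, and newer releases express the same thing as a `bool` query with a `filter` clause. The helper name is hypothetical.

```python
import json


def build_filtered_search(field, value, sort_field, legacy=True):
    """Build a search body that filters instead of scoring.

    legacy=True  -> "filtered" query (older Elasticsearch releases)
    legacy=False -> "bool" query with a "filter" clause (newer releases)
    Assumes `field` holds the exact, non-analyzed value.
    """
    term_filter = {"term": {field: value}}
    if legacy:
        query = {"filtered": {"filter": term_filter}}
    else:
        query = {"bool": {"filter": [term_filter]}}
    return {"query": query, "sort": [{sort_field: {"order": "asc"}}]}


body = build_filtered_search("user.name", "Bill Smith", "influence")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a request to the same `_search` endpoint in place of the `q=` and `sort=` URL parameters.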
