How to optimize indexing on Elasticsearch?


Question

I am trying to understand how indexing can be optimized on Elasticsearch. Let me clarify my needs:


  • I have two indices right now, say indexA and indexB (the two indices are approximately the same size).
  • I have 6 machines dedicated to Elasticsearch (all with identical hardware).
  • The most important part of my Elasticsearch usage is writing, since I am doing heavy writes in real time.

So my question is: how can I optimize write operations using those 6 machines?


  • Should I split the machines into two groups, for example 3 machines for indexA and 3 machines for indexB?
  • Or should I use all 6 machines to index both indexA and indexB?

What else should I pay attention to in order to optimize write operations?

Thanks in advance.

Recommended answer

It depends, but let me pick a direction based on your problem statement, which leads me to the following assumptions:


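  • You mostly care about write operations (search performance is not a concern).
  • Both indices are in the same cluster.
  • More machines may be added in the future.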

For better indexing performance, the first thing to consider is using a single shard for your index (unless you are using routing). But since you have 6 servers, a single shard would waste resources, so you can assign 3 shards to each of indexA and indexB. That fits the current setup, but around 10 shards is recommended, depending on your data size and to leave room for future scaling.
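The shard arithmetic above can be sketched as follows. This is only an illustration, assuming the goal is one primary shard per machine and that the indices are created with a settings body in the create-index request. The helper name `create_index_body` is hypothetical; `number_of_shards` is the Elasticsearch index setting.

```python
import json

NODES = 6
INDICES = ["indexA", "indexB"]


def create_index_body(primary_shards: int) -> dict:
    """Build a create-index request body (hypothetical helper)."""
    return {"settings": {"index": {"number_of_shards": primary_shards}}}


# Spread primaries evenly: 6 nodes / 2 indices = 3 primary shards per index.
shards_per_index = NODES // len(INDICES)

for name in INDICES:
    # Each body would be sent as the payload of: PUT /<name>
    print(f"PUT /{name}", json.dumps(create_index_body(shards_per_index)))
```

The primary shard count is fixed when the index is created, which is why the answer suggests picking a number with future growth in mind.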

Turn off replicas if possible, because index requests wait for the replicas to respond before returning. In a production environment, though, it is highly recommended to keep at least one replica for high availability.
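A minimal sketch of the replica change, assuming it is applied through the update-settings endpoint (`PUT /<index>/_settings`). The helper `replica_settings` is a made-up name; `number_of_replicas` is the real setting and can be changed on a live index.

```python
import json


def replica_settings(count: int) -> dict:
    """Body for PUT /<index>/_settings that changes the replica count."""
    if count < 0:
        raise ValueError("replica count cannot be negative")
    return {"index": {"number_of_replicas": count}}


during_load = replica_settings(0)  # no replicas while writing heavily
after_load = replica_settings(1)   # restore one replica for availability

print(json.dumps(during_load))
print(json.dumps(after_load))
```

With zero replicas, losing a node means losing the shards it holds, so this only makes sense when the data can be re-indexed from another source.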

Set the refresh interval (index.refresh_interval) to "-1", or at least to a larger value such as "30m". You will lose near-real-time (NRT) search if you do so, but as you mentioned, indexing is your main concern.
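One way to picture this is as an ordered plan: disable refresh, run the heavy writes, then restore the interval. The sketch below only builds the request descriptions and sends nothing; `bulk_load_plan` and the tuple layout are illustrative assumptions, while `refresh_interval`, `"-1"`, and the default of `"1s"` are standard Elasticsearch values.

```python
def refresh_settings(interval: str) -> dict:
    """Body for PUT /<index>/_settings that changes the refresh interval."""
    return {"index": {"refresh_interval": interval}}


def bulk_load_plan(index: str, normal_interval: str = "1s") -> list:
    """Ordered (method, path, body) steps around a heavy write phase."""
    settings_path = f"/{index}/_settings"
    return [
        ("PUT", settings_path, refresh_settings("-1")),  # disable refresh
        ("POST", f"/{index}/_bulk", None),               # placeholder for the writes
        ("PUT", settings_path, refresh_settings(normal_interval)),  # restore
    ]


for method, path, body in bulk_load_plan("indexA", normal_interval="30m"):
    print(method, path, body)
```

For continuous real-time writes there is no "after" phase, in which case a fixed longer interval such as "30m" is the closer fit to the advice above.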

Turn off index warmers if you have any.

Avoid using "doc_values" in your field mappings. Although doc values reduce the memory footprint at search time, they increase indexing time because the field values are prepared during indexing.

If possible (that is, if you do not need them), disable "norms" in your mapping.
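The two mapping suggestions above might look like this in a create-index body. This is a sketch under assumptions: the field names `message` and `status` are invented for illustration, and the syntax shown (boolean `norms` and `doc_values`, no mapping type name) matches recent Elasticsearch versions; older releases used a different mapping layout, so check the documentation for your version.

```python
import json

# Hypothetical fields: "message" is full text, "status" is an exact-value field.
mapping_body = {
    "mappings": {
        "properties": {
            # No norms: skips storing length-normalization data used in scoring.
            "message": {"type": "text", "norms": False},
            # No doc values: the field can no longer be sorted or aggregated on.
            "status": {"type": "keyword", "doc_values": False},
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```

Both switches trade search-side features for less work at index time, so they are worth applying only to fields that are never scored, sorted, or aggregated.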

Finally, read this.

Word of caution: some of the approaches above will impact your search performance.
