ElasticSearch - How does sharding affect indexing performance?

Question

I'm doing some benchmarks on a single-node cluster of ElasticSearch.

I ran into a situation where more shards reduce indexing performance, at least on a single node (in both latency and throughput).

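These are some of my numbers:

  • Index with 1 shard: it indexed 6K+ documents per minute
  • Index with 5 shards: it indexed 3K+ documents per minute
  • Index with 20 shards: it indexed 1K+ documents per minute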

I got the same results with the bulk API. So I'm wondering what the relationship is and why this happens.

Note: I don't have a resource problem! Resources (CPU and memory) are free.

Recommended answer

Just to get you on the same page:

Your data is organized in indices, each made of shards that are distributed across multiple nodes. When a new document is indexed, a new id is generated and the destination shard is calculated from this id. The write is then delegated to the node holding that destination shard. This distributes your documents fairly evenly across all of your shards.

Finding a document by id is now easy, because the shard containing it can be calculated from the id alone; there is no need to search all shards. By the way, that is why you can't change the number of shards afterwards: a different shard count would result in a different distribution of documents across your shards.
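
To make the routing idea concrete, here is a minimal Python sketch. It assumes a simple "hash of the id modulo the number of primary shards" rule and uses `zlib.crc32` purely as a stand-in hash; it is not the hash function Elasticsearch itself uses, and `pick_shard` is a hypothetical helper name.

```python
import zlib

def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document id to a shard number.

    Illustrative only: crc32 is a stand-in, not Elasticsearch's real hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# The same id always maps to the same shard, so a lookup by id
# only has to ask one shard.
doc_ids = [f"doc-{i}" for i in range(1000)]
placement_5 = {d: pick_shard(d, 5) for d in doc_ids}

# With a different shard count, most ids map somewhere else,
# which is why the shard count cannot simply be changed later.
placement_6 = {d: pick_shard(d, 6) for d in doc_ids}
moved = sum(1 for d in doc_ids if placement_5[d] != placement_6[d])
print(f"{moved} of {len(doc_ids)} documents would live on a different shard")
```

Under this rule, going from 5 to 6 shards would relocate most documents, so existing documents could no longer be found where the formula says they should be.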

Now, to be clear, each shard is a separate Lucene index made of segment files on disk. Writes create new segments, and once a certain number of segment files is reached, the segments are merged. So adding more shards without distributing them to other nodes just means higher I/O and memory consumption on your single node. When searching, the query is executed against each shard, and afterwards the results of all shards need to be merged into one result: more shards, more CPU work to do.
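
The merge step on the search side can be sketched roughly as follows. This is a simplified Python illustration, not Elasticsearch's actual implementation: it assumes every shard returns its own hits already sorted by descending score, and `merge_top_k` and the sample scores are made up for the example.

```python
import heapq
import itertools
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, document id)

def merge_top_k(per_shard_hits: List[List[Hit]], k: int) -> List[Hit]:
    """Combine per-shard hit lists (each sorted by descending score)
    into a single global top-k list."""
    merged = heapq.merge(*per_shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, k))

# Hypothetical results from three shards of the same index.
shard_results = [
    [(0.9, "a"), (0.4, "b")],
    [(0.8, "c"), (0.7, "d")],
    [(0.5, "e")],
]
top_hits = merge_top_k(shard_results, 3)
print(top_hits)
```

Every additional shard adds one more input list to this merge, on top of the per-shard query itself.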

Back to your question:

For your write-heavy indexing case, with just one node, the optimal number of indices and shards is 1! But for the search case (not accessing by id), the optimal number of shards per node is the number of CPUs available. That way, searching can be done in multiple threads, resulting in better search performance.
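
As a rough illustration of the single-shard setup, the Python sketch below builds the JSON body one could send with a create-index request (`PUT /<index-name>`). The `index_settings` helper is hypothetical, and setting `number_of_replicas` to 0 is an assumption added here for a single-node benchmark; it is not part of the advice above.

```python
import json

def index_settings(primary_shards: int, replicas: int) -> str:
    """Build the request body for creating an index with the given shard layout."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                # Assumption: a single node cannot host replicas of its own shards.
                "number_of_replicas": replicas,
            }
        }
    }
    return json.dumps(body, indent=2)

# Write-heavy case on one node: one primary shard.
print(index_settings(primary_shards=1, replicas=0))
```

The number of primary shards is fixed when the index is created, so it has to be chosen before the first document is indexed.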

But what are the benefits of sharding?


  1. Availability: By replicating the shards to other nodes, you can still serve requests if some of your nodes can't be reached anymore!

  2. Performance: Distributing the primary shards to different nodes will distribute the workload too.

So if your scenario is write-heavy, keep the number of shards per index low. If you need better search performance, increase the number of shards, but keep the "physics" in mind. If you need reliability, take the number of nodes/replicas into account.

Further reading:

https://www.elastic.co/guide/zh-CN/elasticsearch/reference/current/_basic_concepts.html

https://www.elastic.co/guide/zh-CN/elasticsearch/reference/current/tune-for-indexing-speed.html

https://www.elastic.co/guide/zh-CN/elasticsearch/reference/current/tune-for-search-speed.html

https://www.elastic.co/de/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

https://thoughts.t37.net/designing-the-perfect-elasticsearch-cluster-the-almost-definitive-guide-e614eabc1a87
