How to improve percolator performance in ElasticSearch?


Question

Summary

We need to increase percolator performance (throughput).

The most likely approach is scaling out to multiple servers.

Questions

How to do scaling out right?

1) Would increasing the number of shards in the underlying index allow running more percolate requests in parallel?

2) How much memory does ElasticSearch server need if it does percolation only?

Is it better to have 2 servers with 4GB RAM or one server with 16GB RAM?

3) Would having SSDs meaningfully help the percolator's performance, or is it better to increase RAM and/or the number of nodes?

Our current situation

We have 200,000 queries (job search alerts) in our jobs index. We are able to run 4 parallel queues that call the percolator. Each queue can percolate a batch of 50 jobs in about 35 seconds, so we can percolate about:


4 queues * 50 jobs per batch / 35 seconds * 60 seconds per minute ≈ 343 jobs per minute
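The arithmetic above can be checked with a trivial sketch:

```python
# Back-of-the-envelope throughput check for the figures quoted above.
queues = 4             # parallel percolation queues
batch_size = 50        # jobs percolated per request
seconds_per_batch = 35 # observed time per batch

jobs_per_minute = queues * batch_size / seconds_per_batch * 60
print(round(jobs_per_minute))  # -> 343
```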

We need more.

Our jobs index has 4 shards, and we are using the .percolator type sitting on top of that jobs index.
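For context, in ES 1.x each alert is registered as a document under the reserved .percolator type of the index. A minimal sketch of what one of the 200,000 alert queries might look like; the field names and alert id are hypothetical illustrations, not the poster's actual mapping:

```python
# Hypothetical job-search alert stored as a percolator query (ES 1.x style).
# "title" / "location" fields and the alert id are made-up examples.
alert_id = "alert-42"
percolator_doc = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "python developer"}},
                {"term": {"location": "berlin"}},
            ]
        }
    }
}
# Would be registered as: PUT /jobs/.percolator/alert-42  <percolator_doc>
```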


Hardware: a 2-processor server with 32 cores total and 32GB RAM. We allocated 8GB RAM to ElasticSearch.

When the percolator is working, the 4 percolation queues I mentioned above consume about 50% of CPU.

When we tried to increase the number of parallel percolation queues from 4 to 6, CPU utilization jumped to 75%+. What is worse, the percolator started to fail with NoShardAvailableActionException:



[2015-03-04 09:46:22,221][DEBUG][action.percolate ] [Cletus Kasady] [jobs][3] Shard multi percolate failure org.elasticsearch.action.NoShardAvailableActionException: [jobs][3] null

That error seems to suggest that we should increase the number of shards and eventually add a dedicated ElasticSearch server (and later increase the number of nodes).

Related:
How to optimize memory performance of an elasticsearch percolator index

Answer

How to do scaling out right?

Q: 1) Would increasing the number of shards in the underlying index allow running more percolate requests in parallel?

A: No. Sharding is only really useful when creating a cluster. Additional shards on a single instance may in fact worsen performance. In general the number of shards should equal the number of nodes for optimal performance.
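As an illustration of that advice, here is a hedged sketch of index settings one might use when recreating the jobs index on a two-node cluster; the node count and replica count are assumptions for the example, not figures from the answer:

```python
# Sketch: shard count matching node count, per the advice above.
# Assumes a hypothetical two-node cluster.
node_count = 2
index_settings = {
    "settings": {
        "number_of_shards": node_count,
        "number_of_replicas": 1,  # one replica so each node can serve a full copy
    }
}
# Would be applied at index creation: PUT /jobs  <index_settings>
```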

Q: 2) How much memory does ElasticSearch server need if it does percolation only?

Is it better to have 2 servers with 4GB RAM or one server with 16GB RAM?

A: Percolator indices reside entirely in memory, so the answer is A LOT. It is entirely dependent on the size of your index. In my experience, 200,000 searches would require a 50MB index. In memory, this index would occupy around 500MB of heap memory. Therefore 4GB RAM should be enough if this is all you're running. I would suggest more nodes in your case. However, as the size of your index grows, you will need to add RAM.

Q: 3) Would having SSDs meaningfully help the percolator's performance, or is it better to increase RAM and/or the number of nodes?

A: I doubt it. As I said before, percolators reside in memory, so disk performance isn't much of a bottleneck.


Edit: Don't take my word on those memory estimates. Check out the site plugins on the main ES site. I found BigDesk particularly helpful for watching performance counters for scaling and planning purposes. This should give you more valuable info on estimating your specific requirements.

Edit, in response to @DennisGorelik's comments below:

I got those numbers purely from observation but on reflection they make sense.


  1. 200K queries to 50MB on disk: this ratio means the average query occupies 250 bytes when serialized to disk.
  2. 50MB index to 500MB on heap: rather than serialized objects on disk, we are dealing with in-memory Java objects. Think about deserializing XML (or any data format, really): you generally get 10x larger in-memory objects.
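Those two ratios can be folded into a rough estimator; the 250 bytes/query and 10x heap factors are the observed figures above, not guarantees:

```python
# Rough heap-size estimator built from the two observed ratios:
# ~250 bytes per serialized query, ~10x expansion once on the heap.
def estimate_percolator_heap(num_queries, bytes_per_query=250, heap_factor=10):
    """Approximate on-heap footprint of a percolator index, in megabytes."""
    disk_mb = num_queries * bytes_per_query / 1_000_000
    return disk_mb * heap_factor

print(estimate_percolator_heap(200_000))  # -> 500.0
```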
