How to Optimize Elasticsearch Percolator Index Memory Performance


Problem Description




Is there a way to improve memory performance when using an elasticsearch percolator index?

I have created a separate index for my percolator. I have roughly 1 000 000 user-created saved searches (for email alerts). After creating this percolator index, my heap usage spiked to 100% and the server became unresponsive to all queries. I have somewhat limited resources and am not able to simply throw more RAM at the problem. The only solution was to delete the index containing my saved searches.

From what I have read, the percolator index resides in memory permanently. Is this entirely necessary? Is there a way to throttle this behaviour but still preserve the functionality? Is there a way to optimize my data/queries/index structure to circumvent this behaviour while still achieving the desired result?
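For context, this is roughly what the setup described above looks like. The sketch below uses the Python elasticsearch client with 7.x-style requests and the modern percolator field type; the question itself predates that API, and every index, field, and id name here is illustrative rather than taken from the original system.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Dedicated percolator index: the "query" field stores the registered
# saved searches; the other fields are what those queries run against.
es.indices.create(index="saved-searches", body={
    "mappings": {
        "properties": {
            "query":    {"type": "percolator"},
            "keywords": {"type": "text"},
            "location": {"type": "keyword"},
            "industry": {"type": "keyword"},
        }
    }
})

# One document per user-saved search, each holding a complete query.
# With ~1 000 000 of these, the registered queries are what drive the
# memory pressure described in the question.
es.index(index="saved-searches", id="user-42-alert-1", body={
    "query": {
        "bool": {
            "must":   [{"match": {"keywords": "python developer"}}],
            "filter": [
                {"term": {"location": "berlin"}},
                {"term": {"industry": "software"}},
            ],
        }
    }
})
```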

Solution

There is no resolution to this issue from an ElasticSearch point of view, nor is one likely. I have chatted to the ElasticSearch guys directly and their answer is: "throw more hardware at it".

I have, however, found a way to mitigate the problem by changing how I use this feature. When I analyzed my saved-search data, I discovered that my searches consisted of around 100 000 unique keyword searches along with various filter permutations, creating over 1 000 000 saved searches.

If I look at the filters, they are things like:

  • Location - 300+
  • Industry - 50+
  • etc...

Giving a solution space of:

100 000 * >300 * >50 * ... ~= > 1 500 000 000

However, if I were to decompose the searches and index the keyword searches and filters separately in the percolator index, I end up with far fewer searches:

100 000 + >300 + >50 + ... ~= > 100 350

And those searches themselves are smaller and less complicated than the original searches.
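For illustration, the decomposed components could be registered as small, independent percolator entries. The ids and field names below are my own assumptions, continuing the sketch from the question:

```python
# One percolator entry per unique keyword search (~100 000 entries)...
es.index(index="saved-searches", id="kw-10432", body={
    "query": {"match": {"keywords": "python developer"}}
})

# ...and one per filter value (~350 entries), instead of one entry for
# every keyword/filter combination (1 000 000+ entries).
es.index(index="saved-searches", id="loc-berlin", body={
    "query": {"term": {"location": "berlin"}}
})
es.index(index="saved-searches", id="ind-software", body={
    "query": {"term": {"industry": "software"}}
})
```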

Now I create a second (non-percolator) index listing all 1 000 000 saved searches and including the ids of the search components from the percolator index.
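An entry in that second index might look like the following; the index name and fields are again hypothetical:

```python
# Second, non-percolator index: one plain document per saved search that
# only references its components in the percolator index by id -- no
# query objects are stored here.
es.indices.create(index="saved-search-combinations", body={
    "mappings": {
        "properties": {
            "user_id":    {"type": "integer"},
            "keyword_id": {"type": "keyword"},
            "filter_ids": {"type": "keyword"},
        }
    }
})

es.index(index="saved-search-combinations", id="user-42-alert-1", body={
    "user_id":    42,
    "keyword_id": "kw-10432",                       # keyword component id
    "filter_ids": ["loc-berlin", "ind-software"],   # filter component ids
})
```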

Then I percolate a document and run a second query that filters the saved searches against the keyword and filter percolator results. I'm even able to preserve the relevance score, as this is returned purely from the keyword searches.
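Put together, the two-step lookup could look roughly like this, using the same assumed names and the modern percolate query in place of the older percolation endpoint:

```python
doc = {"keywords": "Senior Python Developer", "location": "berlin",
       "industry": "software"}

# Step 1: percolate the document against the ~100 350 components.
result = es.search(index="saved-searches", body={
    "query": {"percolate": {"field": "query", "document": doc}}
})
matched_ids = [hit["_id"] for hit in result["hits"]["hits"]]
# Scores from the keyword components can be kept for relevance.
scores = {hit["_id"]: hit["_score"] for hit in result["hits"]["hits"]}

# Step 2: look up saved searches whose components are among the matches.
alerts = es.search(index="saved-search-combinations", body={
    "query": {
        "bool": {
            "filter": [
                {"terms": {"keyword_id": matched_ids}},
                # Simplification: a real query must also verify that *all*
                # of a saved search's filter_ids matched, e.g. with a
                # terms_set query or a client-side check.
                {"terms": {"filter_ids": matched_ids}},
            ]
        }
    }
})
```

The second step is where the recombination logic lives; checking that every filter component of a saved search actually matched is the one detail this sketch glosses over.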

This approach will significantly reduce my percolator index memory footprint while serving the same purpose.

I would like to invite feedback on this approach (I haven't tried it yet but I will keep you posted).

Likewise, if my approach is successful, do you think it is worth a feature request?
