Does a huge number of deleted documents affect ES query performance?

Question

I have a few read-heavy indices in my ES cluster, which has ~50 million docs, and I have started seeing performance issues on them. I noticed that in most of these indices around 25% of the total documents are deleted. I know the deleted document count decreases over time as background merges happen, but in my case it always stays at around 25% of the total. I have the following questions/concerns:

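  1. Does this huge deleted count affect search performance? Deleted docs are still part of the immutable Lucene segments, and a search runs over all segments and returns the latest version of each document. The segments are therefore larger because they hold a large number of deleted docs, and an extra step is needed to find the latest version of each document.
  2. Will the periodic merge operation take a long time and be inefficient when there are this many deleted documents?
  3. Is there a way to remove these deleted documents in one shot, since the background merge does not seem to keep up with them?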

Thanks

Recommended answer

Your deleted documents are still part of the index, so they do affect search performance (though I can't tell you whether the impact is large).

For the periodic merge, Lucene is "reluctant" to merge large segments, because doing so requires extra disk space and generates a lot of I/O.

You can inspect the state of your segments with the index segments API.

If you have segments close to the 5GB limit, they probably won't be merged automatically until they consist mostly of deleted docs.
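To see which segments are both large and heavy on deletes, the JSON returned by the index segments API (`GET /<index>/_segments`) can be summarized per segment. The following is a minimal sketch, assuming the response has already been parsed into a dict and follows the documented `indices` → `shards` → `segments` layout with `num_docs`, `deleted_docs` and `size_in_bytes` fields. The `SAMPLE` data and the index name `my-index` are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, int, str, int, float]


def summarize_segments(response: Dict[str, Any]) -> List[Row]:
    """Return (index, shard, segment, size_in_bytes, deleted_fraction) rows,
    sorted so that the segments with the highest share of deletes come first."""
    rows: List[Row] = []
    for index_name, index_data in response.get("indices", {}).items():
        for shard_id, copies in index_data.get("shards", {}).items():
            for copy in copies:  # one entry per shard copy (primary or replica)
                for seg_name, seg in copy.get("segments", {}).items():
                    live = seg.get("num_docs", 0)         # live docs only
                    deleted = seg.get("deleted_docs", 0)  # marked as deleted
                    total = live + deleted
                    fraction = deleted / total if total else 0.0
                    rows.append((index_name, int(shard_id), seg_name,
                                 seg.get("size_in_bytes", 0), fraction))
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows


# Illustrative, hand-written response fragment (not real cluster output).
SAMPLE = {
    "indices": {
        "my-index": {
            "shards": {
                "0": [
                    {
                        "segments": {
                            "_0": {"num_docs": 750, "deleted_docs": 250, "size_in_bytes": 1000},
                            "_1": {"num_docs": 100, "deleted_docs": 0, "size_in_bytes": 200},
                        }
                    }
                ]
            }
        }
    }
}

if __name__ == "__main__":
    for row in summarize_segments(SAMPLE):
        print(row)
```

Segments that appear near the top of this list and are also close to the size limit are the ones the automatic merge is least likely to pick up.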

You can use the force merge API.

Keep in mind that a force merge can put some stress on a cluster with huge indices. An option exists to only expunge deleted documents, which should reduce the burden.

only_expunge_deletes (Optional, boolean) If true, only expunge segments containing document deletions. Defaults to false.
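As a rough sketch, a force merge restricted to expunging deletes is a `POST` to `/<index>/_forcemerge` with `only_expunge_deletes=true`. The example below uses only the Python standard library. The base URL `http://localhost:9200` and the index name `my-index` are placeholders, and the sketch assumes an unsecured cluster (no authentication or TLS setup), which will not match most production deployments.

```python
import urllib.parse
import urllib.request
from typing import Optional


def forcemerge_url(base_url: str, index: str, only_expunge_deletes: bool = True) -> str:
    """Build the force merge URL for one index (or a comma-separated list)."""
    query = urllib.parse.urlencode(
        {"only_expunge_deletes": str(only_expunge_deletes).lower()}
    )
    index_path = urllib.parse.quote(index, safe=",*")
    return f"{base_url.rstrip('/')}/{index_path}/_forcemerge?{query}"


def force_merge(base_url: str, index: str, only_expunge_deletes: bool = True,
                timeout: Optional[float] = None) -> bytes:
    """Send the request and return the raw response body.

    The call blocks until the merge finishes, which can take a long time
    on large indices, so no timeout is applied by default."""
    request = urllib.request.Request(
        forcemerge_url(base_url, index, only_expunge_deletes), method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder host and index; only prints the URL, does not contact a cluster.
    print(forcemerge_url("http://localhost:9200", "my-index"))
```

Given the stress a force merge can put on the cluster, it is probably best run during a quiet period.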

In Lucene, a document is not deleted from a segment; just marked as deleted. During a merge, a new segment is created that does not contain those document deletions.
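The mark-then-merge behaviour can be illustrated with a toy model. This is not Lucene's actual data structure or API, only a sketch of the idea: `ToySegment` and `merge` are hypothetical names, and a segment here is just a dict of documents plus a set of ids marked as deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToySegment:
    docs: Dict[str, str]                             # doc id -> content, never changed
    deleted: Set[str] = field(default_factory=set)   # ids marked as deleted

    def delete(self, doc_id: str) -> None:
        # Deleting only records a marker; the stored document stays in place.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    def live_docs(self) -> Dict[str, str]:
        # Searches have to skip the marked documents.
        return {i: d for i, d in self.docs.items() if i not in self.deleted}


def merge(segments: List[ToySegment]) -> ToySegment:
    """Create a new segment holding only the live documents of the inputs."""
    merged: Dict[str, str] = {}
    for segment in segments:
        merged.update(segment.live_docs())
    return ToySegment(docs=merged)


if __name__ == "__main__":
    seg_a = ToySegment({"1": "a", "2": "b"})
    seg_b = ToySegment({"3": "c"})
    seg_a.delete("1")
    print(len(seg_a.docs))                # the deleted doc still occupies the segment
    print(merge([seg_a, seg_b]).docs)     # the merged segment no longer contains it
```

In this model the space taken by document "1" is only reclaimed once `merge` builds the new segment, which mirrors why the deleted count only drops after a merge.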

Regards
