Optimize API for reducing the segments and eliminating ES deleted docs not working

Problem Description

This is a continuation of my previous question, Does huge number of deleted doc count affects ES query performance, which relates to the deleted docs in my ES index.

As pointed out in the answer there, I used the optimize API, since I am on ES 1.x where the force merge API is not available. After reading the optimize API GitHub link (provided earlier, as I couldn't find it on the ES site) by Shay Banon, the founder of Elastic, it looks like it does the same job.

I got a success message for my index after running the optimize API, but I don't see the total count of deleted docs decreasing. I am worried because when I checked the segments of my index using the segments API, I saw more than 25 segments per shard, with every shard holding 250 MB-1 GB of data in memory and almost 500k docs, while some shards still show a few deleted docs.
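For reference, one way to watch whether the deleted-doc count actually drops is the index stats API. Below is a minimal sketch, assuming Python's requests library, a node reachable at http://localhost:9200, and a placeholder index name my_index (all of these are assumptions, not values from the question):

```python
# Rough check of live vs. deleted doc counts for one index via the index
# stats API. Assumes the `requests` library, a node reachable at
# http://localhost:9200, and a placeholder index name "my_index".
import requests

ES = "http://localhost:9200"
INDEX = "my_index"

stats = requests.get(f"{ES}/{INDEX}/_stats").json()
docs = stats["_all"]["primaries"]["docs"]
print(f"live docs: {docs['count']}, deleted docs: {docs['deleted']}")
```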

So my questions are:

  1. My index has multiple shards across multiple data nodes. When I run the optimize API against only one node's URL, does it merge only the segments on that node?
  2. In the segments API result it shows node IDs like "node": "f2hsqeamadnaskda", while I am using the KOPF plugin and have custom names for my data nodes. How can I relate this cryptic node ID to my human-readable node names, to check whether statement 1 is correct or not? (See the sketch after this list.)
  3. As there is no documentation available on the optimize API, is it possible to merge the segments on all shards across all nodes in a single shot? And do I need to make the index read-only before applying it?
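On the node-name point in question 2, the node ID returned by the segments API can be resolved through the nodes info API. A minimal sketch, again assuming Python's requests, a node at http://localhost:9200, and a placeholder index name my_index:

```python
# Map the cryptic node IDs from the segments API to human-readable node
# names via the nodes info API, then report where each shard copy lives.
# Assumes `requests`, a node at http://localhost:9200, and a placeholder
# index name "my_index".
import requests

ES = "http://localhost:9200"
INDEX = "my_index"

# Build a node-id -> node-name map.
nodes = requests.get(f"{ES}/_nodes").json()["nodes"]
id_to_name = {node_id: info["name"] for node_id, info in nodes.items()}

# For every shard copy, print the hosting node and its segment count.
shards = requests.get(f"{ES}/{INDEX}/_segments").json()["indices"][INDEX]["shards"]
for shard_num, copies in shards.items():
    for copy in copies:
        node_id = copy["routing"]["node"]
        print(f"shard {shard_num} on {id_to_name.get(node_id, node_id)}: "
              f"{copy['num_search_segments']} search segments")
```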

Solution

@Nirmal has already answered your first two questions, so for the third:

> As there is no documentation available on the optimize API, is it possible to merge the segments on all shards across all nodes in a single shot? And do I need to make the index read-only before applying it?

There is documentation available for 1.x: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-optimize.html. You are probably looking for calls like these (sketched in code further below):

  • GET <index_pattern>/_cat/segments: Lists all segments in all shards (there can be thousands), including the deleted-doc count per segment.
  • POST <index_pattern>/_optimize?max_num_segments=1: Force-merges all segments down to a single segment per shard. Do this when the index is no longer being written to; it helps reduce the CPU/RAM load on the data nodes.
  • POST <index_pattern>/_optimize?only_expunge_deletes=true: Only removes the deleted docs.

Finally, you can use * as <index_pattern> to apply this to all indices across the whole cluster.
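Put together, a minimal sketch of those calls (again assuming Python's requests, a node at http://localhost:9200, and a placeholder index name my_index) might look like this:

```python
# Sketch of the optimize calls listed above. Assumes `requests`, a node at
# http://localhost:9200, and a placeholder index name "my_index" (use "*"
# to target every index in the cluster).
import requests

ES = "http://localhost:9200"
INDEX = "my_index"

# Force-merge every shard of the index down to a single segment.
r = requests.post(f"{ES}/{INDEX}/_optimize", params={"max_num_segments": 1})
print(r.json())

# Or: only expunge segments that contain deleted docs.
r = requests.post(f"{ES}/{INDEX}/_optimize", params={"only_expunge_deletes": "true"})
print(r.json())

# Inspect the result: one line per segment, including deleted-doc counts.
print(requests.get(f"{ES}/{INDEX}/_cat/segments", params={"v": "true"}).text)
```

The merge itself can take a while on large shards, so it is usually run during a quiet period and the segment listing re-checked afterwards.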
