ElasticSearch - How to merge indexes into one index?

Problem description

My cluster has an index for each day since a few months ago, 5 shards each index (the default), and I can't run queries on the whole cluster because there are too many shards (over 1000).

The document IDs are automatically generated.

How can I combine the indexes into one index, deal with conflicting ids (if conflicts are even possible), and change the types?

I am using ES version 5.2.1

Recommended answer

This is a common problem that only becomes visible after a few months of using the ELK stack with filebeat creating indices day by day. There are a few options for fixing the performance issue here.

First, you can use _forcemerge to limit the number of segments inside each Lucene index. The operation won't reduce or merge the indices themselves, but it will improve the performance of Elasticsearch.

curl -XPOST 'localhost:9200/logstash-2017.07*/_forcemerge?max_num_segments=1'

This will run through all of the month's indices and force merge their segments. When done for every month, it should improve Elasticsearch performance a lot. In my case, CPU usage went down from 100% to 2.7%.
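To check the effect, you can list the segments per shard with the _cat API (the index pattern mirrors the example above); after a force merge with max_num_segments=1, each shard should report a single segment:

curl 'localhost:9200/_cat/segments/logstash-2017.07*?v'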

Unfortunately, this won't solve the problem of having too many shards.


Please read the _reindex documentation and back up your database before continuing.
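If you don't have a backup routine in place yet, the snapshot API is one option. A minimal sketch, assuming a filesystem repository; the repository name my_backup and the location are placeholders, and the path must be listed under path.repo in elasticsearch.yml:

# Register a filesystem snapshot repository
curl -XPUT 'localhost:9200/_snapshot/my_backup?pretty' -H 'Content-Type: application/json' -d'
{
    "type": "fs",
    "settings": {
        "location": "/mnt/es_backups"
    }
}
'

# Snapshot the whole cluster and wait until it finishes
curl -XPUT 'localhost:9200/_snapshot/my_backup/before_reindex?wait_for_completion=true&pretty'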

As tomas mentioned, if you want to limit the number of shards or indices, there is no option other than using _reindex to merge several indices into one. This can take a while depending on the number and size of the indices you have.
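If you start a long reindex with wait_for_completion=false, you don't have to keep the connection open; you can follow its progress through the task management API instead. A minimal sketch:

# List running reindex tasks and their status (documents created, batches, ...)
curl 'localhost:9200/_tasks?detailed=true&actions=*reindex&pretty'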

You can create the destination index beforehand and specify the number of shards it should contain. This ensures your final index will have the number of shards you need.

curl -XPUT 'localhost:9200/new-logstash-2017.07.01?pretty' -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index" : {
            "number_of_shards" : 1 
        }
    }
}
'



Limiting number of shards

If you want to limit the number of shards per index, you can run _reindex one to one. In this case no entries should be dropped, as the result will be an exact copy, just with a smaller number of shards.

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
    "conflicts": "proceed",
    "source": {
        "index": "logstash-2017.07.01"
    },
    "dest": {
        "index": "logstash-v2-2017.07.01",
        "op_type": "create"
    }
}
'

After this operation you can remove the old index and use the new one. Unfortunately, if you want to keep using the old name, you need to run _reindex one more time, back into an index with the original name. If you decide to do that:


DON'T FORGET TO SPECIFY THE NUMBER OF SHARDS FOR THE NEW INDEX! By default it will fall back to 5.
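For completeness, here is a sketch of that second round trip, reusing the logstash-v2-2017.07.01 name from the example above. Only run the delete after verifying the copy:

# Remove the original index to free up the name
curl -XDELETE 'localhost:9200/logstash-2017.07.01?pretty'

# Recreate it with a single shard (see the warning above)
curl -XPUT 'localhost:9200/logstash-2017.07.01?pretty' -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index" : {
            "number_of_shards" : 1
        }
    }
}
'

# Copy the data back under the old name
curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
    "source": {
        "index": "logstash-v2-2017.07.01"
    },
    "dest": {
        "index": "logstash-2017.07.01",
        "op_type": "create"
    }
}
'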



Merging multiple indices and limiting the number of shards

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
    "conflicts": "proceed",
    "source": {
        "index": "logstash-2017.07*"
    },
    "dest": {
        "index": "logstash-2017.07",
        "op_type": "create"
    }
}
'

When done, you should have all entries from logstash-2017.07.01 through logstash-2017.07.31 merged into logstash-2017.07. Note that the old indices must be deleted manually.
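Be careful with the wildcard when you delete them: logstash-2017.07* would also match the new merged logstash-2017.07 index itself. Keeping the trailing dot restricts the pattern to the old daily indices (this assumes wildcard deletes are allowed on your cluster, i.e. action.destructive_requires_name is left at its default):

curl -XDELETE 'localhost:9200/logstash-2017.07.*?pretty'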

Some of the entries can be overwritten or skipped, depending on which conflicts and op_type options you choose.
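A cheap sanity check before deleting the old indices is to compare document counts between the daily indices and the merged one; any difference corresponds to documents that were skipped or overwritten:

curl 'localhost:9200/logstash-2017.07.*/_count?pretty'
curl 'localhost:9200/logstash-2017.07/_count?pretty'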

You can set up an index template that will be used every time a new logstash index is created.

curl -XPUT 'localhost:9200/_template/template_logstash?pretty' -H 'Content-Type: application/json' -d'
{
    "template" : "logstash-*",
    "settings" : {
        "number_of_shards" : 1
    }
}
'

This will ensure that every new index whose name matches logstash- has only one shard.
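You can confirm that the template was registered, and inspect the settings it will apply, with a simple GET:

curl 'localhost:9200/_template/template_logstash?pretty'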

If you don't stream too many logs, you can set up your logstash to group logs by month:

# file: /etc/logstash/conf.d/30-output.conf

output {
    elasticsearch {
        hosts => ["localhost"]
        manage_template => false
        index => "%{[@metadata][beat]}-%{+YYYY.MM}"
        document_type => "%{[@metadata][type]}"
    }
}
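Logstash has to be restarted to pick up the changed output configuration; on a systemd-based install that would be something like:

sudo systemctl restart logstash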



Final thoughts

It's not easy to fix an initial misconfiguration! Good luck with optimising your Elasticsearch!
