Is there a smarter way to reindex Elasticsearch?


Problem description

I ask because our search is in a state of flux as we work things out, but each time we make a change to the index (change the tokenizer or filter, or the number of shards/replicas), we have to blow away the entire index and re-index all our Rails models back into Elasticsearch ... this means we have to factor in downtime to re-index all our records.

Is there a smarter way to do this that I'm not aware of?

Recommended answer

I think @karmi got it right. However, let me explain it a bit more simply. I occasionally need to upgrade the production schema with some new properties or analysis settings. I recently started using the scenario described below to do live, constant-load, zero-downtime index migrations. You can do that remotely.

Here are the steps:

  • You have an index real1 and aliases real_write, real_read pointing to it (a minimal sketch of this alias setup follows the list),
  • the client writes only to real_write and reads only from real_read,
  • the _source property of the documents is available.
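If the real_write and real_read aliases do not exist yet, they can be set up front through the same _aliases endpoint. A minimal sketch, assuming real1 already exists:

curl -XPOST 'http://esserver:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "real1", "alias" : "real_write" } },
        { "add" : { "index" : "real1", "alias" : "real_read" } }
    ]
}'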

1. Create the new index

Create the real2 index with the new mappings and settings of your choice.
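A minimal sketch of such a create request; the analyzer and field names are made up for illustration, and the string type with a typed mapping assumes an older (pre-5.x) Elasticsearch, matching the curl style used in this answer:

curl -XPUT 'http://esserver:9200/real2' -d '
{
    "settings" : {
        "number_of_shards" : 5,
        "number_of_replicas" : 1,
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "standard",
                    "filter" : [ "lowercase", "asciifolding" ]
                }
            }
        }
    },
    "mappings" : {
        "doc" : {
            "properties" : {
                "title" : { "type" : "string", "analyzer" : "my_analyzer" }
            }
        }
    }
}'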

2. Switch the write alias

Switch the write alias using the following bulk aliases request:

curl -XPOST 'http://esserver:9200/_aliases' -d '
{
    "actions" : [
        { "remove" : { "index" : "real1", "alias" : "real_write" } },
        { "add" : { "index" : "real2", "alias" : "real_write" } }
    ]
}'

This is an atomic operation. From this moment on, real2 is populated with the new clients' data on all nodes. Readers still use the old real1 via real_read. This is eventual consistency.

3. Migrate the data

Data must be migrated from real1 to real2, however new documents in real2 must not be overwritten by old entries. The migration script should use the bulk API with the create operation (not index or update). I use a simple Ruby script, es-reindex, which has a nice E.T.A. status:

$ ruby es-reindex.rb http://esserver:9200/real1 http://esserver:9200/real2
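For illustration, this is roughly the kind of request such a script sends; a minimal sketch of a bulk call with the create action (the doc type and _id values are hypothetical). A create for an _id that already exists in real2 is rejected, so documents written by clients after the alias switch are not overwritten:

curl -XPOST 'http://esserver:9200/real2/_bulk' -d '
{ "create" : { "_type" : "doc", "_id" : "1" } }
{ "title" : "old document 1" }
{ "create" : { "_type" : "doc", "_id" : "2" } }
{ "title" : "old document 2" }
'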

UPDATE 2017: You may consider using the new Reindex API instead of the script. It has a lot of interesting features, such as conflict reporting.
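A minimal sketch of the equivalent Reindex API call; "op_type": "create" together with "conflicts": "proceed" makes it skip documents that already exist in real2 instead of aborting:

curl -XPOST 'http://esserver:9200/_reindex' -H 'Content-Type: application/json' -d '
{
    "conflicts" : "proceed",
    "source" : { "index" : "real1" },
    "dest" : { "index" : "real2", "op_type" : "create" }
}'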

4. Switch the read alias

Now real2 is up to date and clients are writing to it, however they are still reading from real1. Let's update the reader alias:

curl -XPOST 'http://esserver:9200/_aliases' -d '
{
    "actions" : [
        { "remove" : { "index" : "real1", "alias" : "real_read" } },
        { "add" : { "index" : "real2", "alias" : "real_read" } }
    ]
}'

5. Backup and delete the old index

Writes and reads now go to real2. You can back up and delete the real1 index from the ES cluster.
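A minimal sketch of that last step using the snapshot API; the repository name my_backup and the location path are hypothetical, and the location must be listed under path.repo on every node:

# register a shared filesystem snapshot repository
curl -XPUT 'http://esserver:9200/_snapshot/my_backup' -d '
{ "type" : "fs", "settings" : { "location" : "/mnt/es_backups" } }'

# snapshot only the old index, then delete it
curl -XPUT 'http://esserver:9200/_snapshot/my_backup/real1_final?wait_for_completion=true' -d '
{ "indices" : "real1" }'

curl -XDELETE 'http://esserver:9200/real1'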

Done!

