Is there a smarter way to reindex Elasticsearch?


Question

I ask because our search is in a state of flux as we work things out, but each time we make a change to the index (change tokenizer or filter, or number of shards/replicas), we have to blow away the entire index and re-index all our Rails models back into Elasticsearch ... this means we have to factor in downtime to re-index all our records.

Is there a smarter way to do this that I'm not aware of?

Solution

I think @karmi has it right, but let me explain it a bit more simply. I occasionally need to upgrade the production schema with new properties or analysis settings. I recently started using the scenario described below to do live, constant-load, zero-downtime index migrations. You can do it remotely.

Here are the steps:

Assumptions:

  • You have an index real1 with the aliases real_write and real_read pointing to it,
  • clients write only to real_write and read only from real_read,
  • the _source property of the documents is available.

1. New index

Create the real2 index with the new mapping and settings of your choice.
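
As a rough illustration of this step, the settings body for the new index could be built like this. The helper name new_index_body, the shard and replica counts, and the "folded" analyzer are assumptions made up for the example, not part of the original answer. Mappings are left out because their syntax differs between Elasticsearch versions.

```ruby
require "json"

# Hypothetical helper: builds a JSON body for creating real2 (PUT /real2).
# All values passed in below are illustrative assumptions.
def new_index_body(shards:, replicas:, analysis: {})
  settings = {
    "number_of_shards"   => shards,
    "number_of_replicas" => replicas
  }
  # Only include the analysis section when custom analyzers are defined.
  settings["analysis"] = analysis unless analysis.empty?
  JSON.generate("settings" => settings)
end

# Example: a custom analyzer that lowercases and ASCII-folds tokens.
analysis = {
  "analyzer" => {
    "folded" => {
      "type"      => "custom",
      "tokenizer" => "standard",
      "filter"    => ["lowercase", "asciifolding"]
    }
  }
}

puts new_index_body(shards: 3, replicas: 1, analysis: analysis)
```

The printed JSON can be sent as the request body when creating real2, for example with curl -XPUT.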

2. Writer alias switch

Switch the write alias using the following bulk alias request:

curl -XPOST 'http://esserver:9200/_aliases' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "real1", "alias" : "real_write" } },
        { "add" : { "index" : "real2", "alias" : "real_write" } }
    ]
}'

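This is an atomic operation. From this point on, real2 receives new client data on all nodes, while readers still use the old real1 via real_read. Reads are therefore only eventually consistent: newly written documents become visible to readers once the reader alias is switched in step 4.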
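
If you would rather generate the alias request from a script than type it by hand, a minimal sketch might look like the following. The helper name alias_swap_body is hypothetical; the body it produces has the same shape as the curl request in this step.

```ruby
require "json"

# Hypothetical helper: builds the body for POST /_aliases that moves
# one alias from an old index to a new one in a single request.
def alias_swap_body(alias_name, from:, to:)
  JSON.generate(
    "actions" => [
      { "remove" => { "index" => from, "alias" => alias_name } },
      { "add"    => { "index" => to,   "alias" => alias_name } }
    ]
  )
end

puts alias_swap_body("real_write", from: "real1", to: "real2")
```

Keeping both actions in one request is what makes the switch atomic: there is no moment at which real_write points to neither index.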

3. Old data migration

Data must be migrated from real1 to real2, but new documents in real2 must not be overwritten by old entries. The migration script should therefore use the bulk API with the create operation (not index or update), because create is rejected for any document ID that already exists in real2. I use a simple Ruby script, es-reindex, which shows a nice ETA status:

$ ruby es-reindex.rb http://esserver:9200/real1 http://esserver:9200/real2
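
To show why the create operation matters, here is a minimal sketch of how a batch of hits read from real1 could be turned into a bulk request body. It is not taken from es-reindex; the helper name bulk_create_payload and the sample hits are assumptions for illustration. On older Elasticsearch versions that use mapping types, each action line would also need a "_type" field.

```ruby
require "json"

# Hypothetical helper: converts search/scroll hits into a newline-delimited
# bulk body made of `create` actions. A `create` for an ID that already
# exists in the target index is rejected, so newer documents written by
# clients to real2 are left untouched.
def bulk_create_payload(hits, target_index)
  lines = hits.flat_map do |hit|
    action = { "create" => { "_index" => target_index, "_id" => hit["_id"] } }
    # Each document takes two lines: the action, then the original _source.
    [JSON.generate(action), JSON.generate(hit["_source"])]
  end
  # The bulk API expects the body to end with a newline.
  lines.join("\n") + "\n"
end

# Sample hits shaped like the "hits" array of a search response.
hits = [
  { "_id" => "1", "_source" => { "title" => "first" } },
  { "_id" => "2", "_source" => { "title" => "second" } }
]

puts bulk_create_payload(hits, "real2")
```

The resulting text would be posted to the _bulk endpoint; items that conflict with existing documents are reported as errors in the response and can safely be ignored for this migration.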

4. Reader alias switch

Now real2 is up to date and clients are writing to it, but they are still reading from real1. Let's update the reader alias:

curl -XPOST 'http://esserver:9200/_aliases' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "real1", "alias" : "real_read" } },
        { "add" : { "index" : "real2", "alias" : "real_read" } }
    ]
}'

5. Backup and delete old index

Both writes and reads now go to real2. You can back up the real1 index and delete it from the ES cluster.

Done!
