Liquibase or Flyway database migration alternative for Elasticsearch


Problem description

I am pretty new to ES. I have been searching for a database migration tool for a long time and could not find one. Could anyone point me in the right direction?

I will be using Elasticsearch as the primary datastore in my project. I would like to version all the mapping and configuration changes, data imports, and data upgrade scripts that I run as I develop new modules.

In the past I used database versioning tools like Flyway or Liquibase.

Are there any frameworks, scripts, or methods I could use with ES to achieve something similar?

Does anyone have experience doing this by hand, writing and running migration scripts, or at least upgrade scripts?

Thanks in advance!

Solution

From this point of view, ES has some huge limitations:

  • Despite having dynamic mapping, ES is not schemaless but schema-intensive. A mapping can't be changed when the change conflicts with existing documents (practically, if any document has a non-null value in a field the new mapping affects, the change results in an exception).
  • Documents in ES are immutable: once you've indexed one, you can only retrieve or delete it. The syntactic sugar around this is partial update, which performs a thread-safe delete + index (with the same id) on the ES side.

What does that mean in the context of your question? Basically, you can't have classic migration tools for ES. Here's what can make your work with ES easier:

  • Use strict mapping ("dynamic": "strict" and/or index.mapper.dynamic: false; take a look at the mapping docs). This will

    • protect your indexes/types from being accidentally dynamically mapped with the wrong type
    • give you an explicit error when you miss some mismatch in the data-mapping relation
  • You can fetch the actual ES mapping and compare it with your data models. If your PL has a sufficiently high-level library for ES, this should be pretty easy.

  • You can leverage index aliases for migrations.
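
As a rough illustration of the first two points, the sketch below declares a strict mapping on the model side and compares its fields with a mapping fetched from ES. The field names, the `NEWS_MAPPING` constant and the `diff_properties` helper are hypothetical; the `"string"` type follows the ES 1.x docs and has other names in later versions. Fetching the actual mapping is left to whatever client library you use.

```python
from typing import Any, Dict, List

# Hypothetical model-side mapping. "dynamic": "strict" makes ES reject
# documents that contain fields not declared here.
NEWS_MAPPING: Dict[str, Any] = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "string"},
        "published": {"type": "date"},
    },
}


def diff_properties(
    model: Dict[str, Any], actual: Dict[str, Any], prefix: str = ""
) -> List[str]:
    """Compare two "properties" dicts and describe every difference."""
    problems: List[str] = []
    for name in sorted(set(model) | set(actual)):
        path = prefix + name
        if name not in actual:
            problems.append(f"{path}: missing in Elasticsearch")
        elif name not in model:
            problems.append(f"{path}: missing in model")
        else:
            # A field without an explicit type is treated as an object.
            model_type = model[name].get("type", "object")
            actual_type = actual[name].get("type", "object")
            if model_type != actual_type:
                problems.append(
                    f"{path}: type {actual_type!r} in Elasticsearch, "
                    f"{model_type!r} in model"
                )
            # Recurse into nested object fields, if any.
            problems.extend(
                diff_properties(
                    model[name].get("properties", {}),
                    actual[name].get("properties", {}),
                    path + ".",
                )
            )
    return problems
```

An empty result means the model and the live mapping agree on field names and types; anything else is a hint that a migration is needed.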


So, a little bit of experience. For me, the currently reasonable flow is this:

  • All data structures are described as models in code. These models actually provide an ORM abstraction too.
  • The index/mapping creation call is a simple method on the model.
  • Every index has an alias (e.g. news) which points to the actual index (e.g. news_index_{revision}_{date_created}).
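
The naming convention in the last point could be captured in a small helper like the one below. This is only a sketch: the function name `physical_index_name` and the `YYYYMMDD` date format are assumptions, since the convention above does not fix a date format.

```python
from datetime import date


def physical_index_name(alias: str, revision: int, created: date) -> str:
    """Build the name of the real index that sits behind an alias.

    Follows the {alias}_index_{revision}_{date_created} pattern; the
    date is rendered as YYYYMMDD (an assumed format).
    """
    return f"{alias}_index_{revision}_{created:%Y%m%d}"
```

Application code then only ever reads and writes through the alias, while the revision and date in the physical name record which mapping version the index was created for.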

Every time code is deployed, you

  1. Try to put the model (type) mapping. If this succeeds without error, it means one of the following:

    • you put the same mapping
    • you put a mapping that is a pure superset of the old one (only new fields were added, the old ones stay untouched)
    • no documents have values in the fields affected by the new mapping

    All of this means that you're good to go with the mapping/data you have; just work with the data as always.

  2. If ES throws an exception about the new mapping, you
    • create a new index/type with the new mapping (named like name_{revision}_{date})
    • redirect your alias to the new index
    • fire up migration code that makes bulk requests (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/docs-bulk.html) for fast reindexing

    During this reindexing you can safely index new documents normally through the alias. The drawback is that historical data is only partially available during reindexing.
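
The two deployment steps above can be sketched roughly as follows. To stay independent of any particular client library, the ES operations are passed in as plain callables; `MappingConflict`, `deploy_mapping` and the callable parameters are hypothetical names, and you would map them onto whatever your client raises and exposes. The body built by `alias_swap_actions` follows the `actions` format of the ES `_aliases` endpoint, which applies the remove and add together.

```python
from typing import Any, Callable, Dict, List


class MappingConflict(Exception):
    """Stand-in for the error your ES client raises on a rejected mapping."""


def alias_swap_actions(
    alias: str, old_index: str, new_index: str
) -> Dict[str, List[Dict[str, Any]]]:
    """Request body for the _aliases endpoint that repoints an alias."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def deploy_mapping(
    put_mapping: Callable[[str, Dict[str, Any]], Any],
    create_index: Callable[[str, Dict[str, Any]], Any],
    update_aliases: Callable[[Dict[str, Any]], Any],
    reindex: Callable[[str, str], Any],
    alias: str,
    old_index: str,
    new_index: str,
    mapping: Dict[str, Any],
) -> str:
    """Run on every deploy; returns which path was taken."""
    try:
        # Step 1: try to update the mapping of the current index in place.
        put_mapping(old_index, mapping)
        return "in-place"
    except MappingConflict:
        # Step 2: new index, repoint the alias, then copy the old data over.
        create_index(new_index, mapping)
        update_aliases(alias_swap_actions(alias, old_index, new_index))
        reindex(old_index, new_index)
        return "reindexed"
```

Because the alias is switched before the bulk copy starts, new writes land in the new index right away, which is exactly why historical data is only partially visible until the copy finishes.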

This is a production-tested solution. Caveats around this approach:

  • You cannot do this if your read requests require consistent historical data.
  • You're required to reindex the whole index. If you have one type per index (a viable solution), then that's fine. But sometimes you need multi-type indexes.
  • Data has to make a network round trip. That can be a pain sometimes.

To sum this up:

  • Try to have good abstractions in your models; this always helps.
  • Try to leave historical data/fields as they are (stale). Just build your code with this idea in mind; it's easier than it sounds at first.
  • I strongly recommend avoiding migration tools that rely on experimental ES features. Those can change at any time, as the river-* tools did.
