Ensuring ElasticSearch is in Sync with Database
Question
I'm considering a daily script to do the following, in order to account for any situations where there was a problem with updates on the ES server (I don't yet have a high-availability setup and even so, it's still probably a good practice in a situation where data is being duplicated between DB and ES). Before putting this script together, I thought I'd check if I'm going about this the right way, and whether there are any libraries or techniques I should use.
The script will simply retrieve all IDs from the database and all IDs from ElasticSearch where created_at < current_time (a snapshot of the current time, since it's a moving target as the script runs). It will then add documents to and remove documents from ElasticSearch based on the differences between these two sets of IDs.
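For illustration, the comparison step described above could be sketched as follows. This is only a minimal sketch: the function name `plan_sync` is hypothetical, the IDs are assumed to be integers, and fetching the IDs from the database and from ElasticSearch (with the created_at cutoff applied on both sides) is assumed to happen elsewhere.

```python
from typing import Iterable, Set, Tuple


def plan_sync(db_ids: Iterable[int], es_ids: Iterable[int]) -> Tuple[Set[int], Set[int]]:
    """Compare two ID collections and return (to_index, to_delete).

    Hypothetical helper: both inputs are assumed to have been fetched
    using the same created_at < current_time snapshot.
    """
    db_set = set(db_ids)
    es_set = set(es_ids)
    # In the database but missing from the search index: needs indexing.
    to_index = db_set - es_set
    # In the search index but gone from the database: needs deleting.
    to_delete = es_set - db_set
    return to_index, to_delete


if __name__ == "__main__":
    to_index, to_delete = plan_sync([1, 2, 3], [2, 3, 4])
    print("index:", sorted(to_index))
    print("delete:", sorted(to_delete))
```

Note that comparing IDs only detects missing and orphaned documents; it would not catch a document that exists on both sides but has stale content in the index.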
Does this sound like a reasonable approach?
Recommended answer
To answer my question, this is not the best approach.
A simpler, if more resource-intensive, approach is to re-build the entire index periodically. Of course, this is difficult to do in production as it would cause minutes or hours of downtime, so the trick is to rebuild a new index and switch to using that. In ElasticSearch, you can't rename an index, but you can use aliases.
There's a discussion of the approach here and a rake task for Tire users here.
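As a rough sketch of the alias switch (not the code from the linked discussion or rake task), the snippet below builds the request body that ElasticSearch's `_aliases` endpoint accepts, with the remove and add actions in a single request so the alias moves in one step. The function name `build_alias_swap` and the index and alias names are hypothetical, and sending the body (for example with a POST to `/_aliases`) is left out.

```python
import json
from typing import Dict, List, Optional


def build_alias_swap(alias: str, new_index: str,
                     old_index: Optional[str] = None) -> Dict[str, List[dict]]:
    """Build an `_aliases` request body that points `alias` at `new_index`.

    Hypothetical helper: if `old_index` is given, the alias is removed
    from it in the same request, so searches never see a gap.
    """
    actions: List[dict] = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    # Illustrative names only: the application always queries "articles".
    body = build_alias_swap("articles", "articles_v2", old_index="articles_v1")
    print(json.dumps(body, indent=2))
```

With this arrangement the application only ever queries the alias, so the freshly rebuilt index can replace the old one without downtime, and the old index can be deleted once the switch has been verified.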