什么是将数据从mysql同步到弹性搜索的最佳方法 [英] What is the Best way to sync data from mysql to elastic search

查看:106
本文介绍了什么是将数据从mysql同步到弹性搜索的最佳方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有ES 2.2并为Elasticsearch elasticsearch-jdbc-2.2.0.0安装了JDBC导入程序,并且能够插入数据,但不能通过mysql的更改来更新ES,即.与ES同步mysql.我该如何进行同步?我执行了以下shell脚本一次,数据已正确插入,但调度程序的工作正常.它不是每分钟执行一次以捕获mysql(方案表)中的任何更改.我的脚本有问题吗?或任何可用的解决方法?

I have ES 2.2 and installed JDBC importer for Elasticsearch elasticsearch-jdbc-2.2.0.0 and have been able to insert data, but not being to update ES with a change in mysql, ie. syncing of mysql with ES. How do i do the sync? I executed the following shell script once, data got inserted properly but the scheduler dint work. It is not executing every minute to capture any changes in mysql(schemes table). Is there something wrong in my script? or any workaround available?

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
bin=${DIR}/bin
lib=${DIR}/lib
echo $lib
echo $bin

echo '{
"type" : "jdbc",
"autocommit" : true,
"schedule" : "0 0-59 0-23 ? * *",
"jdbc" : {
"driver": "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://XXX:3306/blahblah",
"user" : "abc",
"password" : "xyz",
"sql" : "select * from schemes",
"elasticsearch" : {
"cluster" : "mycluster",
"host" : "localhost",
"port" : 9300
},
"max_bulk_actions" : 20000,
"max_concurrent_bulk_requests" : 10,
"index" : "movies",
"type":"scheme"
}
}
' | java -cp "${lib}/*" -Dlog4j.configurationFile=${bin}/log4j2.xml org.xbib.tools.Runner org.xbib.tools.JDBCImporter

推荐答案

我建议使用Logstash

I would suggest to use Logstash jdbc-plugin to sync MySQL data to Elasticsearch.

反对者在评论中询问如何将已删除的记录从MySQL同步到Elasticsearch

From comment as opponent asking about how to sync deleted records from MySQL to Elasticsearch

也许是将已删除的记录从MySQL同步到Elasticsearch的其他有效方法:)但是我在这里分享了我的所作所为.

May be some other efficient ways to sync deleted records from MySQL to Elasticsearch :) But I am sharing here what I did.

第1步: 让我们以架构表为例.添加一列以维护该架构的状态.类似于status = 0(默认)和status = 1(用于删除).还有一列针对updated_date.如果删除了任何记录,则将status = 1更改,然后将update_date更改为当前日期.

Step 1: Lets take example of schema table. Add one column to maintain status of that schema. Something like status = 0 (default) and status = 1 ( for deleted). and also one column for updated_date. When any records delete then change the status=1 and and updated_date to current date.

第2步:

我们不需要每次都同步整个数据.索引一次完整的数据,然后更改mysql查询以从最近24小时或您的用例适合的任何时间间隔中获取记录.

We dont need to sync whole data every time. Index complete data one time then change the mysql query to fetch records from last 24 hours or whatever time interval fit in your use case.

步骤3:更改查询以仅获取最近24小时的数据

Step 3: Change query to fetch data from last 24 hours only

 select * from schemes where (updated_date >= FROM_UNIXTIME(UNIX_TIMESTAMP(?)-86400,"%Y-%m-%d"))

现在,您已删除的数据状态将在Elasticsearch索引中更改为status = 1.

Now your deleted data status will be changed to status=1 in your Elasticsearch index.

因此您可以查询活动记录,例如

So you can query your active records like

{
    "query": {
        "filtered": {
           "filter": {
               "bool": {
                   "must": [
                      {
                          "term": {
                             "status": 1
                          }
                      }
                   ]
               }
           }
        }
    }
}

这篇关于什么是将数据从mysql同步到弹性搜索的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆