How to sync a MySQL database to an external data source
Problem description
I have a MySQL table called search that I need to keep up to date with an Elasticsearch index. I have already exported the table to the ES index, but now I need to keep the data in sync, or else the search results will become stale quite quickly.
The only way I can think of is by exporting the table every x minutes and then comparing it with what was last imported. This isn't feasible since the table has about 10M rows and I don't want to be doing table exports every five minutes all day long. What would be a good solution for this? Note that I only have read-access to the database.
Recommended answer
I would leverage Logstash with a jdbc input plugin and an elasticsearch output plugin. There's a blog article showing a full example of this solution.
After installing Logstash, you can create a configuration file with the plugins I mentioned above like this:
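input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "1234"
    jdbc_validate_connection => true
    # Path to the MySQL Connector/J jar on the Logstash host
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # The schedule uses cron syntax: run every 5 minutes
    schedule => "*/5 * * * *"
    # By default :sql_last_value holds the time of the previous run
    statement => "SELECT * FROM search WHERE timestamp > :sql_last_value"
  }
}
output {
  elasticsearch {
    # protocol and host are option names from older versions of this plugin;
    # newer versions use hosts instead
    protocol => http
    index => "searches"
    document_type => "search"
    # Reusing the primary key as the document id makes re-indexing overwrite
    document_id => "%{uid}"
    host => "ES_NODE_HOST"
  }
}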
You need to make sure to change a few values to match your environment, but this should work out without a problem for what you need to do.
Every 5 minutes the query will run and fetch all search records whose timestamp (change that name to match your data) is more recent than the last time the query ran. The selected records will be indexed into the searches index on your Elasticsearch server at ES_NODE_HOST. Make sure to change the index and type names accordingly, as well as the name of the primary key field (i.e. uid), to match your data.
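To make the incremental idea concrete, here is a minimal Python sketch of the same pattern. It does not show how Logstash works internally. It uses an in-memory SQLite table in place of MySQL and a plain dict in place of the Elasticsearch index. The term column, the integer timestamps, and the function names are all assumptions made for illustration. The sketch also tracks the highest timestamp seen rather than the time of the last run, which is closer to what the jdbc input does when use_column_value and tracking_column are set.

```python
import sqlite3


def fetch_changed_rows(conn, last_value):
    """Read-only query: rows whose timestamp is newer than the watermark."""
    cur = conn.execute(
        "SELECT uid, term, timestamp FROM search "
        "WHERE timestamp > ? ORDER BY timestamp",
        (last_value,),
    )
    return cur.fetchall()


def sync_once(conn, index, last_value):
    """Copy changed rows into the index and return the new watermark."""
    rows = fetch_changed_rows(conn, last_value)
    for uid, term, ts in rows:
        # Keying by uid makes the write an upsert, like document_id => "%{uid}"
        index[uid] = {"term": term, "timestamp": ts}
        last_value = max(last_value, ts)
    return last_value, len(rows)


def demo():
    # Hypothetical stand-in for the MySQL "search" table
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search (uid INTEGER PRIMARY KEY, term TEXT, timestamp INTEGER)"
    )
    conn.executemany(
        "INSERT INTO search VALUES (?, ?, ?)",
        [(1, "foo", 100), (2, "bar", 110)],
    )

    index = {}      # stand-in for the Elasticsearch index
    last_value = 0  # watermark: nothing synced yet

    # First run picks up both existing rows
    last_value, first = sync_once(conn, index, last_value)

    # One row is updated and one is added between runs
    conn.execute("UPDATE search SET term = 'baz', timestamp = 120 WHERE uid = 2")
    conn.execute("INSERT INTO search VALUES (3, 'qux', 130)")

    # Second run only sees the two changed rows; third run sees nothing new
    last_value, second = sync_once(conn, index, last_value)
    last_value, third = sync_once(conn, index, last_value)
    return index, (first, second, third), last_value


if __name__ == "__main__":
    print(demo())
```

Note that a timestamp filter like this only catches inserts and updates. Rows deleted from the table never show up in the query, so they would stay in the index unless handled some other way.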