ElasticSearch river JDBC MySQL not deleting records

Question

I'm using the JDBC river plugin for ElasticSearch to index my MySQL database. It picks up new and changed records, but it does not delete records that have been removed from MySQL; they remain in the index.

This is the code I use to create the river:

curl -XPUT 'localhost:9200/_river/account_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "test_user",
        "password" : "test_pass",
        "sql" : "SELECT `account`.`id` as `_id`, `account`.`id`, `account`.`reference`, `account`.`company_name`, `account`.`also_known_as` from `account` WHERE NOT `account`.`deleted`",
        "strategy" : "simple",
        "poll" : "5s",
        "versioning" : true,
        "digesting" : false,
        "autocommit" : true,
        "index" : "headphones",
        "type" : "Account"
    }
}'

I installed ElasticSearch via Homebrew on OS X Mountain Lion with no errors or problems, and everything responds as expected. Permissions are OK and there are no errors in the logs.

I have removed, and included (and set to true and false), every combination of autocommit, versioning and digesting that I could think of. It's a dev database, so I'm sure the records are fully deleted, not cached and not soft-deleted. If I delete all the documents from the index (i.e. leave the river intact and just delete what was indexed in ES), the next time the river runs it does not re-add them. This leads me to believe I have missed something regarding versioning and deletion.

Note: I've also tried various ways of specifying the _id column, and I checked that it had a value in the JSON returned by the call.

Cheers.

Solution

Since this question was asked, the parameters have changed considerably. Versioning and digesting have been deprecated, and poll has been replaced by schedule, which takes a cron expression defining how often to rerun the river. The expression's six fields are seconds, minutes, hours, day of month, month and day of week, so 0 0/5 * * * ? below runs the river every 5 minutes:

    curl -XPUT 'localhost:9200/_river/account_river/_meta' -d '{
        "type" : "jdbc",
        "jdbc" : {
            "driver" : "com.mysql.jdbc.Driver",
            "url" : "jdbc:mysql://localhost:3306/test",
            "user" : "test_user",
            "password" : "test_pass",
            "sql" : "SELECT `account`.`id` as `_id`, `account`.`id`, `account`.`reference`, `account`.`company_name`, `account`.`also_known_as` from `account` WHERE NOT `account`.`deleted`",
            "strategy" : "simple",
            "schedule" : "0 0/5 * * * ?",
            "autocommit" : true,
            "index" : "headphones",
            "type" : "Account"
        }
    }'

As for the main question, this is the answer I got from the developer (https://github.com/jprante/elasticsearch-river-jdbc/issues/213):

Deletion of rows is no longer detected.

I tried housekeeping with versioning, but this did not work well together with incremental updates and adding rows.

A good method would be windowed indexing. Each timeframe (maybe once per day or per week) a new index is created for the river, and added to an alias. Old indices are to be dropped after a while. This maintenance is similar to logstash indexing, but it is outside the scope of a river.
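The windowed-indexing idea can be sketched as a small shell script against the same REST API the question uses. This is a sketch under assumptions, not the developer's implementation. It assumes an ElasticSearch 1.x cluster with the JDBC river plugin installed. The daily index names (headphones-YYYY.MM.DD), the river names (river-<index>), the alias name headphones and the DRY_RUN switch are all hypothetical. The _river, _aliases and index DELETE endpoints are standard for that era.

```shell
#!/usr/bin/env bash
# Windowed indexing sketch: one index (and one river) per time window, all
# reachable through a single alias. Index, river and alias names are
# hypothetical; set DRY_RUN=1 to print the curl calls instead of sending them.
set -euo pipefail

ES="${ES:-http://localhost:9200}"
ALIAS="headphones"                # hypothetical name clients search against
INDEX_PREFIX="headphones-"        # hypothetical windows: headphones-YYYY.MM.DD

# Same SELECT as the river definition above (single quotes keep the backticks).
SQL='SELECT `account`.`id` as `_id`, `account`.`id`, `account`.`reference`, `account`.`company_name`, `account`.`also_known_as` from `account` WHERE NOT `account`.`deleted`'

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# River _meta body from the answer above, pointed at one window's index ($1).
river_body() {
  printf '{"type":"jdbc","jdbc":{"driver":"com.mysql.jdbc.Driver","url":"jdbc:mysql://localhost:3306/test","user":"test_user","password":"test_pass","sql":"%s","strategy":"simple","schedule":"0 0/5 * * * ?","autocommit":true,"index":"%s","type":"Account"}}' \
    "$SQL" "$1"
}

# Body for POST /_aliases with a single action: $1 is "add" or "remove", $2 the index.
alias_body() {
  printf '{"actions":[{"%s":{"index":"%s","alias":"%s"}}]}' "$1" "$2" "$ALIAS"
}

# Start a window: create its index and river, then attach the index to the alias.
# The index is created explicitly because an alias cannot point at a missing index.
open_window() {
  local index="${INDEX_PREFIX}$1"
  run curl -s -XPUT "$ES/$index"
  run curl -s -XPUT "$ES/_river/river-$index/_meta" -d "$(river_body "$index")"
  run curl -s -XPOST "$ES/_aliases" -d "$(alias_body add "$index")"
}

# End a window: detach it from the alias, then drop its river and its index.
close_window() {
  local index="${INDEX_PREFIX}$1"
  run curl -s -XPOST "$ES/_aliases" -d "$(alias_body remove "$index")"
  run curl -s -XDELETE "$ES/_river/river-$index/"
  run curl -s -XDELETE "$ES/$index"
}

# Allow "./windowed.sh open_window 2014.03.01" style invocation.
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

A daily job could, for example, call open_window with today's date (date +%Y.%m.%d). Once the new river has finished its first run, the same job could call close_window for the oldest window. While two windows are attached, a search on the alias can return the same row from both indices. A row deleted in MySQL only disappears once the last window still holding it is closed.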

While I research aliasing, my current approach is to recreate the index and the river nightly and to schedule the river to run every few hours. This ensures that new data is indexed the same day it is added, and that deletions are reflected within 24 hours.
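The nightly rebuild workaround can be sketched the same way. This is a minimal sketch assuming an ElasticSearch 1.x cluster with the JDBC river plugin. The river and index names are taken from the answer's config. The file river_meta.json (holding the river's JSON _meta body) and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Nightly rebuild sketch: delete the river and its index, then recreate the
# river so its scheduled runs repopulate the index from MySQL. Rows deleted in
# MySQL are simply never re-indexed. Set DRY_RUN=1 to print the curl calls.
set -euo pipefail

ES="${ES:-http://localhost:9200}"
RIVER="account_river"     # river and index names from the answer's config
INDEX="headphones"
# Hypothetical file holding the river's JSON _meta body (e.g. the config above,
# with "schedule" set to run every few hours).
META_FILE="${META_FILE:-river_meta.json}"

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

rebuild() {
  # 1. Delete the river so it stops writing into the index.
  run curl -s -XDELETE "$ES/_river/$RIVER/"
  # 2. Drop the index, discarding documents for rows deleted in MySQL.
  run curl -s -XDELETE "$ES/$INDEX"
  # 3. Recreate the river from the JSON file (curl -d @file reads the body from it);
  #    assuming automatic index creation is enabled (the ElasticSearch default),
  #    the index is created again when the river next indexes into it.
  run curl -s -XPUT "$ES/_river/$RIVER/_meta" -d "@$META_FILE"
}

# Allow "./rebuild.sh rebuild" style invocation.
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

A crontab entry such as 0 2 * * * /path/to/rebuild.sh rebuild (path hypothetical) would run this nightly at 02:00. A river schedule such as 0 0 0/4 * * ? would refresh the index every four hours in between. Until the recreated river's first run completes, the index is missing or incomplete. The alias-based windowing suggested by the developer avoids that gap.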
