Elasticsearch data increases and is duplicated at each restart


Problem description

I'm using Elasticsearch with AngularJS and Oracle on Windows 7. It is working better and better (thanks to help from Stack Overflow users). I have a problem with Elasticsearch: the number of documents in my index keeps increasing and I don't know why or how. The Oracle table indexed by Elasticsearch contains 12010 rows, but the index now holds 84070 documents (checked frequently with curl and _count), so the data has been duplicated 7 times. I re-indexed the table a few days ago, after first deleting the Elasticsearch "data" folder.

The data seems to increase each time I restart Windows.
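
A quick check of the two counts quoted above supports the "whole reloads" reading. This is only arithmetic on the numbers in the question; the variable names are illustrative.

```python
# Counts quoted in the question.
source_rows = 12010    # rows in the Oracle table
indexed_docs = 84070   # documents reported by _count

# divmod returns (quotient, remainder). A zero remainder means the index
# holds an exact number of full copies of the table.
copies, leftover = divmod(indexed_docs, source_rows)
print(copies, leftover)
```

A zero remainder is what one would expect if the whole table is loaded again on each run, rather than a few rows being added here and there.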

Thanks for your help.

This is how I install and index my data:

I ran these steps only once, the first time:

Create the index:

curl -XPOST 'localhost:9200/donnees'

Mapping:

curl -XPUT 'localhost:9200/donnees/specimens/_mapping' -d '{
"specimens" : {
    "_all" : {"enabled" : true},
    "_index" : {"enabled" : true},
    "_id" : {"index": "not_analyzed", "store" : false},
    "properties" : {
        "O_OCCURRENCEID"                                : {"type" : "string",   "store" : "no","index": "not_analyzed"  } ,
            .... 
        "I_INSTITUTIONCODE"                             : {"type" : "string",   "store" : "yes","index": "analyzed" } 
    }
}}'

Query Oracle and index the data:

curl -XPUT 'localhost:9200/_river/donnees_s/_meta' -d '{
 "type" : "jdbc",
 "jdbc" : {
    "index" : "donnees",
    "type" : "specimens",
    "url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
     "user" : "user",
     "password" : "password",
     "sql" : "select * from all_specimens_data"
   }
}'

(Is this correct? It does not work if I replace "curl -XPUT 'localhost:9200/_river/donnees_s/_meta'" with "curl -XPUT 'localhost:9200/donnees/specimens/_meta'", which is the path I use to query.)

Test:

curl -XGET 'http://localhost:9200/donnees/specimens/_count?q=*'
    => 12010
curl -XGET 'http://localhost:9200/donnees/specimens/_search?q=P00009359'
    => return data ok

Recommended answer

Resolved thanks to Konstantin V. Salikhov.

Each time the Elasticsearch service starts, the river queries the database with the SQL it was given and fetches the data (see the "query oracle and index data" step above). If the rows have no "_id" column, the river cannot tell which records it has already loaded, so the data is duplicated on every run. To avoid the duplicates, I edited "all_specimens_data" in the database (it is actually a view, so the underlying tables are not modified) and renamed the "O_OCCURRENCEID" column to "_id". "O_OCCURRENCEID" is my primary key, a UUID.
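
To make the mechanism concrete, here is a minimal in-memory sketch in Python. It does not talk to Elasticsearch or the JDBC river. TinyIndex and load are hypothetical stand-ins that only model the general indexing behaviour: a document indexed without an ID gets a freshly generated one, while a document indexed with an existing ID replaces the stored copy. The row values are made up; only the O_OCCURRENCEID column name and the two counts come from the post.

```python
import itertools


class TinyIndex:
    """In-memory stand-in for an index: a dict keyed by document ID."""

    def __init__(self):
        self.docs = {}
        self._auto_ids = itertools.count()

    def index(self, source, doc_id=None):
        # No ID supplied: generate a fresh one, so the same row is stored
        # again as a new document on every load.
        if doc_id is None:
            doc_id = f"auto-{next(self._auto_ids)}"
        # ID supplied: any existing document with that ID is overwritten.
        self.docs[doc_id] = source

    def count(self):
        return len(self.docs)


def load(index, rows, id_column=None):
    """Simulate one full run of the SQL query against the table."""
    for row in rows:
        doc_id = row[id_column] if id_column else None
        index.index(row, doc_id)


# Hypothetical rows standing in for the 12010 rows of the view.
rows = [{"O_OCCURRENCEID": f"uuid-{n}"} for n in range(12010)]

without_id = TinyIndex()
with_id = TinyIndex()
for _ in range(7):  # seven full loads, for example seven service restarts
    load(without_id, rows)
    load(with_id, rows, id_column="O_OCCURRENCEID")

print(without_id.count())  # 84070: every load appended a full copy
print(with_id.count())     # 12010: every load overwrote the same documents
```

With a stable key as the document ID, reloading becomes idempotent: running the same query any number of times leaves one document per row.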

Hope this helps others.
