Elasticsearch data increases and is duplicated at each restart
Problem description
I'm using Elasticsearch with AngularJS and Oracle on Windows 7. It is working better and better (thanks to help from Stack Overflow users). I have a problem with Elasticsearch: the number of documents in my index keeps increasing and I don't know why or how. The Oracle table indexed by Elasticsearch contains 12010 rows, but the index now holds 84070 documents (checked regularly with curl and _count), so the data has been duplicated 7 times. I re-indexed the table a few days ago, after first removing the Elasticsearch "data" folder.
The data seems to increase each time I restart Windows.
Thanks for your help.
This is how I installed Elasticsearch and indexed my data:
I did the following only once, the first time:
- Unzip Elasticsearch into the folder: D:\work\elasticsearch-1.3.1
- Install the web interface: >plugin -install mobz/elasticsearch-head
- Install the JDBC river: >plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.3.0.0/elasticsearch-river-jdbc-1.3.0.0-plugin.zip
- Copy "ojdbc6-11.2.0.3.jar" to "D:\work\elasticsearch-1.3.1\plugins\jdbc"
- service.bat install
- service.bat start
Create the index:
curl -XPOST 'localhost:9200/donnees'
Mapping:
curl -XPUT 'localhost:9200/donnees/specimens/_mapping' -d '{
"specimens" : {
"_all" : {"enabled" : true},
"_index" : {"enabled" : true},
"_id" : {"index": "not_analyzed", "store" : false},
"properties" : {
"O_OCCURRENCEID" : {"type" : "string", "store" : "no","index": "not_analyzed" } ,
....
"I_INSTITUTIONCODE" : {"type" : "string", "store" : "yes","index": "analyzed" }
}
}}'
Query Oracle and index the data:
curl -XPUT 'localhost:9200/_river/donnees_s/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"index" : "donnees",
"type" : "specimens",
"url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
"user" : "user",
"password" : "password",
"sql" : "select * from all_specimens_data"
}
}'
(Is this correct? It doesn't work if I replace "curl -XPUT 'localhost:9200/_river/donnees_s/_meta'" with "curl -XPUT 'localhost:9200/donnees/specimens/_meta'", which is the path I use to query.)
Test:
curl -XGET 'http://localhost:9200/donnees/specimens/_count?q=*'
=> 12010
curl -XGET 'http://localhost:9200/donnees/specimens/_search?q=P00009359'
=> returns data OK
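The duplication described in the question can be spotted by comparing the `_count` result with the number of rows in the Oracle table. The following is a minimal sketch, not part of the original post: the helper names `extract_count` and `duplication_factor` are hypothetical, and the sample response is hard-coded under the assumption that `_count` returns a JSON object with a numeric `count` field.

```shell
#!/bin/sh
# Extract the numeric "count" field from a _count response read on stdin.
# Assumes the response looks like {"count":N,...}.
extract_count() {
  sed -n 's/.*"count"[ ]*:[ ]*\([0-9][0-9]*\).*/\1/p'
}

# Print how many copies of each source row the index appears to hold.
# $1 = rows in the source table, $2 = documents in the index.
duplication_factor() {
  echo $(( $2 / $1 ))
}

# Hard-coded sample with the numbers from the question (assumed response shape).
sample='{"count":84070,"_shards":{"total":5,"successful":5,"failed":0}}'
indexed=$(printf '%s\n' "$sample" | extract_count)

# 84070 documents for 12010 rows prints 7.
duplication_factor 12010 "$indexed"
```

Against a running node, the sample could be replaced by the output of `curl -s 'http://localhost:9200/donnees/specimens/_count?q=*'`. A factor above 1 after a restart suggests the river has loaded the table again.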
Recommended answer
Resolved thanks to Konstantin V. Salikhov.
Each time the Elasticsearch service starts, the river queries the database with the SQL given in the _river definition and fetches the data (see "Query Oracle and index the data" in the question). If the rows have no "_id" column, the river cannot tell which records it has already loaded, so the data is duplicated on every start. To avoid the duplicates, I edited "all_specimens_data" in the database (it is in fact a view, so the underlying tables are not modified) and renamed "O_OCCURRENCEID" to "_id". "O_OCCURRENCEID" is my primary key, a UUID.
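If changing the view is not an option, a similar effect might be achieved by aliasing the primary key to "_id" directly in the river's SQL. The sketch below is an untested variation on the river definition from the question, not the author's actual fix. The function name `river_definition` and the table alias `s` are hypothetical, and it assumes the JDBC river uses a result column named `_id` as the document id. The alias is quoted because Oracle does not accept an unquoted identifier that starts with an underscore.

```shell
#!/bin/sh
# Sketch only: print a river definition whose SQL exposes the primary key as "_id".
# Assumption: the JDBC river uses a result column named _id as the document id,
# so re-running the query overwrites existing documents instead of adding new ones.
river_definition() {
  # The quoted heredoc keeps the backslashes, so the JSON contains \"_id\".
  cat <<'EOF'
{
  "type" : "jdbc",
  "jdbc" : {
    "index" : "donnees",
    "type" : "specimens",
    "url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
    "user" : "user",
    "password" : "password",
    "sql" : "select s.O_OCCURRENCEID as \"_id\", s.* from all_specimens_data s"
  }
}
EOF
}

# Print the definition so it can be reviewed before it is registered.
river_definition
```

To register it, the output could be piped to the same endpoint used in the question: `river_definition | curl -XPUT 'localhost:9200/_river/donnees_s/_meta' -d @-`. Documents that were already duplicated are not cleaned up by this change, so the index would still need to be rebuilt once.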
Hope this helps others.