Elasticsearch data increases and duplicates at each restart
Problem description
I'm using Elasticsearch with AngularJS and Oracle on Windows 7. It's working better and better (thanks to help from Stack Overflow users). I have a problem with Elasticsearch: the number of documents in my index keeps increasing and I don't know why or how. The Oracle table indexed by Elasticsearch contains 12010 rows, but I now have 84070 documents in the index (checked frequently with curl _count), so the data has been duplicated 7 times. I re-indexed the table a few days ago, but I removed the Elasticsearch "data" folder before doing so.
The data seems to increase each time I restart Windows.
Thanks for your help.
This is how I installed Elasticsearch and indexed my data:
I did this only the first time:
- unzip Elasticsearch into the folder: D:\work\elasticsearch-1.3.1\
- install the web interface: >plugin -install mobz/elasticsearch-head
- install the JDBC river: >plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.3.0.0/elasticsearch-river-jdbc-1.3.0.0-plugin.zip
- copy "ojdbc6-11.2.0.3.jar" to "D:\work\elasticsearch-1.3.1\plugins\jdbc"
- service.bat install
- service.bat start
Creating the index:
curl -XPOST 'localhost:9200/donnees'
Mapping:
curl -XPUT 'localhost:9200/donnees/specimens/_mapping' -d '{
  "specimens" : {
    "_all" : {"enabled" : true},
    "_index" : {"enabled" : true},
    "_id" : {"index": "not_analyzed", "store" : false},
    "properties" : {
      "O_OCCURRENCEID" : {"type" : "string", "store" : "no", "index": "not_analyzed"},
      ....
      "I_INSTITUTIONCODE" : {"type" : "string", "store" : "yes", "index": "analyzed"}
    }
  }
}'
Query Oracle and index data:
curl -XPUT 'localhost:9200/_river/donnees_s/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "index" : "donnees",
    "type" : "specimens",
    "url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
    "user" : "user",
    "password" : "password",
    "sql" : "select * from all_specimens_data"
  }
}'
(Is this correct? It doesn't work if I replace "curl -XPUT 'localhost:9200/_river/donnees_s/_meta'" with "curl -XPUT 'localhost:9200/donnees/specimens/_meta'", which is the path I use to query.)
Test:
curl -XGET 'http://localhost:9200/donnees/specimens/_count?q=*'
=> 12010
curl -XGET 'http://localhost:9200/donnees/specimens/_search?q=P00009359'
=> returns the data OK
Resolved thanks to Konstantin V. Salikhov.
Each time the Elasticsearch service starts, it queries the database with the SQL provided to the _river and fetches the data (see the river definition above). If the rows don't have an "_id" column, the river can't determine which records it has already loaded, so the data is duplicated each time. To avoid the duplicates, I edited "all_specimens_data" in the database (it is in fact a view, which avoids modifying the underlying database) and renamed "O_OCCURRENCEID" to "_id"; "O_OCCURRENCEID" is my primary key, a UUID.
Hope this helps others.
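For readers who cannot rename the column in the view, a possible variant is to alias the primary key to "_id" directly in the river's SQL. The sketch below is not the fix I applied, only an illustration of the same idea: it assumes the JDBC river treats a column labelled "_id" as the document ID, as described above. The ES_URL variable, the --apply flag and the aliased query are hypothetical names introduced for this example; the connection settings are copied from the river definition in the question.

```shell
#!/bin/sh
# Sketch only: ES_URL, --apply and the aliased query are assumptions for illustration.
ES_URL="${ES_URL:-localhost:9200}"

# Print the river definition as JSON.
# The alias must be quoted ("_id") so that Oracle keeps it lowercase and accepts
# the leading underscore; the quotes are backslash-escaped inside the JSON string.
river_definition() {
  cat <<'EOF'
{
  "type" : "jdbc",
  "jdbc" : {
    "index" : "donnees",
    "type" : "specimens",
    "url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
    "user" : "user",
    "password" : "password",
    "sql" : "select t.*, t.O_OCCURRENCEID as \"_id\" from all_specimens_data t"
  }
}
EOF
}

# Send the definition to Elasticsearch only when --apply is passed,
# so the JSON can be inspected first by running the script without arguments.
if [ "${1:-}" = "--apply" ]; then
  river_definition | curl -XPUT "$ES_URL/_river/donnees_s/_meta" -d @-
fi
```

With a stable "_id", each restart should overwrite the existing documents instead of adding new ones. Documents that were already indexed with auto-generated IDs are not removed by this change, so the index probably has to be rebuilt once before the _count matches the table again.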