Elasticsearch data increases and is duplicated at each restart

Views: 212. Published: 2017/8/6 23:04:47. Tag: elasticsearch

Problem description

I'm using Elasticsearch with AngularJS and Oracle on Windows 7. It's working better and better (thanks to the help from Stack Overflow). I have a problem with Elasticsearch: the number of documents in my index keeps increasing and I don't know why or how. The Oracle table indexed by Elasticsearch contains 12010 rows, but I now have 84070 documents in the index (checked frequently with curl _count), so the index holds 7 copies of the data (84070 = 7 × 12010). I re-indexed the table a few days ago, and I removed the Elasticsearch "data" folder before doing so.
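
As a quick sanity check on those figures, the arithmetic below confirms that 84070 is an exact multiple of 12010, which is consistent with the whole table being loaded again several times rather than with partial or random growth. The variable names are illustrative only.

```shell
# Counts reported in the question.
source_rows=12010     # rows in the Oracle table
indexed_docs=84070    # documents reported by _count

# Integer division and remainder: a remainder of 0 means whole copies only.
copies=$((indexed_docs / source_rows))
remainder=$((indexed_docs % source_rows))
echo "copies=$copies remainder=$remainder"
```

This prints `copies=7 remainder=0`.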

The data seems to increase each time I restart Windows.

Thanks for your help.

This is how I install Elasticsearch and index my data:

I do this only the first time:

Unzip Elasticsearch into the folder: D:\work\elasticsearch-1.3.1\
Install the web interface: >plugin -install mobz/elasticsearch-head
Install the JDBC river plugin: >plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.3.0.0/elasticsearch-river-jdbc-1.3.0.0-plugin.zip
Copy "ojdbc6-11.2.0.3.jar" to "D:\work\elasticsearch-1.3.1\plugins\jdbc"
service.bat install
service.bat start

Creating the index:

curl -XPOST 'localhost:9200/donnees'

Mapping:

curl -XPUT 'localhost:9200/donnees/specimens/_mapping' -d '{
"specimens" : {
    "_all" : {"enabled" : true},
    "_index" : {"enabled" : true},
    "_id" : {"index" : "not_analyzed", "store" : false},
    "properties" : {
        "O_OCCURRENCEID" : {"type" : "string", "store" : "no", "index" : "not_analyzed"},
        ....
        "I_INSTITUTIONCODE" : {"type" : "string", "store" : "yes", "index" : "analyzed"}
    }
}}'

Query Oracle and index the data:

curl -XPUT 'localhost:9200/_river/donnees_s/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "index" : "donnees",
    "type" : "specimens",
    "url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
    "user" : "user",
    "password" : "password",
    "sql" : "select * from all_specimens_data"
  }
}'

(Is this correct? It doesn't work if I replace "curl -XPUT 'localhost:9200/_river/donnees_s/_meta'" with "curl -XPUT 'localhost:9200/donnees/specimens/_meta'", which is the index and type I use for queries.)

Test:

curl -XGET 'http://localhost:9200/donnees/specimens/_count?q=*'
    => 12010
curl -XGET 'http://localhost:9200/donnees/specimens/_search?q=P00009359'
    => returns the expected data
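
The count check above can be scripted so that a drift from the source table is caught early. This is only a minimal sketch, assuming the _count response body has the shape {"count":N,"_shards":{...}}. The helper names extract_count and check_count and the sample body (including its shard numbers) are hypothetical, and the live curl call is left as a comment so the snippet runs without a node.

```shell
# Extract the numeric "count" field from a _count response body.
extract_count() {
  printf '%s' "$1" | sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# check_count EXPECTED BODY
# Succeeds only if BODY reports exactly EXPECTED documents.
check_count() {
  actual=$(extract_count "$2")
  [ "$actual" = "$1" ]
}

# Hypothetical sample body. Against a live node, use instead:
#   body=$(curl -s 'http://localhost:9200/donnees/specimens/_count?q=*')
body='{"count":84070,"_shards":{"total":5,"successful":5,"failed":0}}'

if check_count 12010 "$body"; then
  echo "count matches the source table"
else
  echo "count differs: $(extract_count "$body") documents indexed"
fi
```

Running the check after each service restart would show whether the total stays at 12010 or grows by another full copy.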

Solution

Resolved thanks to Konstantin V. Salikhov.

Each time the Elasticsearch service starts, the river queries the database with the SQL provided to _river and fetches the data (see my earlier "query oracle and index data" step). If the rows don't have an "_id" column, the river can't determine which records it has already loaded, so the data is duplicated on every start. To avoid the duplicates, I edited my "all_specimens_data" table in the database (which is in fact a view, so the underlying tables are not modified) and renamed "O_OCCURRENCEID" to "_id". "O_OCCURRENCEID" is my primary key, a UUID.
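
An alternative to renaming the column in the view would be to alias the key in the river's SQL. The sketch below is a variation on the fix, not what was actually done here. It assumes the JDBC river uses a column labelled _id as the document ID (the behaviour the answer relies on) and that Oracle needs the alias in double quotes because it starts with an underscore. Listing only I_INSTITUTIONCODE is a placeholder for the remaining columns, and the curl call is left as a comment so the snippet runs without a node.

```shell
# River definition whose SQL aliases the primary key to "_id", so each row
# maps to a fixed document ID and a re-run overwrites instead of appending.
# I_INSTITUTIONCODE stands in for the remaining columns of the view.
river_meta=$(cat <<'EOF'
{
  "type" : "jdbc",
  "jdbc" : {
    "index" : "donnees",
    "type" : "specimens",
    "url" : "jdbc:oracle:thin:@localhost:1523:recolnat",
    "user" : "user",
    "password" : "password",
    "sql" : "select O_OCCURRENCEID as \"_id\", I_INSTITUTIONCODE from all_specimens_data"
  }
}
EOF
)

# Show the payload that would be sent.
printf '%s\n' "$river_meta"

# To register it on a running node (not executed here):
#   curl -XPUT 'localhost:9200/_river/donnees_s/_meta' -d "$river_meta"
```

With either approach, the index already holding duplicates would still need to be cleared and reloaded once, since the earlier documents were stored under generated IDs.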

Hope this helps others.
