Unique Filter to Elastic Search Column not working (duplicate items inserted)
Problem description
I've modified my
contactNumber field to have a unique filter by updating the index settings as follows:
curl -XPUT localhost:9200/test-index2/_settings -d '
{
  "index": {
    "analysis": {
      "analyzer": {
        "unique_keyword_analyzer": {
          "only_on_same_position": "true",
          "filter": "unique"
        }
      }
    }
  },
  "mappings": {
    "business": {
      "properties": {
        "contactNumber": {
          "analyzer": "unique_keyword_analyzer",
          "type": "string"
        }
      }
    }
  }
}'
A sample item looks like this:
doc_type: "Business"
contactNumber: "(+12)415-3499"
name: "Sam's Pizza"
address: "Somewhere on earth"
The filter does not work: duplicate items are still inserted. I'd like no two documents to have the same contactNumber.
In the settings above, I've also set only_on_same_position to true so that existing duplicate values would be truncated/deleted.
What am I doing wrong in the settings?
Solution
That's something Elasticsearch can't help you with out of the box; you need to provide this uniqueness guarantee in your app. The unique token filter only removes duplicate tokens within the token stream of a single field value at analysis time; it does not compare documents with each other. The only idea I can think of is to use the phone number as the _id of the document itself. Whenever you insert or update something, ES will use the contactNumber as the _id and will either associate that document with the one that already exists or create a new one.
For example:
PUT /test-index2
{
  "mappings": {
    "business": {
      "_id": {
        "path": "contactNumber"
      },
      "properties": {
        "contactNumber": {
          "type": "string",
          "analyzer": "keyword"
        },
        "address": {
          "type": "string"
        }
      }
    }
  }
}
Then you index something:
POST /test-index2/business
{
  "contactNumber": "(+12)415-3499",
  "address": "whatever 123"
}
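The _id path mapping used above was removed in Elasticsearch 2.0. On versions without it, the same idea can be expressed by putting the ID into the request URL yourself. The following is a minimal Python sketch under that assumption, not a definitive client: ES_URL and build_index_request are hypothetical names, the typed URL follows the answer's layout (recent versions use _doc in place of the type), and the request is only built, not sent. The phone number is percent-encoded because it contains characters such as ( and + that are unsafe in a URL path.

```python
import json
from urllib.parse import quote
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: local node, as in the question


def build_index_request(index, doc_type, doc, id_field="contactNumber"):
    """Build (but do not send) a PUT request that indexes `doc` under an
    explicit _id taken from `doc[id_field]`."""
    # Percent-encode every path segment; safe="" also escapes "/" and "+".
    path = "/".join(
        quote(part, safe="") for part in (index, doc_type, doc[id_field])
    )
    return Request(
        url=f"{ES_URL}/{path}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_index_request(
    "test-index2",
    "business",
    {"contactNumber": "(+12)415-3499", "address": "whatever 123"},
)
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/test-index2/business/%28%2B12%29415-3499
```

Passing the request to urllib.request.urlopen would send it; indexing twice with the same contactNumber then targets the same document ID.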
Getting it back:
GET /test-index2/business/_search
{
  "query": {
    "match_all": {}
  }
}
It looks like this:
"hits": {
  "total": 1,
  "max_score": 1,
  "hits": [
    {
      "_index": "test-index2",
      "_type": "business",
      "_id": "(+12)415-3499",
      "_score": 1,
      "_source": {
        "contactNumber": "(+12)415-3499",
        "address": "whatever 123"
      }
    }
  ]
}
You can see that the _id of the document is the phone number itself. Now suppose you index another document in which the address is different and there is a new field, whatever_field, but the contactNumber is the same:
POST /test-index2/business
{
  "contactNumber": "(+12)415-3499",
  "address": "whatever 123 456",
  "whatever_field": "whatever value"
}
Elasticsearch "updates" the existing document and responds with:
{
  "_index": "test-index2",
  "_type": "business",
  "_id": "(+12)415-3499",
  "_version": 2,
  "created": false
}
created is false, which means the document has been updated, not created. _version is 2, which again says that the document has been updated. And the _id is the phone number itself, which indicates that this is the document that was updated.
Looking at the index again, ES stores this:
"hits": [
  {
    "_index": "test-index2",
    "_type": "business",
    "_id": "(+12)415-3499",
    "_score": 1,
    "_source": {
      "contactNumber": "(+12)415-3499",
      "address": "whatever 123 456",
      "whatever_field": "whatever value"
    }
  }
]
So the new field is there, the address has changed, and the contactNumber and _id are exactly the same.
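Indexing under an existing _id overwrites the stored document, as the walkthrough above shows. If a second document with the same contactNumber should be rejected instead, the index API accepts op_type=create, which makes the request fail with HTTP 409 when the ID already exists. A minimal sketch of that variant, assuming Python's standard library and a node at localhost:9200; build_create_request and insert_unique are hypothetical helper names, and the send parameter exists only so the logic can be exercised without a running cluster:

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # assumption: local node, as in the question


def build_create_request(index, doc_type, doc, id_field="contactNumber"):
    """Build a create-only PUT request whose _id is `doc[id_field]`."""
    path = "/".join(
        quote(part, safe="") for part in (index, doc_type, doc[id_field])
    )
    return Request(
        # op_type=create tells Elasticsearch to refuse an existing ID.
        url=f"{ES_URL}/{path}?op_type=create",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def insert_unique(request, send=urlopen):
    """Return True if the document was created, False if its ID was taken.

    Any HTTP error other than 409 Conflict is re-raised to the caller.
    """
    try:
        with send(request):
            return True
    except HTTPError as err:
        if err.code == 409:  # a document with this _id already exists
            return False
        raise
```

Calling insert_unique(build_create_request("test-index2", "business", doc)) would then return False for a duplicate phone number rather than silently replacing the stored document.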