Unique Filter to Elastic Search Column not working (duplicate items inserted)


Problem description

I've modified my contactNumber field to use a unique filter by updating the index settings as follows:

curl -XPUT localhost:9200/test-index2/_settings -d '
{
  "index": {
    "analysis": {
      "analyzer": {
        "unique_keyword_analyzer": {
          "only_on_same_position": "true",
          "filter": "unique"
        }
      }
    }
  },
  "mappings": {
    "business": {
      "properties": {
        "contactNumber": {
          "analyzer": "unique_keyword_analyzer",
          "type": "string"
        }
      }
    }
  }
}'

A sample item looks like this:

doc_type:"Business"

contactNumber:"(+12)415-3499"
name:"Sam's Pizza"
address:"Somewhere on earth"

The filter does not work: duplicate items are still inserted. I'd like no two documents to have the same contactNumber.

In the settings above I've also set only_on_same_position to true, so that existing duplicate values would be truncated/deleted.

What am I doing wrong in the settings?

Solution

That's something Elasticsearch can't do for you out of the box; you need to enforce this uniqueness in your application. The unique token filter only removes duplicate tokens within the token stream of a single field value during analysis; it does not compare documents with each other. The only idea I can think of is to use the phone number as the _id of the document itself. Whenever you insert or update something, ES then uses the contactNumber as the _id and either replaces the document that already exists under that _id or creates a new one.
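
To illustrate why a token filter cannot prevent duplicate documents, here is a toy Python model of the behavior described above. It is only a sketch, not Elasticsearch code: `unique_filter`, `index_document`, the in-memory `index` list, and the whitespace tokenizer are all hypothetical stand-ins.

```python
def unique_filter(tokens):
    """Drop repeated tokens within ONE token stream, keeping first occurrences."""
    seen = set()
    kept = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            kept.append(token)
    return kept


# Toy "index": every indexed document is simply appended.
index = []


def index_document(doc):
    """Analyze contactNumber (whitespace split is an assumption) and store the doc."""
    tokens = unique_filter(doc["contactNumber"].split())
    index.append({"source": doc, "tokens": tokens})


# Two different documents that share the same contactNumber.
index_document({"contactNumber": "(+12)415-3499", "name": "Sam's Pizza"})
index_document({"contactNumber": "(+12)415-3499", "name": "Another Pizza"})

# The filter only looked inside each value, so both documents were stored.
print(len(index))
```

In this model the filter never sees the other document, which is why both copies end up in the index.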

For example, with a mapping that takes the _id from contactNumber:

PUT /test-index2
{
  "mappings": {
    "business": {
      "_id": {
        "path": "contactNumber"
      }, 
      "properties": {
        "contactNumber": {
          "type": "string",
          "analyzer": "keyword"
        },
        "address": {
          "type": "string"
        }
      }
    }
  }
}
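
As far as I know, the `_id.path` mapping option only exists in older Elasticsearch releases (I believe it was deprecated in 1.5 and removed in 2.0). On versions without it, the application can get the same effect by putting the id in the request URL itself. The following Python sketch only builds such a request and does not send it; `ES_URL` and `build_index_request` are hypothetical names, and the host is the one assumed in the question.

```python
import json
from urllib.parse import quote
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: local node, as in the question


def build_index_request(index, doc_type, doc, id_field="contactNumber"):
    """Build (but do not send) a PUT that uses doc[id_field] as the document _id."""
    # Percent-encode the id so characters like "(", "+" and ")" are URL-safe.
    doc_id = quote(doc[id_field], safe="")
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


request = build_index_request(
    "test-index2",
    "business",
    {"contactNumber": "(+12)415-3499", "address": "whatever 123"},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running node.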

Then you index something:

POST /test-index2/business
{
  "contactNumber": "(+12)415-3499",
  "address": "whatever 123"
}

Getting it back:

GET /test-index2/business/_search
{
  "query": {
    "match_all": {}
  }
}

It looks like this:

   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test-index2",
            "_type": "business",
            "_id": "(+12)415-3499",
            "_score": 1,
            "_source": {
               "contactNumber": "(+12)415-3499",
               "address": "whatever 123"
            }
         }
      ]
   }

As you can see, the _id of the document is the phone number itself. Now suppose you want to change it or insert another document with the same contactNumber (the address is different and there is a new field, whatever_field):

POST /test-index2/business
{
  "contactNumber": "(+12)415-3499",
  "address": "whatever 123 456",
  "whatever_field": "whatever value"
}

Elasticsearch "updates" the existing document and responds with:

{
   "_index": "test-index2",
   "_type": "business",
   "_id": "(+12)415-3499",
   "_version": 2,
   "created": false
}

created is false, which means the document has been updated, not created. _version is 2, which again shows that the document has been updated. And the _id is the phone number itself, which indicates that this is the document that has been updated.
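
The response fields above can be mimicked with a small in-memory model. This is a sketch of the semantics only, under the assumption that indexing under an existing id replaces the stored document and increments its version; `ToyIndex` is a hypothetical class, not part of any Elasticsearch client.

```python
class ToyIndex:
    """In-memory stand-in for an index whose _id is taken from one field."""

    def __init__(self, id_field="contactNumber"):
        self.id_field = id_field
        self.docs = {}  # _id -> {"_version": int, "_source": dict}

    def index(self, doc):
        """Store doc under its id; replace and bump the version if it exists."""
        doc_id = doc[self.id_field]
        previous = self.docs.get(doc_id)
        version = 1 if previous is None else previous["_version"] + 1
        self.docs[doc_id] = {"_version": version, "_source": dict(doc)}
        return {"_id": doc_id, "_version": version, "created": previous is None}


toy = ToyIndex()
first = toy.index({"contactNumber": "(+12)415-3499", "address": "whatever 123"})
second = toy.index({
    "contactNumber": "(+12)415-3499",
    "address": "whatever 123 456",
    "whatever_field": "whatever value",
})
print(first)
print(second)
```

The second call reports a higher version and `created` as false, and the model still holds a single document for that phone number.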

Looking again in the index, ES stores this:

  "hits": [
     {
        "_index": "test-index2",
        "_type": "business",
        "_id": "(+12)415-3499",
        "_score": 1,
        "_source": {
           "contactNumber": "(+12)415-3499",
           "address": "whatever 123 456",
           "whatever_field": "whatever value"
        }
     }
  ]

So the new field is there, the address has changed, and the contactNumber and _id are exactly the same.
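
If the goal is to reject a second document with the same contactNumber rather than overwrite the first, the index API's `op_type=create` parameter may help: to my knowledge it makes the request fail with HTTP 409 when the id already exists. The Python sketch below only builds the URL and interprets a status code; `build_create_only_url` and `describe_status` are hypothetical helpers, and the host and index names are the ones assumed earlier in this thread.

```python
from urllib.parse import quote, urlencode

ES_URL = "http://localhost:9200"  # assumption: local node, as in the question


def build_create_only_url(index, doc_type, contact_number):
    """URL for indexing with an explicit id that must not exist yet."""
    doc_id = quote(contact_number, safe="")
    query = urlencode({"op_type": "create"})
    return f"{ES_URL}/{index}/{doc_type}/{doc_id}?{query}"


def describe_status(status_code):
    """Map the HTTP status of such a request to an application-level outcome."""
    if status_code == 201:
        return "created"
    if status_code == 409:
        return "duplicate contactNumber, nothing written"
    return f"unexpected status {status_code}"


url = build_create_only_url("test-index2", "business", "(+12)415-3499")
print(url)
print(describe_status(409))
```

Whether to overwrite or reject is an application decision; the answer above demonstrates the overwrite behavior.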
