How to search Elasticsearch for words with or without an apostrophe, and deal with spelling mistakes?

Problem description

I'm trying to move my full-text search logic from MySQL to Elasticsearch. In MySQL, to find all rows containing the word "woman", I would just write:

SELECT b.code
FROM BIBLE b 
WHERE ((b.DISPLAY_NAME LIKE '%woman%')
 OR (b.BRAND LIKE '%woman%')
 OR (b.DESCRIPTION LIKE '%woman%'));

On Elasticsearch I tried something similar:

curl -X GET "localhost:9200/bible/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": { "query": "WOMAN", "fields": ["description", "display_name", "brand"] }
  },
  "sort": { "code": { "order": "asc" } },
  "_source": ["code"]
}
'

but it didn't return the same count. On further checking, I found that words like woman's were matched by MySQL but not by Elasticsearch. How do I solve this?

AND

How do I handle things like searching for words even when they contain spelling mistakes, or matching words that sound the same (phonetically similar)?

Recommended answer

In Elasticsearch, you should define the mapping for the fields before indexing the data. The mapping tells Elasticsearch how to analyze and index each field, so that the data can be retrieved the way you want. The analyzer of an existing field cannot be changed afterwards without reindexing.
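To see why the counts differ, here is a rough Python sketch contrasting MySQL's `LIKE '%woman%'`, which is a plain substring test, with a term lookup on text analyzed roughly the way the default `standard` analyzer does it (the analyzer used when no mapping is defined). It is an approximation for plain ASCII text, not Elasticsearch's actual tokenizer, and the helper names are hypothetical. The standard tokenizer keeps an apostrophe between letters inside the word, so `Woman's` is indexed as the single term `woman's`, and a query for `woman` finds no matching term.

```python
import re

# Approximation (assumption) of the standard tokenizer for ASCII text: a run of
# letters/digits, optionally joined by single internal apostrophes, is ONE token.
TOKEN_RE = re.compile(r"[0-9A-Za-z]+(?:'[0-9A-Za-z]+)*")


def standard_like_tokens(text: str) -> list[str]:
    """Tokenize like the standard tokenizer, then lowercase like its lowercase filter."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def mysql_like_match(text: str, needle: str) -> bool:
    """LIKE '%needle%' is a substring test (case-insensitive under typical _ci collations)."""
    return needle.lower() in text.lower()


def term_match(text: str, query: str) -> bool:
    """An inverted-index lookup only matches whole analyzed terms."""
    return bool(set(standard_like_tokens(query)) & set(standard_like_tokens(text)))


description = "Woman's leather handbag"
print(standard_like_tokens(description))
print(mysql_like_match(description, "woman"), term_match(description, "WOMAN"))
```

The substring test succeeds while the term lookup fails, which is exactly the gap between the MySQL and Elasticsearch counts; the fix is to change how the fields are analyzed.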

Try the following request (JSON) to create the index with a custom analyzer and mapping:

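PUT {YOUR_INDEX_NAME}
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "max_ngram_diff": 20
  },
  "mappings": {
    "properties": {
      "code": { "type": "long" },
      "description": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "display_name": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "brand": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Because the apostrophe is not in token_chars, woman's is split at the apostrophe and the grams of woman include woman itself, so a search for woman now finds it. The lowercase filter makes matching case-insensitive; the ngram tokenizer does not change case on its own, so without it a query for WOMAN would not match woman. max_ngram_diff is needed because max_gram - min_gram (17) exceeds the default limit of 1 (Elasticsearch 6.x only logs a deprecation warning, 7.0 and later reject the request). The mapping uses the typeless format of Elasticsearch 7.x.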
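The tokenizer settings above (ngram, min_gram 3, max_gram 20, token_chars letter and digit) can be approximated in Python to see which terms a value is indexed as. This is a hypothetical helper, not Elasticsearch code: it splits on every character that is not a letter or digit, n-grams each chunk, and lowercases the grams as a `lowercase` token filter would. Grams are emitted by start position, then by length, which is the order the Elasticsearch ngram tokenizer documents.

```python
def ngram_tokens(text: str, min_gram: int = 3, max_gram: int = 20) -> list[str]:
    """Approximate an ngram tokenizer (token_chars: letter, digit) plus lowercase."""
    # Characters outside token_chars (apostrophe, space, punctuation) split the text.
    chunks, current = [], []
    for ch in text:
        if ch.isalnum():
            current.append(ch)
        elif current:
            chunks.append("".join(current))
            current = []
    if current:
        chunks.append("".join(current))

    # Each chunk is n-grammed on its own; chunks shorter than min_gram yield nothing.
    grams = []
    for chunk in chunks:
        for start in range(len(chunk)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(chunk):
                    break
                grams.append(chunk[start:start + size].lower())
    return grams


print(ngram_tokens("Woman's"))
```

For `Woman's` this yields `wom`, `woma`, `woman`, `oma`, `oman`, `man`; the trailing `s` is shorter than min_gram and produces no term.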

Sample query:

GET {YOUR_INDEX_NAME}/_search
{
  "query": {
    "multi_match" : {
      "query" : "women",
      "fields" : [ "description^3", "display_name", "brand" ] 
    }
  }
}
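In the sample query above, the query text goes through the same analyzer as the field (the search analyzer defaults to the field's analyzer), and a match on the resulting terms is an OR by default; `description^3` multiplies that field's score by 3. The Python sketch below (hypothetical helper names, same approximation of the ngram analyzer as before) lists the grams a query shares with a field value. It shows why `women` matches `woman`, and also the trade-off: any value sharing a 3-character gram, such as `mention`, matches as well.

```python
import re


def ngram_tokens(text: str, min_gram: int = 3, max_gram: int = 20) -> set[str]:
    """Approximate the ngram analyzer: lowercase, split on non-alphanumerics, n-gram each chunk."""
    grams = set()
    for chunk in re.findall(r"[^\W_]+", text.lower()):
        for start in range(len(chunk)):
            for end in range(start + min_gram, min(start + max_gram, len(chunk)) + 1):
                grams.add(chunk[start:end])
    return grams


def shared_grams(query: str, field_text: str) -> set[str]:
    """Terms that an OR match of the analyzed query would hit in the analyzed field."""
    return ngram_tokens(query) & ngram_tokens(field_text)


print(shared_grams("women", "Woman's handbag"))
print(shared_grams("women", "mention"))
```

Both calls return a non-empty set, so both documents match; scoring (more shared grams, boosted fields) is what pushes the better hits to the top.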

For spelling mistakes, I suggest you take a look at the fuzzy query; the match and multi_match queries also accept a fuzziness parameter.
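Fuzzy matching accepts terms within a small edit distance of the query term. The Python sketch below is an illustration, not Lucene's implementation: it computes the optimal string alignment distance (insertions, deletions, substitutions, and swaps of two adjacent characters, which the fuzzy query counts as one edit by default) and applies the documented `fuzziness: AUTO` rule (0 edits for terms of 1 to 2 characters, 1 for 3 to 5, 2 for longer). The helper names and the `query_body` example are assumptions for illustration. With the ngram analyzer above, the query text is itself split into grams and fuzziness would apply to each gram, so fuzziness is usually more predictable on a field analyzed into whole words.

```python
import json


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute, and adjacent swap as one edit each."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "wmoan" -> "woman".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def auto_fuzziness(term: str) -> int:
    """fuzziness AUTO (= AUTO:3,6): allowed edits depend on the query term's length."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def fuzzy_matches(query_term: str, indexed_term: str) -> bool:
    return osa_distance(query_term.lower(), indexed_term.lower()) <= auto_fuzziness(query_term)


# Hypothetical request body: multi_match with the fuzziness parameter.
query_body = {
    "query": {
        "multi_match": {
            "query": "wmoan",
            "fields": ["description^3", "display_name", "brand"],
            "fuzziness": "AUTO",
        }
    }
}
print(json.dumps(query_body, indent=2))
print(fuzzy_matches("wmoan", "woman"), fuzzy_matches("wmn", "woman"))
```

`wmoan` is one swap away from `woman` and is accepted, while `wmn` needs two edits but only one is allowed for a 3-character term.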

Try using the Kibana UI (the Dev Tools console) to test the index with DSL queries instead of cURL; it will save you time.

Hope it helps.