Use synonyms in Elasticsearch


Problem description

I am trying to use a synonyms file in my search implementation. I found many guides for doing this, but none of them led to a working solution.

First, I added the analyzer as follows:

PUT /products/_settings
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym" : {
                        "tokenizer" : "whitespace",
                        "filter" : ["synonym"]
                    }
                },
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "synonyms_path" : "analysis/synonym.txt"
                    }
                }
            }
        }
    }
}

Then I tried to use this synonym analyzer as follows:

GET products/_search
{
    "query": {
        "multi_match": {
            "query": "television",
            "fields": ["prd_name","brand_name", "prd_sdescription"],
            "analyzer": "synonym"
        }
    }
}

My synonyms are in Solr format, for example:

GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs

But a search for television returns nothing, even though I have records for TV.

I also tried searching a single field (prd_name) with synonym matching. For that I need to change the mapping of the prd_name field, but I get this error when I try:

"type": "illegal_argument_exception",
"reason": "Mapper for [prd_name] conflicts with existing mapping in other types:\n[mapper [prd_name] has different [analyzer]]"

The current mapping of prd_name is:

"prd_name": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
}

If someone could give me a step-by-step solution, that would be great.

Elasticsearch version: 6.4.1

Recommended answer

Solution 1: Change the custom analyzer and apply it to the field

Point 1: Add a lowercase token filter

The whitespace tokenizer does not convert tokens to lowercase. Your synonym list contains 'Television', but you are querying for 'television'.

Add a lowercase token filter to the analyzer as follows, and your query should return the expected result.
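
To see why case matters here, the following is a toy Python model of a whitespace tokenizer followed by a synonym lookup. It is only an illustration under simplifying assumptions, not how Elasticsearch implements analysis, and the names `whitespace_tokenize`, `build_rules`, and `analyze` are made up for this sketch. The `lowercase_first` flag models lowercasing both the tokens and the synonym rules before the lookup, which corresponds to placing lowercase ahead of synonym in the filter list.

```python
SYNONYM_LINES = [
    "GB,gib,gigabyte,gigabytes",
    "MB,mib,megabyte,megabytes",
    "Television, Televisions, TV, TVs",
]


def whitespace_tokenize(text):
    """Split on whitespace only; case is left untouched."""
    return text.split()


def build_rules(lines, lowercase):
    """Map each term to its full synonym group (Solr equivalence format)."""
    rules = {}
    for line in lines:
        group = [term.strip() for term in line.split(",")]
        if lowercase:
            group = [term.lower() for term in group]
        for term in group:
            rules[term] = group
    return rules


def analyze(text, lowercase_first):
    """Toy analyzer: whitespace tokenizer, optional lowercase, synonym expansion."""
    rules = build_rules(SYNONYM_LINES, lowercase=lowercase_first)
    tokens = whitespace_tokenize(text)
    if lowercase_first:
        tokens = [token.lower() for token in tokens]
    expanded = []
    for token in tokens:
        # A token with no matching rule passes through unchanged.
        expanded.extend(rules.get(token, [token]))
    return expanded


print(analyze("television", lowercase_first=False))  # ['television']
print(analyze("television", lowercase_first=True))
# ['television', 'televisions', 'tv', 'tvs']
```

Without lowercasing, 'television' never matches the rule written as 'Television', so no synonyms are produced. With lowercasing applied first, the query term expands to the whole group.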

Point 2: Add the analyzer to the field

Look at how prd_name is defined in the mapping below. Notice that I've added the analyzer to it.

Mapping

PUT products
{  
   "settings":{  
      "index":{  
         "analysis":{  
            "analyzer":{  
               "synonym":{  
                  "tokenizer":"whitespace",
                  "filter":[  
                     "lowercase",
                     "synonym"
                  ]
               }
            },
            "filter":{  
               "synonym":{  
                  "type":"synonym",
                  "synonyms_path":"analysis/synonym.txt"
               }
            }
         }
      }
   },
   "mappings":{  
      "mydocs":{  
         "properties":{  
            "prd_name":{  
               "type":"text",
               "analyzer":"synonym",
               "fields":{  
                  "keyword":{  
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            }
         }
      }
   }
}

Note that changing the analyzer of an existing field requires you to recreate the index and ingest the documents again.
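
One common way to apply such a change is to create a new index with the new settings and mapping, copy the documents over with the `_reindex` API, and then search the new index. The Python sketch below only builds that sequence of requests. The target index name `products_v2`, the helpers `reindex_plan` and `send`, and the `http://localhost:9200` address are assumptions for illustration, not part of the original question.

```python
import json
import urllib.request


def reindex_plan(source_index, dest_index, index_body):
    """Return the (method, path, body) requests needed to rebuild an index."""
    return [
        # 1. Create the destination index with the new analysis settings and mapping.
        ("PUT", "/" + dest_index, index_body),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {
            "source": {"index": source_index},
            "dest": {"index": dest_index},
        }),
    ]


def send(method, path, body, base_url="http://localhost:9200"):
    """Send one request to Elasticsearch (base_url is an assumed local node)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # index_body would be the full settings and mappings from the
    # PUT products request; it is abbreviated to empty objects here.
    index_body = {"settings": {}, "mappings": {}}
    for method, path, body in reindex_plan("products", "products_v2", index_body):
        print(method, path)  # call send(method, path, body) to actually execute
```

Keeping the plan separate from `send` makes it easy to review the requests before running them against a real cluster.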

Make sure that the synonym file is available on every node. The synonyms_path is resolved relative to each node's config directory.
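
Before copying the file to each node, it can help to sanity-check its contents. The sketch below is a minimal parser for the Solr synonym format (comma-separated equivalence groups, `=>` for explicit mappings, `#` for comments). `parse_synonyms` is a hypothetical helper written for this example and is not part of Elasticsearch; the last sample rule is also made up to show the `=>` form.

```python
def parse_synonyms(text):
    """Parse Solr-format synonym rules into (inputs, outputs) pairs."""
    rules = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            left, right = line.split("=>", 1)
        else:
            left = right = line  # equivalent synonyms map to the whole group
        inputs = [term.strip() for term in left.split(",") if term.strip()]
        outputs = [term.strip() for term in right.split(",") if term.strip()]
        rules.append((inputs, outputs))
    return rules


sample = """\
# units
GB,gib,gigabyte,gigabytes
Television, Televisions, TV, TVs
i-pod, i pod => ipod
"""

for inputs, outputs in parse_synonyms(sample):
    print(inputs, "->", outputs)
```

Printing the parsed groups makes stray capitalization or missing commas easy to spot before the index is created.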

Let me know if it helps.

If you are not able to change the mapping of the field, you can instead create an analyzer named default in the settings.

This redefines the default analyzer with exactly the same settings as the synonym analyzer.

That way, it is used as the default analyzer in place of the standard analyzer, and the mapping of the field does not need to change.

Below is how the mapping would look in that case.

Mapping

PUT <your_index_name>
{  
   "settings":{  
      "index":{  
         "analysis":{  
            "analyzer":{  
               "default":{  
                  "tokenizer":"whitespace",
                  "filter":[  
                     "lowercase",
                     "synonym"
                  ]
               }
            },
            "filter":{  
               "synonym":{  
                  "type":"synonym",
                  "synonyms_path":"analysis/synonym.txt"
               }
            }
         }
      }
   },
   "mappings":{  
      "mydocs":{  
         "properties":{  
            "prd_name":{  
               "type":"text",
               "fields":{  
                  "keyword":{  
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            }
         }
      }
   }
}

Note that I've simply renamed the synonym analyzer to default in the settings, and prd_name no longer names an analyzer explicitly.

Important note: You will still need to reindex all the data for the changes to take effect. If you are able to change the mapping, reindex all the data, and freely modify the fields, then I strongly suggest Solution 1.

Let me know if this helps :)
