Elasticsearch search errors when applying an analyzer/filter


Problem description


I have downloaded the onet dataset, which comprises a skills taxonomy, and uploaded it into Elasticsearch. The skills taxonomy contains skills such as c++, .net, and C#. I want to search for c# and get back only c# among the skills. After checking some links, I set up the mapping and settings of my index as below.

{
  "onnet_taxonomy": {
    "mappings": {
      "text": {
        "properties": {
          "Occupation": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Skill": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Skill Type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "keywords": {
        "properties": {
          "Occupation": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Skill": {
            "type": "text",
            "fields": {
              "analyzed": {
                "type": "text",
                "analyzer": "analyzer_keyword",
                "search_analyzer": "analyzer_shingle"
              },
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Skill Type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "onnet_taxonomy",
        "creation_date": "1583114276039",
        "analysis": {
          "filter": {
            "my_shingle_filter": {
              "max_shingle_size": "8",
              "min_shingle_size": "2",
              "output_unigrams": "true",
              "type": "shingle"
            }
          },
          "analyzer": {
            "analyzer_keyword": {
              "filter": [
                "lowercase"
              ],
              "char_filter": [
                "code_mapping"
              ],
              "type": "custom",
              "tokenizer": "keyword"
            },
            "analyzer_shingle": {
              "filter": [
                "lowercase",
                "my_shingle_filter"
              ],
              "char_filter": [
                "code_mapping"
              ],
              "tokenizer": "standard"
            }
          },
          "char_filter": {
            "code_mapping": {
              "type": "mapping",
              "mappings": [
                "++ => plusplus",
                "c# => csharp",
                "C# => csharp",
                "F# => fsharp",
                "f# => fsharp",
                ".net => dotnet",
                ".Net => dotnet",
                ".NET => dotnet",
                "( => map_lp",
                ") => map_rp",
                "& => and",
                "# => hash",
                "+ => plus"
              ]
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "LNf2frW1S8WmHSOJWVrvLA",
        "version": {
          "created": "5030399"
        }
      }
    }
  }
}
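The code_mapping char filter runs before any tokenizer. The small Python sketch below shows what it does to a few skill strings. It is an approximation, not Elasticsearch code. It assumes, as with Lucene's mapping char filter, that the longest matching key wins at each position and that unmatched characters pass through unchanged. The function name apply_char_filter is hypothetical.

```python
# The "code_mapping" rules from the index settings above.
CODE_MAPPING = {
    "++": "plusplus",
    "c#": "csharp",
    "C#": "csharp",
    "F#": "fsharp",
    "f#": "fsharp",
    ".net": "dotnet",
    ".Net": "dotnet",
    ".NET": "dotnet",
    "(": "map_lp",
    ")": "map_rp",
    "&": "and",
    "#": "hash",
    "+": "plus",
}


def apply_char_filter(text, mapping=CODE_MAPPING):
    """Approximate a mapping char filter: longest matching key wins."""
    # Try longer keys first so "++" beats "+" and "c#" beats "#".
    keys = sorted(mapping, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No rule matched here: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


for skill in ["c++", "C#", ".NET", "c"]:
    print(skill, "->", apply_char_filter(skill))
```

Under this approximation, analyzer_keyword then lowercases the whole value and keeps it as a single term. A stored skill of C++ therefore ends up as cplusplus.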

When I use the following query:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Skill": "c++"
          }
        }
      ]
    }
  },
  "size": 10
}


I get all skills that contain 'c'.
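The Skill field itself has no analyzer set, so it uses the default standard analyzer. That analyzer drops characters such as + and #. The rough Python approximation below shows why a match query for c++ reduces to the single term c and hits every skill whose tokens include c. The helper approx_standard_analyzer is hypothetical and ASCII-only; it is not the real Unicode UAX #29 tokenizer.

```python
import re


def approx_standard_analyzer(text):
    """Crude stand-in for the standard analyzer on simple ASCII input.

    Splits on anything that is not a letter or digit, then lowercases.
    The real tokenizer follows Unicode word-boundary rules and handles
    cases like "3.14" or apostrophes differently; this only covers
    inputs like the skill names here.
    """
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


query_terms = set(approx_standard_analyzer("c++"))
for skill in ["c++", "C#", "c", ".net"]:
    terms = approx_standard_analyzer(skill)
    print(skill, terms, "match" if query_terms & set(terms) else "no match")
```

In this sketch c++, C# and c all reduce to the term c and match, while .net becomes net and does not.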


When I use the query below, assuming the analyzer is applied:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Skill.analyzed": "c++"
          }
        }
      ]
    }
  },
  "size": 10
}


I get an empty result. Did I include the analyzer correctly, or is my query wrong?
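In the mapping above, Skill.analyzed is indexed with analyzer_keyword and searched with analyzer_shingle. At index time, analyzer_keyword keeps the whole value as one lowercased term. At search time, analyzer_shingle produces unigrams plus word n-grams, so a multi-word query can still equal a whole-value term.

The Python sketch below approximates only the shingle step, using the my_shingle_filter settings (min 2, max 8, output_unigrams true, space as separator). It assumes code_mapping has already rewritten ++ to plusplus. "Visual C++" is a hypothetical skill value, and shingle_filter is a hypothetical helper.

```python
def shingle_filter(tokens, min_size=2, max_size=8, output_unigrams=True):
    """Approximate a shingle token filter: word n-grams joined by a space."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        # Emit shingles of increasing size starting at this position.
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break
            out.append(" ".join(tokens[start:start + size]))
    return out


# Hypothetical value "Visual C++" after code_mapping and lowercase.
index_term = "visual cplusplus"  # analyzer_keyword: whole value, one term
search_terms = shingle_filter(["visual", "cplusplus"])  # query "visual c++"
print(search_terms)
print(index_term in search_terms)

# A single-word query such as "c++" yields only the unigram.
print(shingle_filter(["cplusplus"]))
```

For an authoritative check, run the _analyze API against the index with "analyzer": "analyzer_shingle" (or "analyzer_keyword") and "text": "c++". That returns the terms Elasticsearch itself produces, rather than this approximation.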

Recommended answer


I simplified your question. To keep things simple, let's assume you have just one field, title, which contains languages such as c, c++, c#, and f#.


Index settings and mapping for the title field. Note that my_analyzer uses the standard tokenizer, and the title field uses this custom analyzer:

{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "filter": [
                            "lowercase"
                        ],
                        "char_filter": [
                            "code_mapping"
                        ],
                        "tokenizer": "standard"
                    }
                },
                "char_filter": {
                    "code_mapping": {
                        "type": "mapping",
                        "mappings": [
                            "++ => plusplus",
                            "c# => csharp",
                            "C# => csharp",
                            "F# => fsharp",
                            "f# => fsharp",
                            ".net => dotnet",
                            ".Net => dotnet",
                            ".NET => dotnet",
                            "( => map_lp",
                            ") => map_rp",
                            "& => and",
                            "# => hash",
                            "+ => plus"
                        ]
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "my_analyzer"
            }
        }
    }
}


Index some documents, one request per document body below:

POST /cplus/_doc/{doc-id}

{
    "title": "c#"
}
{
    "title": "c++"
}
{
    "title": "c"
}
{
    "title": "F#"
}
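Each body above is a separate request. The four documents could instead be indexed in one call with the _bulk API, which takes newline-delimited JSON: an action line followed by the document source, ending with a trailing newline.

The Python sketch below only builds that body, using a hypothetical helper named build_bulk_body. It takes the index name cplus from the search result in this answer. It leaves _id out, so Elasticsearch generates the ids.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk: one action line per source."""
    lines = []
    for source in docs:
        # No "_id" in the action, so Elasticsearch assigns one itself.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # _bulk requires a final newline


titles = ["c#", "c++", "c", "F#"]
body = build_bulk_body("cplus", [{"title": t} for t in titles])
print(body, end="")
```

Send the result as POST /_bulk with the header Content-Type: application/x-ndjson.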


Here is the search query from your question, the one that was fetching all records containing c:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "title": "c++"
                    }
                }
            ]
        }
    },
    "size": 10
}


With this setup, it returns only the document containing c++, as shown in my search API result:

"hits": {
    "total": {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 0.9808292,
    "hits": [
        {
            "_index": "cplus",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.9808292,
            "_source": {
                "title": "c++"
            }
        }
    ]
}
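The reason only c++ comes back can be reproduced offline. The Python sketch below approximates my_analyzer in three steps:

- code_mapping, with the longest key matched first
- a crude letter/digit/underscore tokenizer standing in for standard
- lowercase

A document counts as matching when it shares at least one term with the query, which is how a match query with the default OR operator behaves. The helper names are hypothetical, and scoring is not modelled.

```python
import re

# The code_mapping rules from the settings above.
MAPPING = {
    "++": "plusplus", "c#": "csharp", "C#": "csharp", "F#": "fsharp",
    "f#": "fsharp", ".net": "dotnet", ".Net": "dotnet", ".NET": "dotnet",
    "(": "map_lp", ")": "map_rp", "&": "and", "#": "hash", "+": "plus",
}
KEYS = sorted(MAPPING, key=len, reverse=True)  # longest key wins


def my_analyzer(text):
    """Approximate my_analyzer: char filter, crude tokenizer, lowercase."""
    out, i = [], 0
    while i < len(text):
        key = next((k for k in KEYS if text.startswith(k, i)), None)
        if key is not None:
            out.append(MAPPING[key])
            i += len(key)
        else:
            out.append(text[i])
            i += 1
    # Stand-in for the standard tokenizer on simple ASCII input.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+", "".join(out))]


def matches(query, doc):
    """Match query with the default OR operator: any shared term is a hit."""
    return bool(set(my_analyzer(query)) & set(my_analyzer(doc)))


titles = ["c#", "c++", "c", "F#"]
for title in titles:
    print(title, my_analyzer(title), matches("c++", title))
```

Because c, c++, c# and F# each map to a distinct term (c, cplusplus, csharp, fsharp), the query c++ matches only the c++ document. This agrees with the search result above.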
