Elasticsearch Snowball Analyzer wants exact word


Question

I have been using Elasticsearch for a project, but I find the results of the snowball analyzer a bit strange.

Below is the mapping I used.

$myTypeMapping = array(
    '_source' => array(
        'enabled' => true
    ),
    'properties' => array(
        'id'    => array(
            'type'  => 'integer',
            'index' => 'not_analyzed'
        ),
        'name' => array(
            'type' => 'string',
            'analyzer' => 'snowball',
            'boost' => 2.0
        ),
        'food_types' => array(
            'type' => 'string',
            'analyzer' => 'keyword'
        ),
        'location' => array(
            'type' => 'geo_point',
            "geohash_precision"=> 4
        ),
        'city' => array(
            'type' => 'string',
            'analyzer' => 'keyword'
        )
    )
);
$indexParams['body']['mappings']['online_pizza'] = $myTypeMapping;

// Create the index

$elastic_client->indices()->create($indexParams);

Querying http://localhost:9200/online_pizza/online_pizza/_mapping returns the following:

{
  "online_pizza": {
    "properties": {
      "city": {
        "type": "string",
        "analyzer": "keyword"
      },
      "food_types": {
        "type": "string",
        "analyzer": "keyword"
      },
      "id": {
        "type": "integer"
      },
      "location": {
        "type": "geo_point",
        "geohash_precision": 4
      },
      "name": {
        "type": "string",
        "boost": 2,
        "analyzer": "snowball"
      }
    }
  }
}

My question: I have data whose name field is "Milano". Querying for "Milano" returns the desired result, but querying for "Milan" or "Mil" returns no results.

{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "Milan"
    }
  }
}

I have also tried specifying the snowball analyzer at query time, but it did not help.

{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "Milan",
      "analyzer": "snowball"
    }
  }
}

My second question: keyword search is case-sensitive, e.g. Pizza != pizza. How do I get around this?

Thanks,

Recommended answer

The snowball stemmer doesn't require exact words. If you try it with jumping, it outputs jump as expected.

However, depending on the case, your word may be understemmed because it doesn't match any stemmer rule.
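
To make the idea of understemming concrete, here is a deliberately tiny suffix-stripping sketch in Python. It is not the Snowball algorithm: the rule list and the `toy_stem` name are assumptions made up for illustration. It only shows that a rule-based stemmer changes a word when one of its rules matches the ending and otherwise returns the word untouched.

```python
# Toy suffix stripper, NOT the real Snowball/Porter algorithm.
# The suffix list below is a hypothetical, heavily simplified rule set.
TOY_SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix, if any."""
    token = word.lower()
    for suffix in TOY_SUFFIXES:
        # Require a remaining stem of at least 3 characters so that
        # short words are left intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    # No rule matched: the lowercased word is returned unchanged.
    return token


if __name__ == "__main__":
    for word in ("jumping", "Milano", "Milan", "Mil"):
        print(word, "->", toy_stem(word))
```

In this toy model, jumping loses its suffix, while Milano, Milan and Mil match no rule and come out only lowercased, so they remain three different tokens.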

If you use the analyze API endpoint, you will see that analyzing Milano with the snowball analyzer gives you the token milano:

GET _analyze?analyzer=snowball&text=Milano

Output:

{
   "tokens": [
      {
         "token": "milano",
         "start_offset": 0,
         "end_offset": 6,
         "type": "<ALPHANUM>",
         "position": 1
      }
   ]
}

Then, using the same snowball analyzer on Mil, like this:

GET _analyze?analyzer=snowball&text=Mil

gives you this token:

{
   "tokens": [
      {
         "token": "mil",
         "start_offset": 0,
         "end_offset": 3,
         "type": "<ALPHANUM>",
         "position": 1
      }
   ]
}

That's why searching for 'milan' or 'mil' won't match 'Milano' documents: neither query term equals the milano term stored in the index.
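
The mismatch can be sketched in Python as a lookup in a tiny inverted index. This is a simplified model, not how Elasticsearch is implemented: the `index_terms` dictionary and the `analyze` and `search` helpers are hypothetical, and the analyzed tokens follow the _analyze outputs shown above (the words are only lowercased) rather than being computed by a real stemmer.

```python
# Simplified model of term matching; all names here are illustrative only.

def analyze(text: str) -> str:
    # Stand-in for the snowball analyzer on these particular inputs:
    # per the _analyze output, "Milano" -> "milano" and "Mil" -> "mil",
    # i.e. the words are only lowercased.
    return text.lower()


# Inverted index: analyzed term -> ids of the documents containing it.
index_terms = {analyze("Milano"): {1}}


def search(query: str) -> set:
    """Return ids of documents whose indexed term equals the analyzed query."""
    return index_terms.get(analyze(query), set())


if __name__ == "__main__":
    for q in ("Milano", "Milan", "Mil"):
        print(q, "->", search(q))
```

Only the query whose analyzed form is exactly milano finds the document. milan and mil are different terms, so the lookup comes back empty, whichever analyzer is applied at query time.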

For your second question, you can define a custom analyzer that combines the keyword tokenizer with a lowercase token filter, so that your keyword search becomes case-insensitive (provided you use the same analyzer at search time):

POST index_name
{
  "analysis": {
   "analyzer": {
     "case_insensitive_keyword": {
       "type": "custom",
       "tokenizer": "keyword",
       "filter": ["lowercase"]
     }
   }
  }
}

Test:

GET index_name/_analyze?analyzer=case_insensitive_keyword&text=Choo Choo

Output:

{
   "tokens": [
      {
         "token": "choo choo",
         "start_offset": 0,
         "end_offset": 9,
         "type": "word",
         "position": 1
      }
   ]
}
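
As a rough illustration of what that analyzer does, the following Python sketch mimics a keyword tokenizer followed by a lowercase filter. The function names are made up for this example and nothing here calls Elasticsearch. It only models the two steps: keep the whole input as a single token, then lowercase it.

```python
from typing import List


def keyword_tokenizer(text: str) -> List[str]:
    # The keyword tokenizer emits the entire input as one token,
    # so spaces inside the value are preserved.
    return [text]


def lowercase_filter(tokens: List[str]) -> List[str]:
    # Lowercase every token produced by the tokenizer.
    return [token.lower() for token in tokens]


def case_insensitive_keyword(text: str) -> List[str]:
    """Hypothetical model of the custom analyzer defined above."""
    return lowercase_filter(keyword_tokenizer(text))


if __name__ == "__main__":
    print(case_insensitive_keyword("Choo Choo"))
    # Indexed value and query value normalize to the same token.
    print(case_insensitive_keyword("Pizza") == case_insensitive_keyword("pizza"))
```

Because the indexed value and the query go through the same normalization, Pizza and pizza end up as the same term, which is why the analyzer has to be applied at search time as well.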

I hope my explanation is clear enough :)
