Elasticsearch Snowball Analyzer wants exact word
Question
I have been using Elasticsearch for a project, but I find the results of the Snowball analyzer a bit strange.
Below is the mapping I used.
$myTypeMapping = array(
    '_source' => array(
        'enabled' => true
    ),
    'properties' => array(
        'id' => array(
            'type' => 'integer',
            'index' => 'not_analyzed'
        ),
        'name' => array(
            'type' => 'string',
            'analyzer' => 'snowball',
            'boost' => 2.0
        ),
        'food_types' => array(
            'type' => 'string',
            'analyzer' => 'keyword'
        ),
        'location' => array(
            'type' => 'geo_point',
            'geohash_precision' => 4
        ),
        'city' => array(
            'type' => 'string',
            'analyzer' => 'keyword'
        )
    )
);
$indexParams['body']['mappings']['online_pizza'] = $myTypeMapping;
// Create the index
$elastic_client->indices()->create($indexParams);
On querying http://localhost:9200/online_pizza/online_pizza/_mapping I get the following result:
{
  "online_pizza": {
    "properties": {
      "city": {
        "type": "string",
        "analyzer": "keyword"
      },
      "food_types": {
        "type": "string",
        "analyzer": "keyword"
      },
      "id": {
        "type": "integer"
      },
      "location": {
        "type": "geo_point",
        "geohash_precision": 4
      },
      "name": {
        "type": "string",
        "boost": 2,
        "analyzer": "snowball"
      }
    }
  }
}
My question is: I have data whose name field is "Milano". Querying for "Milano" returns the desired result, but querying for "Milan" or "Mil" returns no results.
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "Milan"
    }
  }
}
I've also tried specifying the snowball analyzer at query time, but that did not help.
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "Milan",
      "analyzer": "snowball"
    }
  }
}
My second question: keyword search is case-sensitive, e.g. Pizza != pizza. How do I get around this?
Thanks,
Recommended answer
The snowball stemmer doesn't want exact words. If you try it with 'jumping', it outputs 'jump' as expected.
However, depending on the case, your word may be understemmed because it doesn't match any stemmer rule.
If you use the analyze API endpoint, you will see that analyzing 'Milano' with the snowball analyzer gives you the token 'milano':
GET _analyze?analyzer=snowball&text=Milano
Output:
{
  "tokens": [
    {
      "token": "milano",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
Then, using the same snowball analyzer on 'Mil' as follows:
GET _analyze?analyzer=snowball&text=Mil
gives you this token:
{
  "tokens": [
    {
      "token": "mil",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
That's why searching for 'milan' or 'mil' won't match 'Milano' documents: neither query term equals the 'milano' term stored in the index.
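To make the reasoning concrete, here is a toy Python model of exact term lookup. It is only a sketch and not how Elasticsearch is implemented: the `analyze` function, the `inverted_index` dictionary and the `search` helper are hypothetical names, and the analyzer is reduced to lowercasing on the assumption that no stemming rule changes the words involved (which is what the _analyze output above shows for 'Milano' and 'Mil').

```python
# Toy model of exact term matching; not Elasticsearch's real implementation.

def analyze(text):
    """Stand-in for the snowball analyzer: lowercase and split on whitespace.

    Assumption: no stemming rule applies to the words used here, as the
    _analyze output for "Milano" and "Mil" suggests.
    """
    return text.lower().split()

# Build a tiny inverted index: each term maps to the ids of documents containing it.
inverted_index = {}
for doc_id, name in [(1, "Milano")]:
    for term in analyze(name):
        inverted_index.setdefault(term, set()).add(doc_id)

def search(query):
    """Return ids of documents whose indexed terms equal any analyzed query term."""
    hits = set()
    for term in analyze(query):
        hits |= inverted_index.get(term, set())
    return hits

print(search("Milano"))  # {1}
print(search("Milan"))   # set()
print(search("Mil"))     # set()
```

The lookup compares whole terms, so a query term that is merely a prefix of an indexed term finds nothing.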
For your second question, you can define a custom analyzer that combines the keyword tokenizer with a lowercase token filter, which makes your keyword search case-insensitive (provided you use the same analyzer at search time):
POST index_name
{
  "analysis": {
    "analyzer": {
      "case_insensitive_keyword": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": ["lowercase"]
      }
    }
  }
}
Test:
GET analyse/_analyze?analyzer=case_insensitive_keyword&text=Choo Choo
Output:
{
  "tokens": [
    {
      "token": "choo choo",
      "start_offset": 0,
      "end_offset": 9,
      "type": "word",
      "position": 1
    }
  ]
}
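As a rough illustration of why this analyzer makes keyword matching case-insensitive, the Python sketch below mimics the two stages: a keyword tokenizer that emits the whole input as one token, followed by a lowercase filter. The function names `keyword_only` and `case_insensitive_keyword` are hypothetical and exist only for this example; they are not part of any Elasticsearch client.

```python
# Toy model of the two analyzers; not Elasticsearch's real implementation.

def keyword_only(text):
    """Built-in keyword analyzer: the whole input becomes one unchanged token."""
    return [text]

def case_insensitive_keyword(text):
    """Keyword tokenizer followed by a lowercase filter."""
    tokens = [text]                      # keyword tokenizer: one token, no splitting
    return [t.lower() for t in tokens]   # lowercase token filter

print(case_insensitive_keyword("Choo Choo"))  # ['choo choo']

# With the plain keyword analyzer, index-time and search-time tokens differ by case.
print(keyword_only("Pizza") == keyword_only("pizza"))  # False

# With the custom analyzer applied on both sides, the tokens are identical.
print(case_insensitive_keyword("Pizza") == case_insensitive_keyword("pizza"))  # True
```

The match only works when both the indexed value and the query pass through the same analyzer, which is the condition stated above.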
I hope my explanations are clear enough :)