Elasticsearch: search_analyzer vs index_analyzer

Question

I was looking at http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/, which explains Elasticsearch analyzers.

I did not understand the part about having different search and index analyzers. The second custom mapping example goes like this:
-> the index analyzer is an edgeNgram
-> the search analyzer is:

"full_name":{
    "filter":[
        "standard",
        "lowercase",
        "asciifolding"
    ],
    "type":"custom",
    "tokenizer":"standard"
}

If we want the query "Race" to not return results like *ra*pport and *rac*ial, which it would because of the edgeNgram, why index with edgeNgram in the first place?

Please explain with an example where different analyzers are useful.

Recommended answer

You usually have a similar analysis chain at both index time and query time. Similar doesn't mean exactly the same, but usually the way you index documents reflects the way you query them.

The ngrams example is a really good fit though, since it's one of the main reasons why you would use different analyzers at index and query time.

For partial matches you index with edge ngrams, so that "elasticsearch" becomes (with min_gram 3 and max_gram 20):

"ela", "elas", "elast", "elasti", "elastic", "elastics", "elasticse", "elasticsea", "elasticsear", "elasticsearc" and "elasticsearch"
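To make the list above concrete, here is a minimal Python sketch that mimics what an edge ngram filter produces for a single token. It is not Elasticsearch code; the function name `edge_ngrams` and its parameters are made up for illustration and only mirror the min/max gram settings mentioned above.

```python
def edge_ngrams(term, min_gram=3, max_gram=20):
    """Return the prefixes of `term` with length between min_gram and max_gram."""
    # A term shorter than max_gram simply stops at its own length.
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


# Prints the prefixes of "elasticsearch", from "ela" up to the full word.
print(edge_ngrams("elasticsearch"))
```

A term shorter than `min_gram` yields no ngrams at all in this sketch.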

Let's now query the created field. If we query for the term "elastic" there is a match and we get back the expected result. Given what we indexed, we have effectively turned what we called a partial match above into an exact match. There is no need to apply ngrams to the query as well. If we did, we would query for all of the following terms:

"ela", "elas", "elast", "elasti" and "elastic"

That would make the query far more complex and would also produce odd results. Let's say you index the term "elapsed" in another document, in the same field. You would have the following ngrams:

"ela", "elap", "elaps", "elapse", "elapsed"

If you search for "elastic" and apply ngrams to the query, the term "ela" would match this second document too, so you would get it back together with the first document, even though none of its terms contains the whole "elastic" term you were looking for.
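The difference can be reproduced with a toy inverted index in Python. This is only a sketch under simplifying assumptions: one term per document, a query that matches when any of its terms is found, and hypothetical helper names (`build_index`, `search`) that are not part of Elasticsearch.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=3, max_gram=20):
    """Prefixes of `term` with length between min_gram and max_gram."""
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


def build_index(docs):
    """Index time: map every edge ngram to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, term in docs.items():
        for gram in edge_ngrams(term.lower()):
            index[gram].add(doc_id)
    return index


def search(index, query, ngram_query=False):
    """Query time: return ids matching any query term (OR semantics assumed)."""
    terms = edge_ngrams(query.lower()) if ngram_query else [query.lower()]
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


docs = {1: "elasticsearch", 2: "elapsed"}
index = build_index(docs)

# Plain search analyzer: only the document that really starts with "elastic".
print(search(index, "elastic"))
# Ngrams applied to the query as well: "ela" also pulls in the "elapsed" document.
print(search(index, "elastic", ngram_query=True))
```

The first call returns only document 1, while the second returns both documents, which is the unwanted match described above.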

I would suggest having a look at the analyze API to play around with different analyzers and compare their results.
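For experimenting against a real cluster, the request bodies might look roughly like the following, built here as Python dictionaries and printed as JSON. This is a sketch, not the mapping from the linked article: the analyzer and filter names (`name_index`, `name_search`, `name_edge_ngram`) are hypothetical, the syntax assumes a recent Elasticsearch version (7.x or later), and the `standard` token filter from the question's snippet is left out on the assumption that recent versions no longer provide it. The first body would go in an index creation request, the second in a request to that index's `_analyze` endpoint.

```python
import json

# Index creation body: edge ngrams at index time, plain terms at search time.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical filter name; min/max values taken from the answer.
                "name_edge_ngram": {"type": "edge_ngram", "min_gram": 3, "max_gram": 20}
            },
            "analyzer": {
                "name_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "name_edge_ngram"],
                },
                "name_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "full_name": {
                "type": "text",
                "analyzer": "name_index",          # used when indexing documents
                "search_analyzer": "name_search",  # used when analyzing queries
            }
        }
    },
}

# Body for the analyze API: shows the tokens the index analyzer would emit.
analyze_body = {"analyzer": "name_index", "text": "elasticsearch"}

print(json.dumps(index_body, indent=2))
print(json.dumps(analyze_body))
```

Running the analyze request once with each analyzer name is a quick way to see that only the index-time analyzer emits prefixes.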
