How does the edge ngram token filter differ from the ngram token filter?
Question
As I am new to Elasticsearch, I cannot identify the difference between the ngram token filter and the edge ngram token filter.
How do these two differ from each other in processing tokens?
Recommended answer
I think the documentation is very clear on the edgeNGram tokenizer:
This tokenizer is very similar to nGram but only keeps n-grams which start at the beginning of a token.
And the best example for the nGram tokenizer again comes from the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-ngram-tokenizer.html):
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'
# FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04
With this tokenizer definition:
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "3",
"token_chars": [ "letter", "digit" ]
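For context, the fragment above is only the tokenizer definition; it has to sit inside an index's analysis settings so that my_ngram_analyzer can refer to it. The following is a minimal sketch of how such a settings body might be assembled, not a confirmed copy of the documentation. The tokenizer name my_ngram_tokenizer is a hypothetical label (only my_ngram_analyzer appears in the command above), and the nesting assumes the analysis settings layout of Elasticsearch 1.x.

```python
import json

# Hypothetical settings body: "my_ngram_tokenizer" is an assumed name,
# chosen only to connect the analyzer to the tokenizer definition.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                # The analyzer named in the curl command above.
                "my_ngram_analyzer": {"tokenizer": "my_ngram_tokenizer"}
            },
            "tokenizer": {
                # The tokenizer definition from the fragment above.
                "my_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": "2",
                    "max_gram": "3",
                    "token_chars": ["letter", "digit"],
                }
            },
        }
    }
}

# Serialize to JSON; this string would be the request body sent
# when creating the index.
body = json.dumps(settings, indent=2)
print(body)
```

Building the body as a dictionary and serializing it avoids hand-written quoting mistakes in the JSON.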
In short:
- The tokenizer, depending on the configuration, will create tokens. In this example: FC, Schalke, 04.
- nGram generates groups of characters of minimum min_gram size and maximum max_gram size from an input text. Basically, the tokens are split into small chunks and each chunk is anchored on a character (it doesn't matter where this character is; every character starts its own chunks).
- edgeNGram does the same, but the chunks always start from the beginning of each token. Basically, the chunks are anchored at the beginning of the tokens.
For the same text as above, an edgeNGram generates this: FC, Sc, Sch, Scha, Schal, 04. Every "word" in the text is considered, and for every "word" the first character is the starting point (F from FC, S from Schalke and 0 from 04).
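The difference in anchoring can be illustrated with a small plain-Python emulation. This is only a sketch of the idea, not Elasticsearch's implementation: the function names tokenize, ngrams and edge_ngrams are hypothetical, the regular expression approximates token_chars of letter and digit for ASCII text only, and max_gram = 5 for the edge case is an assumption inferred from the output quoted above (which contains Schal), since the answer does not state the edgeNGram settings.

```python
import re

def tokenize(text):
    # Approximate token_chars = ["letter", "digit"] for ASCII input:
    # any other character acts as a separator.
    return re.findall(r"[A-Za-z0-9]+", text)

def ngrams(token, min_gram, max_gram):
    # nGram: every character position is a starting point for chunks.
    out = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                out.append(token[start:start + size])
    return out

def edge_ngrams(token, min_gram, max_gram):
    # edgeNGram: only position 0 is a starting point.
    return [token[:size]
            for size in range(min_gram, max_gram + 1)
            if size <= len(token)]

text = "FC Schalke 04"
tokens = tokenize(text)

# min_gram=2, max_gram=3 as in the tokenizer definition above.
all_ngrams = [g for t in tokens for g in ngrams(t, 2, 3)]
# max_gram=5 is an assumption inferred from the quoted output.
all_edge = [g for t in tokens for g in edge_ngrams(t, 2, 5)]

print(", ".join(all_ngrams))
# FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04
print(", ".join(all_edge))
# FC, Sc, Sch, Scha, Schal, 04
```

The only difference between the two functions is the outer loop over start positions: ngrams tries every position, while edge_ngrams is fixed at the start of the token.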