Difference between StandardTokenizerFactory and KeywordTokenizerFactory in Solr?


Question


I am new to Solr. I want to know when to use StandardTokenizerFactory and when to use KeywordTokenizerFactory.


I read the docs on the Apache wiki, but I don't understand them.


Can anybody explain the difference between StandardTokenizerFactory and KeywordTokenizerFactory?

Answer

StandardTokenizerFactory:

It splits text on whitespace and punctuation, and strips the delimiter characters.

From the documentation:

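Splits words at punctuation characters, removing the punctuation. However, a dot that's not followed by whitespace is considered part of a token. Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split. Recognizes email addresses and Internet hostnames as one token.

Note that this quote describes the classic rules. Since Solr 3.1, StandardTokenizerFactory follows the Unicode UAX#29 word-break rules, under which hyphens always split words and email addresses are not kept as one token; the behavior quoted above is now provided by ClassicTokenizerFactory.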

Use it for fields whose contents you want to run full-text searches on.

For example:

http://example.com/I-am+example?Text=-Hello

generates 7 tokens (comma-separated):

http,example.com,I,am,example,Text,Hello
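To make this example concrete, here is a toy Java approximation of the splitting shown above. It is not Lucene's implementation (the real StandardTokenizer applies the full UAX#29 rules); the class and method names are hypothetical, and it only models the two behaviors visible in this example: punctuation acts as a delimiter, and a dot between letters or digits stays inside the token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy class: a rough imitation of StandardTokenizer for this example only.
public class ToyStandardTokenizer {

    // Emit runs of letters/digits. A '.' is kept inside a token only when it
    // sits between two letter/digit characters (so "example.com" survives).
    // Every other character is a delimiter and is discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean innerDot = c == '.'
                    && current.length() > 0
                    && i + 1 < text.length()
                    && Character.isLetterOrDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || innerDot) {
                current.append(c);
            } else if (current.length() > 0) {
                // Delimiter reached: close the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "http://example.com/I-am+example?Text=-Hello";
        System.out.println(String.join(",", tokenize(input)));
    }
}
```

Running `main` prints `http,example.com,I,am,example,Text,Hello`, the same 7 tokens listed above.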

KeywordTokenizerFactory:


The Keyword Tokenizer does not split the input at all.
No processing is performed on the string, and the whole string is treated as a single entity.
It doesn't actually do any tokenization; it returns the original text as one term.

It is mainly used for faceting, where you want to match the exact facet value when filtering on multi-word values, and for sorting, since sorting does not work on tokenized fields.

For example:

http://example.com/I-am+example?Text=-Hello

generates a single token:

http://example.com/I-am+example?Text=-Hello
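The faceting point can be illustrated with a small Java sketch that counts facet values over a few field values. It is a simplified model, not Solr's faceting code: the class and method names are hypothetical, the keyword side just keeps the whole string as one term, and whitespace splitting stands in for a tokenizing analyzer.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical demo class: compares facet counts for keyword vs. tokenized terms.
public class KeywordVsTokenizedFacets {

    // Keyword-style "tokenization": the whole input becomes one term.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    // Stand-in for a tokenizing analyzer: split on whitespace only.
    static List<String> whitespace(String text) {
        return List.of(text.trim().split("\\s+"));
    }

    // For each term, count how many values (documents) contain it, like a simple facet.
    static Map<String, Integer> facetCounts(List<String> values, boolean useKeyword) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            List<String> terms = useKeyword ? keyword(value) : whitespace(value);
            for (String term : Set.copyOf(terms)) { // count each term once per document
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> cities = List.of("New York", "New Jersey", "York");
        System.out.println(facetCounts(cities, false)); // {Jersey=1, New=2, York=2}
        System.out.println(facetCounts(cities, true));  // {New Jersey=1, New York=1, York=1}
    }
}
```

With tokenized terms, the facet shows fragments such as "New" that match no real value; with keyword terms, each facet corresponds to exactly one original value, so filtering on "New York" matches only that value.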
