Difference between StandardTokenizerFactory and KeywordTokenizerFactory in Solr?

Question

I am new to Solr. I want to know when to use StandardTokenizerFactory and when to use KeywordTokenizerFactory.

I read the docs on the Apache Wiki, but I still don't understand it.

Can anyone explain the difference between StandardTokenizerFactory and KeywordTokenizerFactory?

Recommended answer

StandardTokenizerFactory :-
It splits the text on whitespace and also strips punctuation characters.

Documentation :-

Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token. Splits words at hyphens, unless there's a number in the token. In that case, the whole token is interpreted as a product number and is not split. Recognizes email addresses and Internet hostnames as one token.

Note that this wording describes the older tokenizer behaviour, which newer Lucene/Solr releases expose as ClassicTokenizerFactory. The current StandardTokenizerFactory follows the Unicode word-boundary rules (UAX #29), so details such as email handling differ between versions.

Use this for fields whose content you want to search on.

For example -

http://example.com/I-am+example?Text=-Hello

would generate 7 tokens (separated by commas) -

http,example.com,I,am,example,Text,Hello
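
The split above can be approximated in a few lines of Python. This is only a rough sketch: the regular expression is an assumption chosen to mimic the behaviour on this one URL (letters and digits form tokens, an interior dot is kept, everything else is a separator). It is not Lucene's actual grammar, and `approx_standard_tokenize` is a hypothetical helper name.

```python
import re

# Rough stand-in for StandardTokenizer on simple ASCII input (an assumption,
# not Lucene's real rules): runs of letters/digits, optionally joined by
# interior dots, form a token; every other character acts as a separator.
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")

def approx_standard_tokenize(text):
    """Return the approximate token list for `text`."""
    return _TOKEN.findall(text)

tokens = approx_standard_tokenize("http://example.com/I-am+example?Text=-Hello")
print(",".join(tokens))
```

For real analysis results, the Analysis screen in the Solr admin UI shows the tokens that the configured field type actually produces.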

KeywordTokenizerFactory :-

Keyword Tokenizer does not split the input at all.
No processing is performed on the string, and the whole string is treated as a single entity.
It doesn't actually do any tokenization; it returns the original text as one term.

It is mainly used for sorting and faceting. With faceting, you want a multi-word value to match as one exact facet when filtering; with sorting, you need a single term because sorting does not work on tokenized fields.

For example -

http://example.com/I-am+example?Text=-Hello

would generate a single token -

http://example.com/I-am+example?Text=-Hello
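
To see why the single-token behaviour matters for faceting, here is a small Python sketch. It is an illustration under assumptions, not Solr code: `keyword_tokenize` and `whitespace_tokenize` are hypothetical helpers, and the city names are made-up sample values.

```python
from collections import Counter

def keyword_tokenize(text):
    # Mimics KeywordTokenizer: the whole input becomes one token.
    return [text]

def whitespace_tokenize(text):
    # Simplified stand-in for a tokenizer that splits the input (assumption).
    return text.split()

# Hypothetical field values from three documents.
values = ["New York", "New Delhi", "New York"]

# Count how often each indexed term occurs, as a facet count would.
keyword_facets = Counter(t for v in values for t in keyword_tokenize(v))
split_facets = Counter(t for v in values for t in whitespace_tokenize(v))

print(keyword_facets)
print(split_facets)
```

With the keyword-style tokenizer the facet buckets are the full values ("New York", "New Delhi"), while the splitting tokenizer yields buckets for the individual words, which is rarely what a facet list should show.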
