Search for short words with SOLR


Question

I am using SOLR along with NGramTokenizerFactory to create search tokens for substrings of words.

The NGramTokenizer is configured with a minimum gram length of 3.

This means that I can search for e.g. "unb" and match the word "unbelievable".

However, I have a problem with short words like "I" and "in". These are not indexed by SOLR (I suspect because of the NGramTokenizer), and therefore I cannot search for them.
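To illustrate the suspicion, here is a minimal Python sketch of per-word n-gram generation with a minimum length of 3. It is a simplified model, not Lucene's actual NGramTokenizer code; the function name `ngrams` and the maximum length of 15 are assumptions made for the example.

```python
def ngrams(word, min_size=3, max_size=15):
    """Return every substring of `word` whose length is in [min_size, max_size]."""
    grams = []
    # No gram can be longer than the word itself.
    for size in range(min_size, min(max_size, len(word)) + 1):
        for start in range(len(word) - size + 1):
            grams.append(word[start:start + size])
    return grams


for word in ("unbelievable", "in", "I"):
    grams = ngrams(word)
    print(word, "->", len(grams), "grams; contains 'unb':", "unb" in grams)
```

In this model a word shorter than the minimum produces no grams at all, which would explain why "I" and "in" cannot be found.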

I don't want to reduce the minimum length to 1 or 2, since this creates a huge search index. But I would like SOLR to include whole words whose length is already below this minimum.

How can I do that?

/Carsten

Recommended answer

First of all, use the "Analysis Tool" to understand why your words are not being indexed by Solr:

http://localhost:8080/solr/admin/analysis.jsp

Enter the field and the text you are searching for, and see which analyzer is filtering out your short terms. I suggest this because you said you only "suspect" the cause, and you need to be certain which analyzer is dropping your data.

Then why not simply copy the term into another field that does not use that analyzer?

This way your terms are indexed twice and appear both as exact words and as n-grams. You then have to deal with the scores of the two different fields.
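The two-field idea can be sketched in plain Python. This is only a toy in-memory model, not Solr itself: the class `TwoFieldIndex`, the whitespace tokenization, and the boost values 2.0 and 1.0 are all assumptions chosen for illustration.

```python
from collections import defaultdict

MIN_GRAM = 3  # same minimum as in the question


def ngrams(word, min_size=MIN_GRAM):
    """All substrings of `word` with at least `min_size` characters."""
    return {word[i:i + n]
            for n in range(min_size, len(word) + 1)
            for i in range(len(word) - n + 1)}


class TwoFieldIndex:
    """Toy index with an exact-word field and an n-gram field."""

    def __init__(self):
        self.exact = defaultdict(set)  # whole word -> document ids
        self.ngram = defaultdict(set)  # n-gram -> document ids

    def add(self, doc_id, text):
        # Each word is indexed twice, mimicking a copy into a second field.
        for word in text.lower().split():
            self.exact[word].add(doc_id)
            for gram in ngrams(word):
                self.ngram[gram].add(doc_id)

    def search(self, term, exact_boost=2.0, ngram_boost=1.0):
        # Combine both fields, weighting exact matches higher (assumed boosts).
        term = term.lower()
        scores = defaultdict(float)
        for doc_id in self.exact.get(term, ()):
            scores[doc_id] += exact_boost
        for doc_id in self.ngram.get(term, ()):
            scores[doc_id] += ngram_boost
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = TwoFieldIndex()
index.add(1, "I live in an unbelievable house")
index.add(2, "The inn was unbelievably cheap")

print(index.search("in"))   # short word: only the exact field can match
print(index.search("unb"))  # substring: only the n-gram field can match
print(index.search("inn"))  # matches both fields, so the scores add up
```

In Solr itself, a comparable per-field weighting can be expressed with the `qf` parameter of the eDisMax query parser, for example `qf=text_exact^2 text_ngram`, where both field names are hypothetical.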

I hope this helps.

Some links on aggregating fields and the copyField directive:

Indexing data in multiple fields

Using the copyField tag
