Search for short words with SOLR

Question

I am using SOLR with NGramTokenizerFactory to create search tokens for substrings of words.

NGramTokenizer is configured with a minimum gram length of 3.

This means that I can search for, e.g., "unb" and match the word "unbelievable".

However, I have a problem with short words like "I" and "in". These are not indexed by SOLR (I suspect because of NGramTokenizer), so I cannot search for them.

I don't want to reduce the minimum length to 1 or 2, since that would create a huge search index. But I would like SOLR to also index whole words that are already shorter than this minimum.

How can I do that?

/Carsten

Recommended answer

First of all, use the "Analysis Tool" to understand why your words are not being indexed by Solr:

http://localhost:8080/solr/admin/analysis.jsp

Enter the field and the text you are searching for, and see which analyser is filtering out your short terms. I suggest doing this because you said you only "suspect" the NGramTokenizer, and you need to be certain which analyser filters your data.
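To illustrate why a minimum gram length of 3 could drop short words entirely, here is a minimal sketch of character n-gram generation in plain Python. It is a simplified model, not Solr's actual NGramTokenizer code; the function name `char_ngrams` is hypothetical, and the maximum gram length of 5 is an assumption, since the question only states the minimum.

```python
def char_ngrams(text, min_gram=3, max_gram=5):
    """Return every substring of text whose length is between min_gram and max_gram.

    Simplified illustration only; max_gram=5 is an assumed value.
    """
    grams = []
    for size in range(min_gram, max_gram + 1):
        # When text is shorter than size, this range is empty and nothing is emitted.
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


for word in ["unbelievable", "in", "I"]:
    print(word, "->", char_ngrams(word)[:4])
```

In this model, a word shorter than the minimum gram length yields no tokens at all, which would match the behaviour described in the question. The Analysis Tool remains the way to confirm what your real field configuration does.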

Then why not simply copy the term into another field that does not use that analyser?

This way your terms are indexed twice and appear both as exact words and as n-grams. You then have to deal with the scores of the two different fields.
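One possible way to handle the two scores is to search both fields in a single query and weight the exact-word field higher. The sketch below builds the request parameters for such a query with Solr's eDisMax query parser. The field names `text_exact` and `text_ngram`, the helper `build_query_params`, and the boost of 2.0 are all hypothetical; it assumes your schema already copies the source text into both fields.

```python
from urllib.parse import urlencode


def build_query_params(user_input, exact_field="text_exact",
                       ngram_field="text_ngram", exact_boost=2.0):
    """Build Solr query parameters that search both fields at once.

    Field names and the boost value are placeholders, not taken from the question.
    """
    return {
        "q": user_input,
        "defType": "edismax",
        # qf lists the fields to query; "^" attaches a boost to the exact-word field.
        "qf": f"{exact_field}^{exact_boost} {ngram_field}",
    }


params = build_query_params("in")
# The encoded string can be appended to your core's select URL.
print(urlencode(params))
```

With this setup a short word such as "in" can match through the exact-word field, while longer partial inputs such as "unb" still match through the n-gram field. The boost value would need tuning against your own data.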

I hope this helps.

Some links on aggregation and the copyField attribute:

Indexing data in multiple fields

Using the copyField tag
