Sunspot -- Boost records where matches occur early in the text

Problem description

For example, let's say there is a record in my DB that has the text "Hormel Corporation" and my search term is something like "Hormel Corned Beef 16 Ounces". As my current configuration stands, the top results will be other records, even though "Hormel Corporation" is the one I'm looking for. I think the solution to my problem would be to give priority to records where a match comes earliest in the search term. I've read all the docs, but I have had trouble figuring out how this might work.

I only have one field: name. The name field of the record I want reads "Hormel Corporation", but when I search for "Hormel Corned Beef 16 Ounces", the top result isn't "Hormel Corporation"; it's something seemingly random, while the record I'm looking for comes 3rd or 4th in the results.

Thanks a lot!

Recommended answer

I had a similar problem to solve, so I stored my data in several fields:

title
keywords (up to 10 words)
abstract (a paragraph)
text (as long as you like)

For querying, I used the dismax query parser over the fields with different weights:

title^20
keywords^20
abstract^12
text^1
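As a minimal sketch of what those weights look like at the Solr request level, the snippet below assembles the standard dismax parameters (defType, q, qf) from the field boosts above. The localhost URL and the core name "products" are hypothetical, and the field names are assumed to match your schema.

```python
from urllib.parse import urlencode

# Field boosts from the answer; field names must exist in your Solr schema.
FIELD_BOOSTS = {"title": 20, "keywords": 20, "abstract": 12, "text": 1}


def dismax_params(user_query: str, boosts: dict[str, float]) -> dict[str, str]:
    """Build Solr request parameters for a dismax query over weighted fields."""
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"defType": "dismax", "q": user_query, "qf": qf}


def select_url(base: str, params: dict[str, str]) -> str:
    """Append URL-encoded parameters to a (hypothetical) Solr /select endpoint."""
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    params = dismax_params("Hormel Corned Beef 16 Ounces", FIELD_BOOSTS)
    print(params["qf"])  # title^20 keywords^20 abstract^12 text^1
    print(select_url("http://localhost:8983/solr/products", params))
```

The user's text goes into q unchanged; all of the ranking policy lives in qf, so the weights can be tuned without touching the query string.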


So if you

  1. define your data schema well,
  2. use dismax, and
  3. choose a weight for each field in your query,

then when you search "Hormel Corned Beef 16 Ounces", a result whose title is "Hormel Corp" will score better than a document whose body contains "...For the dish, we recommend a can of Hormel Corned Beef 16 Ounces...".
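The ranking claim can be checked with a toy model of dismax: for each query term, take the highest boost among the fields that contain it, then add these up. This is a deliberate simplification (a matching field contributes exactly its boost; term frequency, IDF, length normalization, tie and mm are ignored), and the two documents are made up for illustration.

```python
import re

# Field weights from the answer (dismax qf).
BOOSTS = {"title": 20, "keywords": 20, "abstract": 12, "text": 1}
QUERY = "Hormel Corned Beef 16 Ounces"


def tokens(s: str) -> set[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def toy_dismax_score(query: str, doc: dict[str, str], boosts: dict[str, float]) -> float:
    """For each query term, take the best boosted field containing it, then sum.

    Toy model: a matching field contributes exactly its boost. Real Solr also
    weighs term frequency, IDF and field length, and applies tie and mm.
    """
    doc_tokens = {field: tokens(text) for field, text in doc.items()}
    total = 0.0
    for term in tokens(query):
        total += max(
            (boosts.get(field, 0) for field, toks in doc_tokens.items() if term in toks),
            default=0,
        )
    return total


# Made-up documents for illustration.
corp = {"title": "Hormel Corp", "text": "Company profile."}
recipe = {
    "title": "Weeknight dinner ideas",
    "text": "For the dish, we recommend a can of Hormel Corned Beef 16 Ounces.",
}

print(toy_dismax_score(QUERY, corp, BOOSTS))    # 20.0 (only "hormel", via title^20)
print(toy_dismax_score(QUERY, recipe, BOOSTS))  # 5.0 (five terms, each via text^1)
```

With every boost set to 1, the same toy model ranks the recipe first (5.0 vs 1.0), which is roughly the single-field situation described in the question.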

Edit, following OP's comment:

OP's point is: given a title, its first n words matter more than the rest.

I suggest a data model with two fields: title_first_words and title. The client application (sorry, you cannot use DIH, the DataImportHandler, directly for this) has to extract the first n words of the title and store them in title_first_words, while the full title is stored in title.
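A minimal sketch of that client-side step, assuming whitespace-separated words: it builds one document carrying both fields before it is sent to Solr. The "id" unique key and the default n=2 are assumptions for illustration, and the example titles are made up.

```python
def first_words(title: str, n: int) -> str:
    """Return the first n whitespace-separated words of a title."""
    return " ".join(title.split()[:n])


def build_doc(doc_id: str, title: str, n: int = 2) -> dict[str, str]:
    """Build a document carrying both the full title and its first n words.

    "id" as the unique key and n=2 are assumptions; adjust to your schema and data.
    """
    return {"id": doc_id, "title": title, "title_first_words": first_words(title, n)}


print(build_doc("1", "Hormel Corporation"))
print(build_doc("2", "Acme Foods, distributor of Hormel products"))
# second doc's title_first_words: 'Acme Foods,'
```

Punctuation stays attached to the words here; that is usually harmless because Solr's analyzer tokenizes the field again at index time.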

For searching, you can pass the entire query to the dismax parser, which is then biased toward title_first_words through field weights such as title_first_words^4 title^1. Thus the first n words will have a bigger impact on a given search.
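To see the effect of title_first_words^4 title^1, here is the same kind of toy dismax model (best boosted field per query term, summed; no term frequency, IDF, norms, tie or mm) applied to two made-up titles. N=2 is a hypothetical choice for "the first n words".

```python
import re

# Weights from the answer; N is a hypothetical choice for "the first n words".
BOOSTS = {"title_first_words": 4, "title": 1}
N = 2
QUERY = "Hormel Corned Beef 16 Ounces"


def tokens(s: str) -> set[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def index(title: str, n: int = N) -> dict[str, str]:
    """Populate both fields the way the client application would."""
    return {"title": title, "title_first_words": " ".join(title.split()[:n])}


def toy_dismax_score(query: str, doc: dict[str, str]) -> float:
    """Sum, over query terms, of the highest boost among fields containing the term.

    Toy model only: ignores term frequency, IDF, length norms, tie and mm.
    """
    total = 0.0
    for term in tokens(query):
        total += max(
            (BOOSTS[field] for field, text in doc.items() if term in tokens(text)),
            default=0,
        )
    return total


titles = ["Acme Foods, distributor of Hormel products", "Hormel Corporation"]
ranked = sorted(titles, key=lambda t: toy_dismax_score(QUERY, index(t)), reverse=True)
for t in ranked:
    print(f"{toy_dismax_score(QUERY, index(t)):.1f}  {t}")
# 4.0  Hormel Corporation
# 1.0  Acme Foods, distributor of Hormel products
```

Because dismax takes the maximum across fields rather than the sum, a term found in the first n words earns 4, not 4 + 1, so the title field acts as a fallback for matches that appear later in the title.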
