Showing human-readable most frequent indexed terms using a stemmed field with Solr faceted search

Question

We are planning on using Solr to show the users the "n" most frequent terms from a field and we want to apply stemming so that similar terms get grouped.

Now, we need to show the terms to the users but the stemmed terms are not always human readable. Is there any way to get an example of the original terms that got stemmed so that those could be shown to the user?

The only solution we can think of is querying two different fields, one with stemming and one without, and then doing the matching ourselves. But we think that would be expensive (two queries) and error-prone (the matching may produce errors).

Is there any other way to implement this on Solr? Thanks in advance.
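The two-field matching approach described above can be done client-side: facet on the unstemmed field, run each facet term through the same stemmer the stemmed field uses, sum the counts per stem, and show the most frequent surface form as the label. (Both fields can be faceted in one request by repeating facet.field, so this need not cost two queries.) This is a minimal sketch, assuming the unstemmed facet counts have already been parsed into a dict. `toy_stem` is a hypothetical suffix stripper standing in for the field's real stemmer (it is not Porter or Snowball), and `group_by_stem` is likewise a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple


def toy_stem(term: str) -> str:
    """Hypothetical stand-in stemmer; replace with the analyzer's real stemmer."""
    for suffix in ("ations", "ation", "ated", "ing", "ed", "s"):
        # Keep at least 3 characters so short words are not stripped to nothing.
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def group_by_stem(
    surface_counts: Dict[str, int],
    stem: Callable[[str], str],
) -> List[Tuple[str, str, int]]:
    """Group unstemmed facet counts by stem.

    Returns (stem, most frequent surface form, summed count) tuples,
    sorted by summed count, highest first.
    """
    totals: Counter = Counter()
    forms: Dict[str, Counter] = defaultdict(Counter)
    for term, count in surface_counts.items():
        key = stem(term)
        totals[key] += count
        forms[key][term] += count
    return [
        (key, forms[key].most_common(1)[0][0], total)
        for key, total in totals.most_common()
    ]


if __name__ == "__main__":
    counts = {"associated": 5, "association": 3, "associations": 2, "run": 6, "runs": 1}
    for key, label, total in group_by_stem(counts, toy_stem):
        print(f"{label} ({key}): {total}")
```

The summed totals are approximate: each facet count is a per-term document count, so a document containing both "associated" and "association" is counted twice, and surface forms that fall outside the unstemmed field's facet.limit are missed. The grouping is only as reliable as the agreement between the client-side stemmer and the field's analyzer, which is the error-proneness the question is worried about.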

Answer

Stemming is applied at both query time and index time, so I don't think there is an easy way to accomplish what you're trying to do. However, depending on the number of results in your index, it may be possible to do this by combining faceting and highlighting. Highlighting marks the whole word as it appears in the field rather than the stemmed term: for example, the stemmed term might be "associ", but the highlighted terms will be "associated", "association", "associations", etc. Perhaps you could do something like the following:

?q=keyword&facet=true&facet.field=myfield&facet.limit=20&hl=true&hl.fl=myfield&hl.fragsize=0&rows=10
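Hand-assembling this query string makes it easy to drop an `&` separator. A minimal sketch using Python's standard `urllib.parse.urlencode` builds the same request and unpacks the facet counts. The endpoint and the collection name `mycollection` are hypothetical, and `wt=json` is an addition not in the query above so that the response comes back as JSON, where Solr returns each facet field as a flat alternating term/count list.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and collection name.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_query_url(keyword: str, field: str = "myfield",
                    facet_limit: int = 20, rows: int = 10) -> str:
    """Facet and highlight on the same field in a single request."""
    params = [
        ("q", keyword),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(facet_limit)),
        ("hl", "true"),
        ("hl.fl", field),
        ("hl.fragsize", "0"),  # return the whole field value with highlighting
        ("rows", str(rows)),
        ("wt", "json"),        # added here; not part of the original query
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def parse_facet_field(flat: list) -> List[Tuple[str, int]]:
    """Turn a flat ["term", count, "term", count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(build_query_url("associated"))
    print(parse_facet_field(["associ", 5, "run", 3]))
```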

Getting 10 rows and examining the highlighted results should at least give you a sample of the "original" matching terms. By default, matches are wrapped in <em> </em> tags, but you can change this with hl.simple.pre and hl.simple.post; for example, &hl.simple.pre=[&hl.simple.post=] wraps the matching terms in square brackets. Setting hl.fragsize=0 returns the entire field value along with the highlighting.
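Once the response is back, the original surface forms can be pulled out of the highlighted snippets, and the most frequent one can serve as the human-readable label for the stem. This is a minimal sketch, assuming a JSON response whose `highlighting` section maps each document key to a `{field: [snippets]}` dict and the default <em> markers. `highlighted_terms` and `display_label` are hypothetical helpers, and lowercasing the forms is an assumption about how you want them grouped.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

Highlighting = Dict[str, Dict[str, List[str]]]


def highlighted_terms(highlighting: Highlighting, field: str,
                      pre: str = "<em>", post: str = "</em>") -> Counter:
    """Count the surface forms wrapped in pre/post markers for one field."""
    # Non-greedy match so two markers on one line yield two separate terms.
    pattern = re.compile(re.escape(pre) + r"(.*?)" + re.escape(post))
    counts: Counter = Counter()
    for fields in highlighting.values():
        for snippet in fields.get(field, []):
            for term in pattern.findall(snippet):
                counts[term.lower()] += 1
    return counts


def display_label(highlighting: Highlighting, field: str,
                  pre: str = "<em>", post: str = "</em>") -> Optional[str]:
    """Most frequent highlighted surface form, or None if nothing matched."""
    counts = highlighted_terms(highlighting, field, pre, post)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    sample = {
        "doc1": {"myfield": ["The <em>associated</em> press and an <em>association</em>"]},
        "doc2": {"myfield": ["<em>Associated</em> terms"]},
    }
    print(highlighted_terms(sample, "myfield"))
    print(display_label(sample, "myfield"))
```

If you switch to square-bracket markers as described above, pass pre="[" and post="]"; pick markers that cannot occur in the field text itself, or the extraction will pick up false matches.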

Hope this helps. You can read more about highlighting parameters here: http://wiki.apache.org/solr/HighlightingParameters
