Solr MoreLikeThis boosting query fields

Question

I am experimenting with Solr's MoreLikeThis feature.

My schema deals with articles, and I'm looking for similarities between articles within three fields: articletitle, articletext and topic.

The following query works well:

q=id:(2e2ec74c-7c26-49c9-b359-31a11ea50453)
&rows=100000000&mlt=true
&mlt.fl=articletext,articletitle,topic&mlt.boost=true&mlt.mindf=1&mlt.mintf=1

But I would like to experiment with boosting different query fields, for example putting more weight on similarities in the articletitle.

The documentation (http://wiki.apache.org/solr/MoreLikeThis) suggests that this can be achieved by including the mlt.qf property, with some boosting.

My attempt at such a query is as follows:

q=id:(2e2ec74c-7c26-49c9-b359-31a11ea50453)&rows=100000000&mlt=true
&mlt.fl=articletext,articletitle,topic&mlt.boost=true
&mlt.mindf=1&mlt.mintf=1
&mlt.qf=articletext^0.1 articletitle^100 topic^0.1

However, the boosts seem to have no effect: no matter what boosts I supply, the recommendations remain the same. (I would expect the above query to heavily favour similarities in the titles, but this doesn't seem to be happening.)

I can't find any examples in the documentation that use MoreLikeThis in this way, which leads me to believe I've got something wrong.

Has anyone managed to get this working?

Recommended answer

The MLT component is useful if you have simple recommendation requirements, where you have only one field to match on or several fields of equal importance. But any time you want to vary the relative importance of the different fields, or need to do something more specific such as including an inverse distance boost, you will probably want to write your own pseudo-MLT handler. All the MLT handler does is generate the top terms from the specified fields, based on their tf-idf scores in the source document. You can easily emulate that functionality in some code that generates a custom Solr OR query. You will lose the advantage of the term vectors, but so long as your queries are reasonably sized (say, fewer than 20 terms) it will probably perform pretty well. We have a small index, so we generate our own MLT queries with several hundred terms, and they execute in an acceptable amount of time (a few ms). However, I have seen performance deteriorate somewhat on large indexes with a few hundred million documents and larger fields, and in those cases you need to restrict your query to a small number of top terms. Using your own code in place of MLT is more work, but you gain a lot more flexibility.
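
As a rough illustration of the pseudo-MLT idea above, here is a minimal Python sketch that picks the top tf-idf terms per field and joins them into a boosted OR query. Everything in it is an assumption made for the example rather than part of Solr: the function names `top_terms` and `build_mlt_query`, the `doc_freq` dictionary (in practice you would have to obtain document frequencies from your own index), the simple regex tokenizer, and the classic `1 + ln(N / (df + 1))` idf formula.

```python
import math
import re
from collections import Counter


def top_terms(text, doc_freq, num_docs, max_terms=20):
    """Return up to max_terms terms from text, ranked by tf * idf."""
    # Naive tokenizer (an assumption): lowercase alphanumeric runs only,
    # so no query-syntax characters can end up in the generated query.
    tf = Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def score(term):
        # doc_freq is hypothetical input: term -> number of docs containing it.
        idf = 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))
        return tf[term] * idf

    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(tf, key=lambda t: (-score(t), t))
    return ranked[:max_terms]


def build_mlt_query(source_doc, boosts, doc_freq, num_docs, max_terms=20):
    """Build one boosted OR clause per field and OR the clauses together."""
    clauses = []
    for field, boost in boosts.items():
        terms = top_terms(
            source_doc.get(field, ""), doc_freq.get(field, {}), num_docs, max_terms
        )
        if terms:
            clauses.append(f"{field}:({' OR '.join(terms)})^{boost}")
    return " OR ".join(clauses)


# Made-up example data, using the field names from the question.
source_doc = {"articletitle": "Solr boosting", "articletext": "solr solr query"}
boosts = {"articletitle": 100, "articletext": 0.1}
doc_freq = {
    "articletitle": {"solr": 9, "boosting": 0},
    "articletext": {"solr": 9, "query": 4},
}

query = build_mlt_query(source_doc, boosts, doc_freq, num_docs=10)
print(query)
# articletitle:(boosting OR solr)^100 OR articletext:(solr OR query)^0.1
```

The resulting string can be sent as an ordinary `q` parameter. You would probably also want to exclude the source document itself, for instance with a negative filter on its id, and cap `max_terms` on large indexes for the performance reasons mentioned above.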
