How can I tell Solr to return the hit search terms per document?

Question

I have a question about queries in Solr. When I perform a query with multiple search terms that are all linked by OR (e.g. q=content:(foo OR bar OR foobar)), Solr returns a list of documents that match any of these terms. What Solr does not return is which documents were hit by which term(s). So in the example above, I want to know which documents in my result list contain the term foo, and so on. Given this information I would be able to build a term-document matrix.

So my question is: how can I tell Solr to give me that missing piece of information? I'm sure it is somewhere, otherwise the search as a whole would not work. But what am I missing? Thanks for your help.

PS: As a workaround I'm performing a separate Solr query for each search term. But as you can imagine, that's a disaster performance-wise, since the number of search terms can exceed 50 :(

Answer

It kind of depends on your requirements, but as far as I know there is no specific support for this in Solr. You can, however, hack it together in a few other ways. I'm not sure what performance you can expect from these, though..

Using highlighting

If you use highlighting, you can parse the returned snippets for the start/end tags that wrap the highlighted text. The text between those tags is the term that matched something in your query.
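A minimal sketch of the parsing step in Python, assuming the JSON response writer (`wt=json`), highlighting on a field called `content` (`hl=true&hl.fl=content`) and Solr's default `<em>`/`</em>` markers (configurable with `hl.simple.pre`/`hl.simple.post`). The function name is hypothetical and the response is a hand-written sample, not real Solr output.

```python
import re

# Matches text wrapped in Solr's default highlight markers <em>...</em>.
HIGHLIGHT_RE = re.compile(r"<em>(.*?)</em>")

def matched_terms_from_highlighting(response, field="content"):
    """Map each document id to the set of (lower-cased) highlighted terms."""
    result = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        terms = set()
        for snippet in fields.get(field, []):
            terms.update(m.lower() for m in HIGHLIGHT_RE.findall(snippet))
        result[doc_id] = terms
    return result

# Hand-written sample of a Solr JSON response with highlighting enabled.
sample = {
    "highlighting": {
        "doc1": {"content": ["some <em>foo</em> text and <em>Bar</em>"]},
        "doc2": {"content": ["only <em>foobar</em> here"]},
        "doc3": {},
    }
}

print(matched_terms_from_highlighting(sample))
```

Keep in mind that the highlighted text is the surface form from the document (e.g. "Foos" when stemming matched "foo"), and that Solr only returns the best snippets (`hl.snippets`, default 1), so a term that occurs outside those snippets will be missed unless you ask for more or larger fragments.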

Using debug query information

You can parse the information returned by a query with debugQuery=true and determine which terms were associated with each result by looking at termWeight (iirc). The term shown might be a filtered version of your original term (if you have stemming etc. active for the field).
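A rough sketch of that parsing, assuming `wt=json`, `debugQuery=true` and the default string form of the explain output (a map from document id to explain text under `debug.explain`). It also assumes the per-term entries look like `weight(content:foo in 3)` as recent Lucene versions print them; older versions additionally print `fieldWeight(...)`, which the case-insensitive pattern also catches. The explain format is not a stable API, so check the regex against your own output. The function name is hypothetical and the sample is hand-written and abbreviated.

```python
import re

# Matches e.g. "weight(content:foo in 3)" or "fieldWeight(content:foo in 3)".
WEIGHT_RE = re.compile(r"weight\((\w+):([^\s()]+) in \d+\)", re.IGNORECASE)

def matched_terms_from_explain(response, field="content"):
    """Map each document id to the set of indexed terms named in its explain text."""
    explain = response.get("debug", {}).get("explain", {})
    result = {}
    for doc_id, text in explain.items():
        result[doc_id] = {term for f, term in WEIGHT_RE.findall(text) if f == field}
    return result

# Hand-written, abbreviated sample of debug output.
sample = {
    "debug": {
        "explain": {
            "doc1": (
                "1.2 = sum of:\n"
                "  0.7 = weight(content:foo in 0) [BM25Similarity], result of:\n"
                "  0.5 = weight(content:bar in 0) [BM25Similarity], result of:\n"
            ),
            "doc2": "0.4 = weight(content:foobar in 1) [BM25Similarity], result of:\n",
        }
    }
}

print(matched_terms_from_explain(sample))
```

The extracted terms are the indexed (analyzed) forms, so with stemming you get e.g. the stem rather than the word you typed.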

Using field collapsing

By using group.query you can build lists of documents that match each term in a single request, instead of issuing several requests. If you need lists for "contains either", you can also build group queries that OR several of the terms together. This might not be efficient for a large number of fields.
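A sketch of building the request and reading the result, assuming `wt=json`, a unique key field called `id`, and the grouped response layout Solr uses for `group.query` (one entry per query string under `grouped`, each with a `doclist`). Because `group.limit` defaults to 1 document per group, it is set explicitly; the value 100 and the helper names are arbitrary, hypothetical choices. The response is a hand-written sample.

```python
from urllib.parse import urlencode

def build_group_query_params(field, terms, per_term_limit=100):
    """One request with a repeated group.query parameter, one per term."""
    params = [
        ("q", f"{field}:({' OR '.join(terms)})"),
        ("fl", "id"),
        ("wt", "json"),
        ("group", "true"),
        ("group.limit", str(per_term_limit)),  # default would be 1 doc per group
    ]
    params += [("group.query", f"{field}:{t}") for t in terms]
    return urlencode(params)

def term_document_matrix(response):
    """Map each group.query string to the ids of the documents it matched."""
    return {
        query: [doc["id"] for doc in group["doclist"]["docs"]]
        for query, group in response["grouped"].items()
    }

# Hand-written sample of the "grouped" part of a Solr JSON response.
sample = {
    "grouped": {
        "content:foo": {"matches": 3,
                        "doclist": {"numFound": 2, "start": 0,
                                    "docs": [{"id": "doc1"}, {"id": "doc3"}]}},
        "content:bar": {"matches": 3,
                        "doclist": {"numFound": 1, "start": 0,
                                    "docs": [{"id": "doc1"}]}},
    }
}

print(build_group_query_params("content", ["foo", "bar"]))
print(term_document_matrix(sample))
```

With 50+ terms this becomes 50+ group queries in one request, which saves round trips but still makes Solr evaluate each of them.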

Parsing the returned documents yourself

Fetch the documents, then extract the terms yourself. This will require some fuzzy matching, since you also have to account for the text processing (tokenizing, stemming, etc.) that Solr applies to the field.
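A minimal sketch, assuming the documents were fetched with `fl=id,content`. The tokenizer (lower-casing and splitting on word characters) is only a rough stand-in for the field's real analysis chain, and `strip_plural` is a hypothetical, deliberately crude normalizer illustrating where the fuzzy matching goes; it is not a real stemmer and will not reproduce what Solr does.

```python
import re

def tokenize(text):
    """Rough stand-in for a standard tokenizer plus lower-case filter."""
    return re.findall(r"\w+", text.lower())

def strip_plural(token):
    """Hypothetical, very crude normalizer ('foos' -> 'foo'); not a real stemmer."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def terms_in_documents(docs, terms, field="content", normalize=strip_plural):
    """Map each document id to the query terms found in its field text."""
    wanted = {normalize(t.lower()): t for t in terms}
    result = {}
    for doc in docs:
        text = doc.get(field, "")
        if isinstance(text, list):  # multi-valued fields come back as lists
            text = " ".join(text)
        tokens = {normalize(tok) for tok in tokenize(text)}
        result[doc["id"]] = {orig for norm, orig in wanted.items() if norm in tokens}
    return result

# Hand-written sample of response["response"]["docs"] with fl=id,content.
docs = [
    {"id": "doc1", "content": "Foos and a BAR."},
    {"id": "doc2", "content": "Nothing but foobar here"},
]
print(terms_in_documents(docs, ["foo", "bar", "foobar"]))
```

Passing a different `normalize` function is where you would try to mimic the field's stemmer more closely.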

Using function queries

You can get a per-document value for each term from a FunctionQuery that looks up the number of occurrences of that term in the document. This will require quite a few function queries for a large number of terms, but might be fast.
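A sketch of that idea, assuming Solr 4.0 or later, where the `termfreq(field,term)` function exists and function results can be returned as pseudo-fields in `fl` under an alias (`alias:function`), alongside the original `q=content:(foo OR bar OR foobar)`. As far as I know the term argument is not run through the field's analyzer, so pass it in its indexed form (lower-cased, stemmed, etc.). The aliases `tf_0`, `tf_1`, … and the helper names are hypothetical; terms containing quotes or backslashes are simply rejected rather than escaped. The docs list is a hand-written sample.

```python
def build_termfreq_fl(field, terms):
    """Build an fl value with one aliased termfreq() pseudo-field per term."""
    parts = ["id"]
    for i, term in enumerate(terms):
        if "'" in term or "\\" in term:
            raise ValueError(f"term needs escaping, not handled here: {term!r}")
        parts.append(f"tf_{i}:termfreq({field},'{term}')")
    return ",".join(parts)

def term_document_matrix(docs, terms):
    """Rows are document ids, columns are term frequencies in term order."""
    return {doc["id"]: [doc.get(f"tf_{i}", 0) for i in range(len(terms))]
            for doc in docs}

terms = ["foo", "bar", "foobar"]
print(build_termfreq_fl("content", terms))

# Hand-written sample of response["response"]["docs"] for that fl.
docs = [
    {"id": "doc1", "tf_0": 2, "tf_1": 1, "tf_2": 0},
    {"id": "doc2", "tf_0": 0, "tf_1": 0, "tf_2": 3},
]
print(term_document_matrix(docs, terms))
```

Using index-based aliases avoids having to turn arbitrary terms into valid field names, and the resulting rows are already the term-document matrix the question asks for.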

.. none of these options is perfect, but one of them might work for the problem at hand.