Obtain metadata associated with matched content in Solr/Lucene

Question

I have a large set of text documents that I will index with Solr. They are in a format where each line of text has associated metadata. For example:

#metadata1
A line of text.
#metadata2
Another long, broken line of
#metadata3
text that should be searchable.

I'd like to index this so that the content is searchable, including phrase matches that span multiple lines, while the metadata is not. However, I can't discard the metadata: every match should still carry the metadata associated with the lines it covers.

For example, a query for "line of text" would return 2 matches: one is the first line (with its associated metadata "metadata1"), and the other spans the second and third lines (with the associated "metadata2" and "metadata3" respectively).

Can anyone describe how this might be done, or point to a tutorial that would get me started?

Recommended answer

Since Solr uses Lucene under the hood, you should start with the Lucene document model:


  • An index is a collection of documents.
  • A document is a sequence of fields.
  • A field is a named sequence of terms.
  • A term is a string.
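
The four definitions above can be sketched as a toy data model. This is plain Python written for illustration, not the Lucene API: the class names `Field`, `Document` and `Index`, the field name `content`, and the `search` method are all assumptions that simply mirror the list.

```python
from dataclasses import dataclass, field

# Toy model only: these classes mirror the four definitions above and
# are NOT the Lucene API.
Term = str  # a term is a string


@dataclass
class Field:
    name: str
    terms: list[Term]  # a field is a named sequence of terms


@dataclass
class Document:
    fields: list[Field]  # a document is a sequence of fields


@dataclass
class Index:
    documents: list[Document] = field(default_factory=list)  # a collection of documents

    def search(self, field_name: str, term: Term) -> list[Document]:
        """Return every document that has `term` in the named field."""
        return [
            doc for doc in self.documents
            if any(f.name == field_name and term in f.terms for f in doc.fields)
        ]


index = Index()
index.documents.append(Document([Field("content", ["a", "line", "of", "text"])]))
index.documents.append(
    Document([Field("content", ["another", "long", "broken", "line", "of"])])
)

# Searching looks at a field and hands back whole documents, never lines or terms.
hits = index.search("content", "text")
print(len(hits))
```

The point to notice is that the unit of a result is the document, which is what drives the trade-off discussed next.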

Searching goes over one or more fields and returns documents as results. Therefore, if you want span queries to work across multiple lines, you have to put those lines into a single document. The consequence is that the "line of text" query will then match only one document, even though the phrase occurs twice in it.
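
The trade-off can be made concrete with a small stand-alone sketch. It is a plain-Python imitation of phrase matching over term positions, not Solr or Lucene code; `tokenize` and `phrase_positions` are hypothetical helpers, and the tokenizer is deliberately crude.

```python
def tokenize(text):
    """Lowercase and split on whitespace, treating ',' and '.' as separators."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def phrase_positions(terms, phrase):
    """Return the start position of every exact occurrence of `phrase` in `terms`."""
    n = len(phrase)
    return [i for i in range(len(terms) - n + 1) if terms[i:i + n] == phrase]


lines = [
    "A line of text.",
    "Another long, broken line of",
    "text that should be searchable.",
]
phrase = tokenize("line of text")

# Option 1: one document per line. The phrase cannot match across documents,
# so the occurrence that spans lines 2 and 3 is lost.
per_line_docs = [tokenize(line) for line in lines]
per_line_hits = [i for i, terms in enumerate(per_line_docs)
                 if phrase_positions(terms, phrase)]

# Option 2: all lines in one document. Both occurrences are found,
# but the search result is a single document.
single_doc = tokenize(" ".join(lines))
occurrences = phrase_positions(single_doc, phrase)

print(per_line_hits)  # indices of the per-line documents that matched
print(occurrences)    # start positions inside the single combined document
```

With one document per line only the first line matches. With a single combined document the phrase is found at two positions, but both belong to the same result document.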

Update: it seems that FieldMaskingSpanQuery can be used for this.

If you don't want to search over the metadata lines, that is doable: simply don't index them. Including the metadata in the results is also possible (I guess you want to store it at index time and retrieve it at search time).
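
As a rough illustration of "don't index the metadata, but store it and retrieve it with the match", the sketch below parses the sample format from the question, makes only the text lines searchable, and keeps a parallel list that records which metadata each term position belongs to. It is a toy model in plain Python under stated assumptions: `parse`, `build` and `phrase_matches` are hypothetical names, the `#` prefix rule is inferred from the sample, and how this bookkeeping would map onto actual Solr fields is not shown.

```python
import re

SAMPLE = """#metadata1
A line of text.
#metadata2
Another long, broken line of
#metadata3
text that should be searchable.
"""


def parse(raw):
    """Split raw text into (metadata, line) pairs; lines starting with '#' are metadata."""
    pairs, current = [], None
    for row in raw.splitlines():
        if row.startswith("#"):
            current = row[1:]
        elif row.strip():
            pairs.append((current, row))
    return pairs


def build(pairs):
    """Return the searchable terms plus a parallel list of each term's metadata."""
    terms, owners = [], []
    for meta, line in pairs:
        for token in re.findall(r"\w+", line.lower()):
            terms.append(token)   # searchable
            owners.append(meta)   # stored only, never searched
    return terms, owners


def phrase_matches(terms, owners, phrase):
    """For each phrase occurrence, list the metadata of the lines it covers."""
    wanted = re.findall(r"\w+", phrase.lower())
    n = len(wanted)
    results = []
    for i in range(len(terms) - n + 1):
        if terms[i:i + n] == wanted:
            # dict.fromkeys removes duplicates while keeping line order
            results.append(list(dict.fromkeys(owners[i:i + n])))
    return results


terms, owners = build(parse(SAMPLE))
matches = phrase_matches(terms, owners, "line of text")
print(matches)  # → [['metadata1'], ['metadata2', 'metadata3']]
```

Because the metadata never enters `terms`, a search for "metadata1" finds nothing, while every phrase match still reports the metadata of the lines it touches.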
