Can Solr highlighting also indicate the position or offset of the returned fragments within the original field?


Problem description

Background

I'm using Solr 4.0.0. I've indexed the text of a set of sample documents and enabled term vectors so that I can use Fast Vector Highlighting:

<field name="raw_text" type="text_en" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

For highlighting I'm using the Break Iterator boundary scanner with SENTENCE boundaries:

<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
    <lst name="defaults">
      <!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
      <str name="hl.bs.type">SENTENCE</str>
    </lst>
  </boundaryScanner>

I run a simple query:

http://localhost:8983/solr/documents/select?q=raw_text%3AArtibonite&wt=xml&hl=true&hl.fl=raw_text&hl.useFastVectorHighlighter=true&hl.snippets=100&hl.boundaryScanner=breakIterator
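For readers who issue this request from a script, the same URL can be assembled from named parameters. This is only a sketch: `build_highlight_url` is a hypothetical helper, and the host, core name, and parameter values are copied from the query above.

```python
from urllib.parse import urlencode

# Host and core name are taken from the query in the question.
BASE_URL = "http://localhost:8983/solr/documents/select"


def build_highlight_url(term, field="raw_text", snippets=100):
    """Build the highlighting query shown above for a single search term."""
    params = {
        "q": f"{field}:{term}",
        "wt": "xml",
        "hl": "true",
        "hl.fl": field,
        "hl.useFastVectorHighlighter": "true",
        "hl.snippets": snippets,
        "hl.boundaryScanner": "breakIterator",
    }
    # urlencode percent-encodes the colon in the q parameter as %3A.
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling `build_highlight_url("Artibonite")` yields the request used in the question.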

The highlighting works well:

<response>
...
<result name="response" numFound="5" start="0">
<doc>
  <str name="id">-1071691270</str>
  <str name="raw_text">
     Final Report of the Independent Panel of Experts on the Cholera
     Outbreak in Haiti Dr. Alejando Cravioto (Chair) International
     Center for Diarrhoeal Disease Research, Dhaka, Bangladesh Dr.
     Claudio F. Lanata Instituto de Investigación Nutricional, and
     The US Navy Medical Research Unit 6, Lima, Peru Engr. Daniele
     S. Lantagne Harvard University... ~SNIP~
  </str>
</doc>
<lst name="highlighting">
  <lst name="-1071691270">
    <arr name="raw_text">
      ...
      <str>
        The timeline suggests that the outbreak spread along
        the <em>Artibonite</em> River. After establishing that
        the cases began in the upper reaches of the Artibonite
        River, potential sources of contamination that could have
        initiated the outbreak were investigated.
      </str>
      ...
    </arr>
  </lst>
</lst>

Question

I want to send the resulting sentences on for further processing (entity extraction, etc.), but I would like to track the start/end offsets of each highlighted sentence within the original (long) text field. Is there a straightforward way to do this?

Would it be better to set hl.fragsize so that the entire field is returned and then extract the sentences of interest myself?

Recommended answer

There is no way to return offset information for the fragments along with the highlighting results, short of some kind of customization.

You seem to have a few options:

1) You can extend the Solr highlighter by creating a custom Formatter that encodes the offset information into the returned string. The TokenGroup that is passed to the Formatter for each term has offset and position information stored in it. If your formatter returned <span data-offset=X>text</span> or something similar, that would be one way. This doesn't seem to be the most straightforward approach.

2) As you said, return the entire field using hl.fragsize=0.

3) Use the TermVectorComponent in an additional request and map the offset/position information it returns onto the highlighted fragments.
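When a fragment is an exact substring of the stored field, the mapping back to the original text can also be done on the client. Here is a minimal sketch in Python. It assumes the fragment comes back with plain <em> markers and no HTML escaping or whitespace changes, and `locate_fragment` is a hypothetical helper, not part of Solr.

```python
import re

# Matches the opening and closing highlight markers used in the response above.
TAG_RE = re.compile(r"</?em>")


def locate_fragment(original, fragment, start_at=0):
    """Return (start, end) offsets of a highlighted fragment in the original text.

    Assumes the fragment, once its <em> markers are removed, is an exact
    substring of the stored field value. Returns None if it is not found.
    """
    plain = TAG_RE.sub("", fragment)
    start = original.find(plain, start_at)
    if start == -1:
        return None
    return start, start + len(plain)
```

If the same sentence can occur more than once in a field, passing the end offset of the previous hit as `start_at` keeps the results in document order.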

If you are doing your own fragmenting anyway, the best solution for you would probably be to do no fragmenting in Solr (hl.fragsize=0) and handle it all yourself. Alternatively, you could write your own BoundaryScanner implementation in Java and use your own special knowledge of entity extraction when breaking up the fragments.
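The "handle it all yourself" route could look roughly like the sketch below. It assumes the response contains the whole field with <em> markers and no HTML escaping. The sentence splitter is a deliberately naive regular expression and does not reproduce Solr's BreakIterator, so it will split on abbreviations such as "Dr.". All function names are hypothetical.

```python
import re

EM_RE = re.compile(r"<em>(.*?)</em>", re.DOTALL)
# Naive boundary: sentence-ending punctuation followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def strip_marks(highlighted):
    """Remove <em> markers; return the plain text and each hit's (start, end) in it."""
    parts, hits, pos, last = [], [], 0, 0
    for m in EM_RE.finditer(highlighted):
        before = highlighted[last:m.start()]
        parts.append(before)
        pos += len(before)
        hits.append((pos, pos + len(m.group(1))))
        parts.append(m.group(1))
        pos += len(m.group(1))
        last = m.end()
    parts.append(highlighted[last:])
    return "".join(parts), hits


def sentence_spans(text):
    """Yield (start, end) offsets of each sentence found by the naive splitter."""
    start = 0
    for m in SENTENCE_END_RE.finditer(text):
        yield start, m.start()
        start = m.end()
    if start < len(text):
        yield start, len(text)


def matching_sentences(highlighted):
    """Return (start, end, sentence) for every sentence that contains a hit."""
    text, hits = strip_marks(highlighted)
    return [
        (s, e, text[s:e])
        for s, e in sentence_spans(text)
        if any(s <= hs and he <= e for hs, he in hits)
    ]
```

The offsets refer to the text with the markers removed, which matches the stored field only under the assumptions stated above.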
