Return position and highlighting of search queries in Elasticsearch

Problem description

I am using the official Elasticsearch-PHP client on a personal Debian server. What I am trying to do involves indexing, searching, and highlighting individual documents; that is, each search returns only one document, which is then highlighted for "simple query string" searches. I am also using FVH (the fast vector highlighter).

My question is similar to the one titled "Position as result, instead of highlighting", and the test code is basically the same, so I won't repeat it here. In my case, however, I need both position and highlighting. I followed the link to the documentation about term vectors, but just like the other OP, my searches are not always exact single words; in some cases they are phrases. How would I approach this?

My use case is to search only one document per query and present a summary of results with links that the user can click to go to the specific place in the document where each result came from. If I have the index or position of a match, I can simply apply it to the full source of the document. I have checked the documentation to no avail.

Recommended answer

You could try installing a plugin developed by the Wikimedia Foundation called Experimental Highlighter; its source is on GitHub.

You can install it for Elasticsearch 7.5 as follows; for other Elasticsearch versions, please refer to the GitHub project page:

./bin/elasticsearch-plugin install org.wikimedia.search.highlighter:experimental-highlighter-elasticsearch-plugin:7.5.1

Then restart Elasticsearch.

Since you also need to retrieve the positions (if offsets can replace positions in your use case, skip to the next paragraph), declare your field with the term_vector mapping parameter set to "with_positions_offsets_payloads", as described in the Elasticsearch documentation. Note that "fulltext_analyzer" in the mapping is a custom analyzer name, so it must be defined in the index settings:

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets_payloads",
        "analyzer": "fulltext_analyzer"
      }
    }
  }
}

For cases that do not need the positions, it is faster and uses much less space to set the index option "offsets" instead (see the Elasticsearch and plugin documentation):

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "index_options": "offsets",
        "analyzer": "fulltext_analyzer"
      }
    }
  }
}

You can then query with the experimental highlighter and return only the offsets of the highlighted parts:

GET /my-index-000001/_search
{
  "query": {
    "match": {
      "text": "hello world"
    }
  },
  "highlight": {
    "order": "score",
    "fields": {
      "text": {
        "number_of_fragments": 10,
        "fragment_size": 15,
        "type": "experimental",
        "options": {"return_offsets": true}
      }
    }
  }
}
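Since the question is about "simple query string" searches rather than match queries, the same highlight section can be paired with a simple_query_string query. The following is only a sketch of how such a request body might be assembled. The helper name build_offset_search is hypothetical, the option name return_offsets follows the plugin documentation quoted in this answer, and it is an assumption (not verified here) that the plugin highlights this query type the same way.

```python
import json
from typing import Any, Dict


def build_offset_search(field: str, query_text: str, fragments: int = 10) -> Dict[str, Any]:
    """Build a search body that asks the experimental highlighter for offsets.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    return {
        "query": {
            # simple_query_string accepts phrases in double quotes
            "simple_query_string": {"query": query_text, "fields": [field]}
        },
        "highlight": {
            "order": "score",
            "fields": {
                field: {
                    "type": "experimental",
                    "number_of_fragments": fragments,
                    "fragment_size": 15,
                    # option name as given in the quoted plugin documentation
                    "options": {"return_offsets": True},
                }
            },
        },
    }


body = build_offset_search("text", '"hello world"')
print(json.dumps(body, indent=2))
```

The resulting body can be passed to whichever client is in use; with Elasticsearch-PHP the same structure would be written as a nested array.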

This way the query returns no highlighted text, only the start and end offsets: numbers that locate each match in the field. To retrieve the highlighted content, read the field value from ['hits']['hits'][0]['_source']['text'] (where text is your field name) and extract the substring between the start offset and the end offset. Make sure you handle the string with the correct encoding (UTF-8); otherwise the offsets will not match the text. According to the plugin documentation:

The return_offsets option changes the results from a highlighted string to the offsets in the highlighted that would have been highlighted. This is useful if you need to do client side sanity checking on the highlighting. Instead of a marked up snippet you'll get a result like 0:0-5,18-22:22. The outer numbers are the start and end offset of the snippet. The pairs of numbers separated by the ,s are the hits. The number before the - is the start offset and the number after the - is the end offset. Multi-valued fields have a single character worth of offset between them.
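To make the format concrete, here is a minimal sketch of how a client might parse such a string and cut the hits out of the field text. All names (SnippetOffsets, parse_offsets, extract_hits, char_to_byte_offset) are hypothetical. The sketch assumes that end offsets are exclusive and that offsets count characters rather than bytes; both assumptions should be checked against real plugin output, especially for non-ASCII text.

```python
from typing import List, NamedTuple, Tuple


class SnippetOffsets(NamedTuple):
    start: int                   # start offset of the snippet
    end: int                     # end offset of the snippet
    hits: List[Tuple[int, int]]  # (start, end) offsets of each hit


def parse_offsets(value: str) -> SnippetOffsets:
    """Parse a string such as '0:0-5,18-22:22' into numeric offsets."""
    start_text, hits_text, end_text = value.split(":")
    hits = []
    for pair in hits_text.split(","):
        if pair:  # tolerate a snippet that carries no hits
            hit_start, hit_end = pair.split("-")
            hits.append((int(hit_start), int(hit_end)))
    return SnippetOffsets(int(start_text), int(end_text), hits)


def extract_hits(text: str, offsets: SnippetOffsets) -> List[str]:
    """Slice each hit out of the full field text (end offset assumed exclusive)."""
    return [text[start:end] for start, end in offsets.hits]


def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Convert a character offset into a UTF-8 byte offset.

    Useful when the consuming language slices strings by bytes.
    """
    return len(text[:char_offset].encode("utf-8"))


source_text = "hello brave new world"
offsets = parse_offsets("0:0-5,16-21:21")
print(extract_hits(source_text, offsets))  # ['hello', 'world']
```

In PHP the same distinction matters: a byte-oriented function such as substr needs byte offsets, while mb_substr with UTF-8 works on characters.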

Let me know if that plugin helps!
