Elasticsearch - location of fragments in a document

Views: 107  Published: 2017/8/7 2:45:32  Tag: elasticsearch

Question

I am executing a phrase query like the one below. It returns highlighted fragments ordered by relevance. Naturally, I want the user to be able to click on a fragment and have the document scroll to the corresponding location. However, I don't see any way in Elasticsearch to find out where the fragments are located in the original document. Any ideas?

GET documents/doc/_search
{
    "query": {
        "match_phrase": {
            "text": {
                "query": "hello world",
                "slop": 10
            }
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "text": {"fragment_size": 100, "number_of_fragments": 10}
        }
    }
}

Recommended answer

In the meantime, we couldn't find a proper solution and ended up with the following hack (which works very well for us). Before indexing, we annotate each word in the text with "[index]", so that "some text to index" becomes "some[00] text[01] to[02] index[03]". We then use the char filter shown below to strip the annotations during analysis. When the highlights are returned, we parse the word positions out of the highlight text.

"settings": {
    "analysis": {
      "char_filter": {
        "remove_annotation": {
          "type": "pattern_replace",
          "pattern": "\\[[0-9]+\\]",
          "replacement": ""
        }
      },
      "analyzer": {
        "annotated_english_language_analyzer": {
          "type": "custom",
          "char_filter": [
            "remove_annotation"
          ],
          ...

Note that the annotation indexes should be zero-padded to a fixed width of log10(text_length)+1 digits, so that the width of a returned highlight (after the annotations are removed) does not depend on where in the text (beginning vs. end) it was found.
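
For illustration, here is a minimal Python sketch of the client-side half of this hack: annotating words before indexing and reading the positions back out of a returned fragment. The function names `annotate` and `parse_fragment` are hypothetical, and the sketch assumes that "text_length" means the number of whitespace-separated words and that a word is any run of non-whitespace characters; the original answer does not spell out either detail.

```python
import itertools
import re

# Same pattern as the "remove_annotation" char filter, with a capture group
# added so the numeric index can be read back.
ANNOTATION = re.compile(r"\[([0-9]+)\]")
# Assumption: a "word" is any run of non-whitespace characters.
WORD = re.compile(r"\S+")


def annotate(text, width=None):
    """Append a zero-padded "[index]" marker to every word in text."""
    word_count = len(WORD.findall(text))
    if width is None:
        # len(str(n)) equals floor(log10(n)) + 1 for n >= 1, computed
        # without floating-point rounding. Assumes text_length = word count.
        width = len(str(word_count))
    counter = itertools.count()
    return WORD.sub(
        lambda m: f"{m.group(0)}[{next(counter):0{width}d}]", text
    )


def parse_fragment(fragment):
    """Return (fragment without annotations, list of word indexes found)."""
    positions = [int(m.group(1)) for m in ANNOTATION.finditer(fragment)]
    clean = ANNOTATION.sub("", fragment)
    return clean, positions


if __name__ == "__main__":
    print(annotate("some text to index", width=2))
    clean, positions = parse_fragment("to[02] <em>index</em>[03]")
    # The smallest index is where the fragment starts in the document.
    print(clean, min(positions))
```

The parser collects every annotation in the fragment instead of looking only inside the highlight tags, so it does not depend on whether the highlighter places a marker inside or outside the tag. One caveat of this approach: any bracketed number already present in the source text (for example a citation such as "[12]") matches the same pattern and would be stripped and misread as a position.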
