ElasticSearch get offsets of highlighted snippets

Views: 777 | Published: 2017/8/6 23:15:55 | Tag: elasticsearch


Question


Is it possible to get character positions of each highlighted fragment? I need to match the highlighted text back to the source document and having character positions would make it possible.

For example:

curl "localhost:9200/twitter/tweet/_search?pretty=true" -d '{
    "query": {
        "query_string": {
            "query": "foo"
        }
    },
    "highlight": {
        "fields": {
            "message": {"number_of_fragments": 20}
        }
    }    
}'

returns this highlight:

"highlight" : {
    "message" : [ "some <em>foo</em> text" ]
 }

If the field message in the matched document were:

"Here is some foo text"

is there a way to know that the snippet begins at char 8 and ends at char 21 of the matched field?

Knowing the start/end offset of the matched token would be good for me as well - perhaps there is a way to access that information using script_fields? (This question shows how to obtain the tokens, but not the offsets).

The field "message" has:

"term_vector" : "with_positions_offsets",
"index_options" : "positions"

Solution

The client-side approach is actually standard practice.

We have discussed adding the offsets, but are afraid it would lead to more confusion. The offsets provided would be specific to Java's UTF-16 String encoding. While they could technically be used to calculate the fragments from $LANG, it's way more straightforward to parse the response text for the delimiters you specified.
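
As a rough illustration of that client-side approach, here is a minimal Python sketch. It assumes the highlighter's default `<em>`/`</em>` tags, that the fragment is returned without HTML encoding, and that the tag-free fragment occurs verbatim in the source field (the first occurrence wins if it appears more than once). The function names `highlight_offsets` and `to_utf16_offset` are made up for this example and are not part of any Elasticsearch client.

```python
import re


def highlight_offsets(source, fragment, pre_tag="<em>", post_tag="</em>"):
    """Locate a highlighted fragment inside the original field value.

    Returns (fragment_span, match_spans) as (start, end) character
    offsets into `source`, end exclusive, or None if the tag-free
    fragment cannot be found in `source`.
    """
    pattern = re.compile(
        re.escape(pre_tag) + "(.*?)" + re.escape(post_tag), re.DOTALL
    )
    plain_parts = []   # fragment text with the tags removed
    relative = []      # match spans relative to the tag-free fragment
    plain_len = 0
    last = 0
    for m in pattern.finditer(fragment):
        before = fragment[last:m.start()]
        plain_parts.append(before)
        plain_len += len(before)
        term = m.group(1)
        relative.append((plain_len, plain_len + len(term)))
        plain_parts.append(term)
        plain_len += len(term)
        last = m.end()
    plain_parts.append(fragment[last:])
    plain = "".join(plain_parts)

    start = source.find(plain)  # assumption: first occurrence is the right one
    if start == -1:
        return None
    fragment_span = (start, start + len(plain))
    match_spans = [(start + s, start + e) for s, e in relative]
    return fragment_span, match_spans


def to_utf16_offset(text, index):
    """Convert a Python (code point) offset to a UTF-16 code-unit offset."""
    return len(text[:index].encode("utf-16-le")) // 2


print(highlight_offsets("Here is some foo text", "some <em>foo</em> text"))
# ((8, 21), [(13, 16)])
```

For the example in the question this gives the fragment span 8 to 21 and the matched token span 13 to 16. The `to_utf16_offset` helper is only needed if the offsets have to line up with a system that counts UTF-16 code units, as Java strings do; for text inside the Basic Multilingual Plane the two offsets are identical.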
