How to handle HTML tags in highlight fragments in SOLR

Views: 158 Published: 2020/5/4 7:52:25 Tags: java search solr lucene

Problem description

I use SOLR's hit highlighting feature to set highlights in documents that match the query.

The problem is that one of the fields contains valid HTML, but the highlight fragment returned for it is not valid HTML, so the whole page layout breaks after rendering.

For example, the query field:lucene returns this document:

<a href="/some/link">Here is the discussion, what the difference between SOLR, Elasticsearch and Lucene</a>

The highlighted fragment is Elasticsearch and Lucene</a>, which contains a closing </a> without its opening tag.

One option I tried is setting the fragment size to 0 (which returns the whole field content), but the field can be very large and I need just a few snippets for the results page.
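For reference, here is a minimal sketch of the request parameters this option corresponds to. It assumes the standard Solr highlighting parameters (hl, hl.fl, hl.fragsize, hl.snippets); the class and method names are hypothetical, and the field name "field" is taken from the example query above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the query string of a Solr select request
// with highlighting enabled. Only the parameter names come from Solr.
class HighlightQuery {
    static String buildParams(String query, String field, int fragsize, int snippets) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");                               // turn highlighting on
        params.put("hl.fl", field);                             // field to highlight
        params.put("hl.fragsize", Integer.toString(fragsize));  // 0 = whole field value
        params.put("hl.snippets", Integer.toString(snippets));  // max fragments per field

        // URL-encode each key/value pair and join them with '&'.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fragment size 0 asks for the whole field instead of short snippets.
        System.out.println(buildParams("field:lucene", "field", 0, 1));
    }
}
```

With hl.fragsize=0 the returned highlight is the complete field value, so the HTML stays intact, at the cost of the size problem described above.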

Another option is to remove all HTML tags and show the snippet as plain text, but I need tags for highlighting. Also, some tags can be cut off in a fragment, such as </p, which means we can't use HTML parsers for that purpose.
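To make this option concrete, here is a rough sketch of what tag stripping that keeps the highlight markers might look like. It is only an illustration under several assumptions: the highlight tags are the literal strings <em> and </em>, the field text contains no raw < or > outside of tags, and the class name SnippetCleaner is hypothetical. It uses regular expressions rather than a parser so that tags cut off at either end of the fragment are also dropped. It is not an HTML sanitizer.

```java
// Hypothetical helper: strips HTML tags from a highlight fragment while
// keeping the highlight tags, which are assumed to be <em> and </em>.
class SnippetCleaner {
    // Placeholder characters that are not expected to appear in the field text.
    private static final String OPEN = "\u0001";
    private static final String CLOSE = "\u0002";

    static String clean(String fragment) {
        // 1. Protect the highlight tags so the steps below do not remove them.
        String s = fragment.replace("<em>", OPEN).replace("</em>", CLOSE);
        // 2. Remove complete tags such as <a href="..."> or </a>.
        s = s.replaceAll("<[^<>]*>", "");
        // 3. Remove a tag cut off at the end of the fragment, such as </p
        s = s.replaceAll("<[^<>]*$", "");
        // 4. Remove a tag cut off at the start of the fragment, such as link">
        s = s.replaceAll("^[^<>]*>", "");
        // 5. Restore the highlight tags.
        return s.replace(OPEN, "<em>").replace(CLOSE, "</em>");
    }

    public static void main(String[] args) {
        System.out.println(clean("Elasticsearch and <em>Lucene</em></a>"));
    }
}
```

Step 4 is the fragile part: if a fragment legitimately contains a > character in its text, everything before it is discarded, so this only works when such characters are escaped as entities in the stored field.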

This seems like a common problem in search. Is there a state-of-the-art approach to handling it?
