Is there a fast, accurate Highlighter for Lucene?


Question

I've been using the (Java) Highlighter for Lucene (in the Sandbox package) for some time. However, it isn't very accurate at matching the correct terms in search results. It works well for simple queries: for example, searching for two separate words will highlight both code fragments in the results.

However, it doesn't behave well with more complicated queries. In the simplest case, a phrase query such as "Stack Overflow" will highlight every occurrence of Stack or Overflow, which gives the user the impression that it isn't working very well.
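To make the difference concrete, here is a minimal sketch in plain Java with no Lucene dependency. It is not the Highlighter's actual code: the class `PhraseHighlightSketch` and both methods are hypothetical, and tokens are assumed to be separated by single spaces. It contrasts term-level highlighting, which marks every occurrence of any query term, with phrase-aware highlighting, which marks terms only where they appear consecutively and in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not part of Lucene.
final class PhraseHighlightSketch {

    /** Term-level highlighting: marks every token equal to any query term. */
    static String highlightTerms(String text, List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String token : text.split(" ")) {
            boolean hit = false;
            for (String term : terms) {
                if (token.equalsIgnoreCase(term)) {
                    hit = true;
                    break;
                }
            }
            out.add(hit ? "<b>" + token + "</b>" : token);
        }
        return String.join(" ", out);
    }

    /** Phrase-aware highlighting: marks tokens only where the whole phrase occurs in order. */
    static String highlightPhrase(String text, List<String> phrase) {
        String[] tokens = text.split(" ");
        boolean[] marked = new boolean[tokens.length];
        if (!phrase.isEmpty()) {
            for (int i = 0; i + phrase.size() <= tokens.length; i++) {
                boolean match = true;
                for (int j = 0; j < phrase.size(); j++) {
                    if (!tokens[i + j].equalsIgnoreCase(phrase.get(j))) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    // Mark every token covered by this phrase occurrence.
                    for (int j = 0; j < phrase.size(); j++) {
                        marked[i + j] = true;
                    }
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            out.add(marked[i] ? "<b>" + tokens[i] + "</b>" : tokens[i]);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String text = "Stack Overflow answers rarely overflow the call stack";
        List<String> query = Arrays.asList("stack", "overflow");
        System.out.println(highlightTerms(text, query));
        System.out.println(highlightPhrase(text, query));
    }
}
```

With the sample sentence in `main`, the term-level version also marks the stray "overflow" and "stack" near the end, while the phrase-aware version marks only the leading "Stack Overflow". The second behaviour is what a user expects from a phrase query.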

I tried applying the fix here, but it came with a lot of performance caveats and in the end was simply unusable. Performance is especially a problem with wildcard queries. This is due to the way the highlighting works: instead of working only on the query string and the text, it parses the query as Lucene would and then looks for all the matches Lucene has made. Unfortunately, this means that for certain wildcard queries it can be looking for matches to 2000+ clauses in large documents, and it is simply not fast enough.
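For contrast, the alternative mentioned above (working directly on the query string and the text) can be sketched as follows. This is an illustration under stated assumptions, not how Lucene or its Highlighter is implemented: the class `WildcardHighlightSketch` and its methods are hypothetical, tokens are assumed to be separated by single spaces, and only the `*` and `?` wildcards are handled. The wildcard is compiled once into a single regular expression and each token is tested against it, so the work per token does not grow with the number of terms the wildcard would expand to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not part of Lucene.
final class WildcardHighlightSketch {

    /** Converts a wildcard pattern (* = any run of characters, ? = one character) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote literal characters so regex metacharacters are matched verbatim.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Marks every whole token that matches the wildcard, using one compiled pattern. */
    static String highlight(String text, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        List<String> out = new ArrayList<>();
        for (String token : text.split(" ")) {
            out.add(pattern.matcher(token).matches() ? "<b>" + token + "</b>" : token);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(highlight("high highlighter highlighting low", "highlight*"));
    }
}
```

The trade-off is that this ignores analysis (stemming, stop words, and so on), so its highlights can disagree with what Lucene actually matched. That mismatch is presumably the reason the Highlighter parses the query the way Lucene does.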

Is there a faster implementation of an accurate highlighter?

Recommended answer

There is a new, faster highlighter. It currently has to be patched in, but it will be part of release 2.9:

https://issues.apache.org/jira/browse/LUCENE-1522

The issue also contains a back-reference to this question.
