Using the hit highlighter in Lucene

Views: 107  Published: 2018/12/4 12:26:37  Tags: java lucene hit-highlighting


Question

I have two questions regarding the hit highlighter provided with Apache Lucene:

See this function: could you explain the use of the token stream parameter?

I have several large Lucene documents, each containing many fields, and each field holds some strings. I have found the most relevant document for a particular query. That document was found because several words in the query may have matched words in the document, and I want to find out which query words caused the match. For this I plan to use the Lucene hit highlighter.

Example: if the query is "skin doctor delhi" and the document titled "dermatologist" contains the words "skin" and "doctor", then after hit highlighting I should be able to separate out "skin" and "doctor" from the query. I have been trying to write the code for this for several weeks now and have not been able to get what I want. Could you help me, please?
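To make the goal in the example concrete, here is a minimal sketch in plain Java of the desired result: given the query and a document's text, report which query words occur in it. It does not use Lucene at all. The class name `MatchedQueryTerms`, the sample sentence, and the simple lowercase-and-split tokenization are assumptions made for illustration; a real setup would presumably reuse the Analyzer that built the index.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: reports which query words occur in a text.
class MatchedQueryTerms {

    // Lower-cases the input and splits on anything that is not a letter or digit.
    // This stands in for a real Analyzer and is only an assumption for the sketch.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns the query terms (in query order, without repeats) that appear in the text.
    static List<String> find(String query, String documentText) {
        Set<String> docTerms = new HashSet<>(tokenize(documentText));
        List<String> matched = new ArrayList<>();
        for (String term : tokenize(query)) {
            if (docTerms.contains(term) && !matched.contains(term)) matched.add(term);
        }
        return matched;
    }

    public static void main(String[] args) {
        // Made-up document text for the "dermatologist" example.
        String text = "A dermatologist is a doctor who treats skin conditions.";
        System.out.println(find("skin doctor delhi", text)); // prints [skin, doctor]
    }
}
```

Note that the loop runs over the three query words, not over every word in the document.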

Thanks in advance.

Update:

Current approach: I create a query containing all the words in the document.

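// doc, qp, reader, analyzer and text are defined elsewhere.
// Concatenate every value of the "description" field into one string.
Field[] field = doc.getFields("description");
StringBuilder desc = new StringBuilder();
for (int j = 0; j < field.length; ++j) {
    desc.append(field[j].stringValue()).append(' ');
}

// Parse the whole field content as a query: one clause per word.
Query q = qp.parse(desc.toString());
QueryScorer scorer = new QueryScorer(q, reader, "description");
Highlighter highlighter = new Highlighter(scorer);

// Highlight the parts of text that match the query built above.
String fragment = highlighter.getBestFragment(analyzer, "description", text);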

It works for small documents but not for large ones, where I get the following stack trace:

    org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:152)
    at org.apache.lucene.queryParser.QueryParser.getBooleanQuery(QueryParser.java:891)
    at org.apache.lucene.queryParser.QueryParser.getBooleanQuery(QueryParser.java:866)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1213)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1167)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:182)

This approach is obviously unreasonable for large documents. What should be done to correct this?

BTW, I am using FuzzyQuery matching.
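Since the goal is to learn which query words matched, one possible direction that sidesteps the clause limit is to drive the comparison from the short query instead of building a clause for every word of the long document. The following is a hedged sketch of that idea in plain Java, with approximate matching by Levenshtein edit distance. It is not Lucene's FuzzyQuery implementation: the class name `FuzzyMatchedTerms`, the `maxEdits` threshold, the sample text, and the tokenization are all assumptions for illustration. With Lucene itself, this would roughly correspond to passing the user's query (not a query built from the document) to `QueryScorer` and highlighting the document text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: fuzzy-matches each query word against document words.
class FuzzyMatchedTerms {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Maps each query word to the distinct document words within maxEdits of it.
    // Query words with no match are left out of the result.
    static Map<String, List<String>> find(String query, String documentText, int maxEdits) {
        String[] docWords = documentText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        Map<String, List<String>> matches = new LinkedHashMap<>();
        for (String q : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (q.isEmpty()) continue;
            for (String w : docWords) {
                if (w.isEmpty() || editDistance(q, w) > maxEdits) continue;
                List<String> hits = matches.computeIfAbsent(q, k -> new ArrayList<>());
                if (!hits.contains(w)) hits.add(w);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up document text; maxEdits = 1 is an arbitrary illustrative threshold.
        String text = "Dermatologist: a doctor for skin problems. Many doctors treat skins.";
        System.out.println(find("skin doctor delhi", text, 1));
    }
}
```

The keys of the returned map are the query words that matched something, which is the information the question asks for; the number of outer iterations depends on the query length, not on the document size.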
