Keyword (OR, AND) search in Lucene

Question

I am using Lucene in my J2EE-based portal for indexing and search services.

The problem concerns Lucene's keywords. When one of them appears on its own in a search query, the query fails with an error.

For example:

searchTerms = "ik OR jij"

This works fine, because it searches for "ik" or "jij".

searchTerms = "ik AND jij"

This also works fine; it searches for "ik" and "jij".

But when you search for:

searchTerms = "OR"
searchTerms = "AND"
searchTerms = "ik OR"
searchTerms = "OR ik"

and so on, the search fails with an error:


Component Name: STSE_RESULTS  Class: org.apache.lucene.queryParser.ParseException  Message: Cannot parse 'OR jij': Encountered "OR" at line 1, column 0. 
Was expecting one of: 
... 

This makes sense: these words are Lucene keywords, so they are presumably reserved and treated as operators.

In Dutch, "OR" is an important term because it stands for "Ondernemings Raad" (works council). It appears in many texts and needs to be findable. Searching for lowercase "or" does run without an error, but it does not return the texts that contain "OR". How can I make it searchable?

How can I escape the keyword "OR"? Or how can I tell Lucene to treat "OR" as a search term rather than as a keyword?

Recommended answer

I suppose you have already tried putting "OR" in double quotes?
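The double-quote suggestion can also be applied in code before the search terms reach the query parser. The following is a minimal sketch, not a tested fix. It assumes that the parser treats a quoted "OR" as an ordinary term, as the answer hopes, and that only a leading or trailing AND/OR needs quoting, because those are the failing cases listed in the question. The class and method names are hypothetical, and no Lucene API is called. Whether the quoted term then matches any documents still depends on the analyzer used for indexing and searching, for example on whether it lowercases terms or drops stop words.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: quotes a dangling AND/OR so it is no longer read as an operator.
final class DanglingOperatorQuoter {

    // The uppercase keywords discussed in the question.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR"));

    private DanglingOperatorQuoter() {
    }

    // Wraps the first and/or last token in double quotes when it is AND or OR.
    // Operators that sit between two other tokens ("ik OR jij") are left alone.
    static String quoteDanglingOperators(String searchTerms) {
        String trimmed = searchTerms.trim();
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        String[] tokens = trimmed.split("\\s+");
        int last = tokens.length - 1;
        if (OPERATORS.contains(tokens[0])) {
            tokens[0] = '"' + tokens[0] + '"';
        }
        // For a single-token query the token is already quoted here,
        // so it no longer matches and is not quoted twice.
        if (OPERATORS.contains(tokens[last])) {
            tokens[last] = '"' + tokens[last] + '"';
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String[] samples = {"ik OR jij", "OR", "AND", "ik OR", "OR ik"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + quoteDanglingOperators(sample));
        }
    }
}
```

The rewritten string would then be handed to the query parser in place of the raw user input. Queries such as "ik OR jij" pass through unchanged, so normal boolean searches keep working.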

If that doesn't work, I think you may have to change the Lucene source and rebuild it, because the "OR" operator is buried deep inside the code. Recompiling alone probably isn't enough: you'll have to edit QueryParser.jj, the file in the source package that serves as input for JavaCC, then run JavaCC, and then recompile the whole thing.

The good news, however, is that there's only one line to change:

| <OR: ("OR" | "||") >

becomes

| <OR: ("||") >

That way, only "||" remains as the logical OR operator. There is a build.xml that also contains the JavaCC invocation, but you have to download that tool yourself. I can't try this myself right now, I'm afraid.

This might be a good question for the Lucene developer mailing list. If you ask there and they come up with a simpler solution, please let us know ;-)
