Lucene RangeQuery doesn't filter appropriately

Problem description

I'm using RangeQuery to get all the documents that have an amount between, say, 0 and 2. When I execute the query, Lucene also gives me documents with an amount greater than 2. What am I missing here?

Here is my code:

Term lowerTerm = new Term("amount", minAmount);
Term upperTerm = new Term("amount", maxAmount);

RangeQuery amountQuery = new RangeQuery(lowerTerm, upperTerm, true);

finalQuery.Add(amountQuery, BooleanClause.Occur.MUST);

And here is what goes into my index:

doc.Add(new Field("amount", amount.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));

Solution

UPDATE: Like @basZero said in his comment, starting with Lucene 2.9, you can add numeric fields to your documents. Just remember to use NumericRangeQuery instead of RangeQuery when you search.
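
A minimal sketch of what that could look like, assuming a Lucene.Net 2.9/3.0-era API (NumericField and NumericRangeQuery.NewDoubleRange) and assuming the amount is a double. The class and method names (NumericAmount, BuildDocument, BuildAmountQuery) are hypothetical, and other Lucene.Net versions may expose different field classes.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Search;

public static class NumericAmount
{
    // Builds a document whose "amount" field is indexed as a number
    // rather than as a plain string term.
    public static Document BuildDocument(double amount)
    {
        var doc = new Document();
        var amountField = new NumericField("amount", Field.Store.YES, true);
        amountField.SetDoubleValue(amount);
        doc.Add(amountField);
        return doc;
    }

    // Matches documents with min <= amount <= max, compared numerically.
    public static Query BuildAmountQuery(double min, double max)
    {
        return NumericRangeQuery.NewDoubleRange("amount", min, max, true, true);
    }
}
```

The query returned by BuildAmountQuery(0, 2) could then be added to the existing BooleanQuery in place of the RangeQuery from the question.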

Original answer

Lucene treats numbers as words, so their order is alphabetic:

0
1
12
123
2
22

That means that for Lucene, 12 is between 0 and 2. If you want to do a proper numerical range, you need to index the numbers zero-padded, then do a range search of [0000 TO 0002]. (The amount of padding you need depends on the expected range of values).
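
The ordering problem and the zero-padding fix can be reproduced without Lucene by using ordinal string comparison as a stand-in for term order. This is only a sketch: the names PaddingDemo, InTermRange and Pad, and the width of 4, are assumptions made for illustration.

```csharp
using System;
using System.Globalization;

public static class PaddingDemo
{
    // True when term sorts between lower and upper (inclusive) in ordinal
    // string order, the way string terms are compared in a term range.
    public static bool InTermRange(string term, string lower, string upper)
    {
        return string.CompareOrdinal(term, lower) >= 0
            && string.CompareOrdinal(term, upper) <= 0;
    }

    // Left-pads a non-negative integer with zeros to a fixed width.
    public static string Pad(int value, int width)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException("value", "Only non-negative values are handled here.");
        return value.ToString(CultureInfo.InvariantCulture).PadLeft(width, '0');
    }
}
```

With these helpers, InTermRange("12", "0", "2") is true, which is the bug from the question, while InTermRange(Pad(12, 4), Pad(0, 4), Pad(2, 4)) is false because "0012" sorts after "0002". The same padding has to be applied both when indexing and when building the range terms.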

If you have negative numbers, just add another zero for non-negative numbers. (EDIT: WRONG WRONG WRONG. See update)

If your numbers include a fraction part, leave it as is, and zero-pad the integer part only.

Example:

-00002.12
-00001
(These two negative entries were struck out in the original answer; see the update.)

000000
000001
000003.1415
000022
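
Padding only the integer part could be sketched as follows, for non-negative values. The names FractionPadding and PadIntegerPart are hypothetical, and the input is assumed to be a plain decimal string that uses '.' as the separator.

```csharp
using System;

public static class FractionPadding
{
    // Zero-pads only the integer part of a non-negative decimal string,
    // leaving any fraction part untouched.
    public static string PadIntegerPart(string number, int width)
    {
        if (number.StartsWith("-", StringComparison.Ordinal))
            throw new ArgumentException("Negative numbers need a different encoding.", "number");

        int dot = number.IndexOf('.');
        string integerPart = dot < 0 ? number : number.Substring(0, dot);
        string fractionPart = dot < 0 ? "" : number.Substring(dot);
        return integerPart.PadLeft(width, '0') + fractionPart;
    }
}
```

For example, PadIntegerPart("3.1415", 6) gives "000003.1415" and PadIntegerPart("22", 6) gives "000022", and the first sorts before the second in ordinal string order.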

UPDATE: Negative numbers are a bit tricky, since -1 comes before -2 alphabetically. This article gives a complete explanation about dealing with negative numbers and numbers in general in Lucene. Basically, you have to "encode" numbers into something that makes the order of the items make sense.
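
The linked article is not reproduced here, so the following is not necessarily its method. It is one common order-preserving encoding for signed integers, sketched under the assumption that the values fit in a 64-bit long: flip the sign bit and write the result as fixed-width hexadecimal. The names SortableLong, Encode and Decode are hypothetical.

```csharp
using System;

public static class SortableLong
{
    private const ulong SignBit = 0x8000000000000000UL;

    // Encodes a signed 64-bit integer as 16 uppercase hex digits whose
    // ordinal string order matches numeric order. Flipping the sign bit
    // moves negative values below non-negative ones.
    public static string Encode(long value)
    {
        ulong shifted = unchecked((ulong)value) ^ SignBit;
        return shifted.ToString("X16");
    }

    // Reverses Encode.
    public static long Decode(string term)
    {
        ulong shifted = Convert.ToUInt64(term, 16);
        return unchecked((long)(shifted ^ SignBit));
    }
}
```

Here Encode(-2) is "7FFFFFFFFFFFFFFE", Encode(-1) is "7FFFFFFFFFFFFFFF" and Encode(0) is "8000000000000000", so -2, -1, 0 sort in numeric order as strings. As with zero-padding, the range bounds have to be encoded the same way as the indexed values.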
