Lucene RangeQuery doesn't filter appropriately
I'm using RangeQuery to get all the documents whose amount is between, say, 0 and 2.
When I execute the query, Lucene also returns documents whose amount is greater than 2. What am I missing here?
Here is my code:
Term lowerTerm = new Term("amount", minAmount);
Term upperTerm = new Term("amount", maxAmount);
RangeQuery amountQuery = new RangeQuery(lowerTerm, upperTerm, true);
finalQuery.Add(amountQuery, BooleanClause.Occur.MUST);
and here is what goes into my index:
doc.Add(new Field("amount", amount.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
UPDATE: Like @basZero said in his comment, starting with Lucene 2.9, you can add numeric fields to your documents. Just remember to use NumericRangeQuery instead of RangeQuery when you search.
Original answer
Lucene treats numbers as words, so their order is alphabetic:
0
1
12
123
2
22
That means that for Lucene, 12 is between 0 and 2. If you want to do a proper numerical range, you need to index the numbers zero-padded, then do a range search of [0000 TO 0002]. (The amount of padding you need depends on the expected range of values).
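The effect can be reproduced without Lucene at all. The following is a minimal sketch in plain C#, assuming that ordinal string comparison is a fair stand-in for how the terms are ordered; `Pad` and `InRange` are hypothetical helpers written for this illustration and are not part of Lucene.NET.

```csharp
using System;
using System.Linq;

public static class PaddedSortDemo
{
    // Hypothetical helper: formats a non-negative integer as a fixed-width,
    // zero-padded string, e.g. Pad(12, 4) gives "0012".
    public static string Pad(int value, int width)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value), "Negative values need a different encoding.");
        return value.ToString("D" + width);
    }

    // Hypothetical helper: inclusive range check using ordinal string
    // comparison (an assumption standing in for term ordering).
    public static bool InRange(string term, string lower, string upper)
    {
        return string.CompareOrdinal(term, lower) >= 0
            && string.CompareOrdinal(term, upper) <= 0;
    }

    public static void Main()
    {
        int[] amounts = { 0, 1, 12, 123, 2, 22 };

        // Unpadded: "12" and "123" sort between "0" and "2".
        var unpadded = amounts.Select(a => a.ToString())
                              .Where(t => InRange(t, "0", "2"));
        Console.WriteLine(string.Join(", ", unpadded)); // 0, 1, 12, 123, 2

        // Padded to four digits: only 0, 1 and 2 remain in range.
        var padded = amounts.Select(a => Pad(a, 4))
                            .Where(t => InRange(t, Pad(0, 4), Pad(2, 4)));
        Console.WriteLine(string.Join(", ", padded)); // 0000, 0001, 0002
    }
}
```

The same padding has to be applied both when indexing the field and when building the lower and upper terms of the range, otherwise the comparison mixes padded and unpadded strings.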
If you have negative numbers, just add another zero for non-negative numbers. (EDIT: WRONG WRONG WRONG. See update)
If your numbers include a fraction part, leave it as is, and zero-pad the integer part only.
Example:
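-00002.12 (struck out in the original answer; see the update)
-00001 (struck out in the original answer; see the update)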
000000
000001
000003.1415
000022
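For non-negative values with a fraction part, one way to produce strings like the ones above is a custom numeric format string. This is only a sketch: `PadDecimal` is a hypothetical helper, the six-digit integer width is taken from the example, and the limit of four fraction digits is an assumption made for illustration (extra digits would be rounded).

```csharp
using System;
using System.Globalization;

public static class DecimalPadDemo
{
    // Hypothetical helper: pads the integer part to six digits and keeps up
    // to four fraction digits (assumed limit), using the invariant culture
    // so the decimal separator is always '.'.
    public static string PadDecimal(decimal value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value), "Negative values need a different encoding.");
        return value.ToString("000000.####", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        decimal[] amounts = { 0m, 1m, 3.1415m, 22m };
        foreach (decimal amount in amounts)
        {
            Console.WriteLine(PadDecimal(amount));
        }
        // Prints:
        // 000000
        // 000001
        // 000003.1415
        // 000022
    }
}
```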
UPDATE: Negative numbers are a bit tricky, since -1 comes before -2 alphabetically. This article gives a complete explanation about dealing with negative numbers and numbers in general in Lucene. Basically, you have to "encode" numbers into something that makes the order of the items make sense.
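As an illustration of what such an encoding can look like, here is a minimal sketch for 32-bit integers. It is one possible scheme and not necessarily the one the article describes: the sign bit is flipped so that negative values map below non-negative ones, and the result is zero-padded to ten digits. `Encode` and `Decode` are hypothetical names, and ordinal string comparison is again assumed as the sort order.

```csharp
using System;

public static class SortableIntDemo
{
    // Flips the sign bit so int.MinValue maps to 0 and int.MaxValue maps to
    // uint.MaxValue, then pads to 10 digits (the length of uint.MaxValue).
    public static string Encode(int value)
    {
        uint shifted = unchecked((uint)value) ^ 0x80000000u;
        return shifted.ToString("D10");
    }

    // Reverses Encode.
    public static int Decode(string term)
    {
        uint shifted = uint.Parse(term);
        return unchecked((int)(shifted ^ 0x80000000u));
    }

    public static void Main()
    {
        int[] values = { 2, -1, 22, -2, 0 };

        string[] terms = Array.ConvertAll(values, Encode);
        Array.Sort(terms, StringComparer.Ordinal);

        int[] sorted = Array.ConvertAll(terms, Decode);
        Console.WriteLine(string.Join(", ", sorted)); // -2, -1, 0, 2, 22
    }
}
```

The encoded strings are not human-readable, so the original value would typically be kept in a separate stored field or decoded when displaying results.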