Using Lucene to count results in categories

Question

I am trying to use Lucene Java 2.3.2 to implement search on a catalog of products. Apart from the regular fields for a product, there is a field called 'Category'. A product can fall into multiple categories. Currently, I use FilteredQuery to run the same search term against every category to get the number of results per category.

This results in 20-30 internal search calls per query just to display the results, which slows the search down considerably. Is there a faster way to achieve the same result using Lucene?

Recommended answer

Here's what I did, though it's a bit heavy on memory:

What you need is to create a bunch of BitSets in advance, one per category, each containing the doc ids of all the documents in that category. Then, at search time, you use a HitCollector and check the doc ids against the BitSets.

Here's the code to create the bit sets:

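import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Builds one BitSet per category, marking every doc id that belongs to it.
// Category is the application's own type; getQuery() returns the query
// that selects the documents in that category.
public BitSet[] getBitSets(IndexSearcher indexSearcher,
                           Category[] categories) throws IOException {
    BitSet[] bitSets = new BitSet[categories.length];
    for(int i=0; i<categories.length; i++)
    {
        Query query = categories[i].getQuery();
        final BitSet bitSet = new BitSet();
        indexSearcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                bitSet.set(doc);
            }
        });
        bitSets[i] = bitSet;
    }
    return bitSets;
}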

This is just one way to do it. If your categories are simple enough, you could probably use TermDocs instead of running a full search, but this should only run once when you load the index anyway.
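To get a rough feel for the memory cost of keeping one BitSet per category, here is a back-of-the-envelope sketch. It assumes each BitSet spans the whole index and counts only the bit storage (one 64-bit word per 64 doc ids), ignoring object overhead and any slack left over when a BitSet grows its backing array. The class name, the method names, and the figures of 1,000,000 documents and 30 categories are made up for illustration.

```java
class BitSetMemorySketch {

    // Approximate bytes of bit storage for one BitSet covering doc ids 0..maxDoc-1.
    static long bytesPerBitSet(int maxDoc) {
        long words = (maxDoc + 63L) / 64L; // one 64-bit word per 64 doc ids, rounded up
        return words * 8L;                 // 8 bytes per word
    }

    // Approximate bytes of bit storage for one BitSet per category.
    static long estimateBytes(int maxDoc, int categoryCount) {
        return bytesPerBitSet(maxDoc) * categoryCount;
    }

    public static void main(String[] args) {
        // Hypothetical index: 1,000,000 documents, 30 categories.
        System.out.println(estimateBytes(1000000, 30) + " bytes"); // prints "3750000 bytes"
    }
}
```

Under these assumptions the cost grows linearly with both the index size and the number of categories, so it is worth re-running the numbers for your own index.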

Now, when it's time to count the categories of the search results, you do this:

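import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Runs the search once and, for every hit, bumps the counter of each
// category whose BitSet contains that doc id.
public int[] getCategoryCount(IndexSearcher indexSearcher,
                              Query query,
                              final BitSet[] bitSets) throws IOException {
    final int[] count = new int[bitSets.length];
    indexSearcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            for(int i=0; i<bitSets.length; i++) {
                if(bitSets[i].get(doc)) count[i]++;
            }
        }
    });
    return count;
}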
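To see the counting logic in isolation, here is a small sketch that uses only java.util.BitSet, with a plain int array standing in for the doc ids Lucene would pass to collect(). It also shows a variation that is not part of the answer above: collect the hits into a BitSet of their own, then intersect it with each category and read off the cardinality. The class name, the method names, and the sample data are all hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;

class CategoryCountSketch {

    // Per-hit counting, mirroring the collect() loop above.
    static int[] countPerHit(int[] hitDocIds, BitSet[] categoryBits) {
        int[] count = new int[categoryBits.length];
        for (int h = 0; h < hitDocIds.length; h++) {
            for (int i = 0; i < categoryBits.length; i++) {
                if (categoryBits[i].get(hitDocIds[h])) count[i]++;
            }
        }
        return count;
    }

    // Variation: intersect the set of hits with each category's set.
    static int[] countByIntersection(BitSet hits, BitSet[] categoryBits) {
        int[] count = new int[categoryBits.length];
        for (int i = 0; i < categoryBits.length; i++) {
            BitSet overlap = (BitSet) categoryBits[i].clone(); // keep the cached set intact
            overlap.and(hits);
            count[i] = overlap.cardinality();
        }
        return count;
    }

    // Helper that builds a BitSet from a list of doc ids.
    static BitSet bitsOf(int... docIds) {
        BitSet bits = new BitSet();
        for (int i = 0; i < docIds.length; i++) bits.set(docIds[i]);
        return bits;
    }

    public static void main(String[] args) {
        // Doc 2 belongs to both categories, as a product can in the question.
        BitSet[] categories = { bitsOf(0, 1, 2), bitsOf(2, 4) };
        int[] hits = { 1, 2, 3, 5 };
        System.out.println(Arrays.toString(countPerHit(hits, categories)));
        System.out.println(Arrays.toString(countByIntersection(bitsOf(hits), categories)));
    }
}
```

Both methods give the same counts. The intersection variant works on whole 64-bit words at a time, but it needs an extra BitSet for the hits and a clone per category, so whether it is faster for a given index is something to measure rather than assume.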

What you end up with is an array containing the count for every category within the search results. If you also need the search results themselves, you should add a TopDocCollector to your hit collector (yo dawg...). Or you could just run the search again; 2 searches are better than 30.
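The idea of counting categories and keeping the best hits in the same pass can be sketched without Lucene. The class below is a hypothetical stand-in, not Lucene's TopDocCollector: its collect(doc, score) method has the same shape as HitCollector.collect, and it keeps the top N hits in a small min-heap while updating the per-category counts. All names here are assumptions made for illustration, and topN is assumed to be at least 1.

```java
import java.util.BitSet;
import java.util.Comparator;
import java.util.PriorityQueue;

class CountAndTopHitsSketch {

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final BitSet[] categoryBits;
    private final int[] counts;
    private final int topN;
    private final PriorityQueue<Hit> best; // min-heap: lowest-scoring kept hit on top

    CountAndTopHitsSketch(BitSet[] categoryBits, int topN) {
        this.categoryBits = categoryBits;
        this.counts = new int[categoryBits.length];
        this.topN = topN;
        this.best = new PriorityQueue<Hit>(topN, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) { return Float.compare(a.score, b.score); }
        });
    }

    // Called once per hit: update the category counts and the top-N heap.
    void collect(int doc, float score) {
        for (int i = 0; i < categoryBits.length; i++) {
            if (categoryBits[i].get(doc)) counts[i]++;
        }
        if (best.size() < topN) {
            best.add(new Hit(doc, score));
        } else if (score > best.peek().score) {
            best.poll();                      // drop the weakest kept hit
            best.add(new Hit(doc, score));
        }
    }

    int[] counts() { return counts; }

    // Doc ids of the kept hits, best score first.
    int[] topDocs() {
        PriorityQueue<Hit> copy = new PriorityQueue<Hit>(best);
        int[] docs = new int[copy.size()];
        for (int i = docs.length - 1; i >= 0; i--) docs[i] = copy.poll().doc;
        return docs;
    }
}
```

In a real setup the body of collect would live inside the HitCollector passed to the search, so a single search yields both the counts and the hits to display.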
