How can I use Lucene's PriorityQueue when I don't know the max size at create time?


Question

I built a custom collector for Lucene.Net, but I can't figure out how to order (or page) the results. Every time Collect gets called, I can add the result to an internal PriorityQueue, which I understand is the correct way to do this.

I extended PriorityQueue, but it requires a size parameter on creation: you have to call Initialize in the constructor and pass in the max size.

However, in a collector, the searcher just calls Collect each time it gets a new result, so I don't know how many results there will be when I create the PriorityQueue. Based on this, I can't figure out how to make the PriorityQueue work.

I realize I'm probably missing something simple here...

Recommended answer

PriorityQueue is not a SortedList or a SortedDictionary. It is a sorting structure that returns the top M results of N elements, where M is the size of your PriorityQueue. You can add as many items as you want with InsertWithOverflow, but the queue will hold only the top M elements.
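The top-M behaviour described above can be illustrated without Lucene. The following is only a minimal sketch, assuming .NET 6 or later (for System.Collections.Generic.PriorityQueue) and plain integer scores; BoundedTopM, Insert and DrainBestFirst are hypothetical names, not part of the Lucene.Net API. It keeps a min-heap of at most M scores, so the weakest retained score sits at the root and is the one displaced when a better score arrives.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Lucene-style bounded queue (not the Lucene.Net API).
public sealed class BoundedTopM
{
    private readonly int _maxSize;

    // Min-heap: the smallest retained score is always at the root.
    private readonly PriorityQueue<int, int> _heap = new PriorityQueue<int, int>();

    public BoundedTopM(int maxSize)
    {
        if (maxSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxSize));
        _maxSize = maxSize;
    }

    // Similar in spirit to InsertWithOverflow: the heap never grows beyond maxSize.
    public void Insert(int score)
    {
        if (_heap.Count < _maxSize)
            _heap.Enqueue(score, score);
        else if (score > _heap.Peek())
            _heap.EnqueueDequeue(score, score); // adds score, removes the current minimum
        // Otherwise the score is not better than anything retained and is dropped.
    }

    // Empties the heap and returns the retained scores, best first.
    public List<int> DrainBestFirst()
    {
        var result = new List<int>(_heap.Count);
        while (_heap.Count > 0)
            result.Add(_heap.Dequeue()); // Dequeue yields ascending order
        result.Reverse();
        return result;
    }
}

public static class BoundedTopMDemo
{
    public static void Main()
    {
        var top = new BoundedTopM(3);
        foreach (var score in new[] { 5, 1, 9, 3, 7, 8 })
            top.Insert(score);

        Console.WriteLine(string.Join(", ", top.DrainBestFirst())); // prints "9, 8, 7"
    }
}
```

Six scores are offered, but only the best three survive, which is why the number of hits does not need to be known when the queue is created.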

Suppose your search produced 1,000,000 hits. Would you return all of the results to the user? A better way is to return the top 10 elements to the user (using PriorityQueue(10)); if the user then requests the next 10 results, you can run a new search with PriorityQueue(20) and return elements 11 to 20, and so on. This is the trick most search engines, such as Google, use.
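The paging scheme amounts to collecting the top page × pageSize hits and showing only the last pageSize of them. A minimal sketch of that arithmetic, assuming plain integer scores; Pager.GetPage is a hypothetical helper, and the LINQ sort merely stands in for the bounded queue a real collector would use:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Pager
{
    // Hypothetical helper: returns the hits for a 1-based page number, best first.
    public static List<int> GetPage(IEnumerable<int> scores, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int queueSize = page * pageSize;     // the size you would give the queue for this search

        return scores
            .OrderByDescending(s => s)       // stand-in for the bounded priority queue
            .Take(queueSize)                 // what a queue of that size would retain
            .Skip(queueSize - pageSize)      // drop the hits already shown on earlier pages
            .ToList();
    }
}

public static class PagerDemo
{
    public static void Main()
    {
        var scores = Enumerable.Range(1, 25); // 25 fake hits with scores 1..25

        Console.WriteLine(string.Join(", ", Pager.GetPage(scores, 2, 10)));
        // prints "15, 14, 13, 12, 11, 10, 9, 8, 7, 6"
    }
}
```

The cost of this approach grows with the page number, because each deeper page repeats the search with a larger queue; that is usually acceptable since few users go beyond the first pages.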

"Every time Commit gets called, I can add the result to an internal PriorityQueue."

I cannot see the relationship between Commit and search, so I will append a sample usage of PriorityQueue:

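using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public class CustomQueue : Lucene.Net.Util.PriorityQueue<Document>
{
    public CustomQueue(int maxSize) : base()
    {
        // The queue keeps at most maxSize documents.
        Initialize(maxSize);
    }

    // Returns true when a ranks below b; the lowest-ranked document is evicted first.
    public override bool LessThan(Document a, Document b)
    {
        // Assumption for illustration only: rank by the string value of "field1".
        // Replace this with whatever comparison your application needs.
        return string.CompareOrdinal(a.Get("field1"), b.Get("field1")) < 0;
    }
}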

public class MyCollector : Lucene.Net.Search.Collector
{
    CustomQueue _queue = null;
    IndexReader _currentReader;

    public MyCollector(int maxSize)
    {
        _queue = new CustomQueue(maxSize);
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        return true;
    }

    public override void Collect(int doc)
    {
        // doc is relative to the current reader, so load the document from that reader.
        _queue.InsertWithOverflow(_currentReader.Document(doc));
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        _currentReader = reader;
    }

    public override void SetScorer(Scorer scorer)
    {
    }
}


searcher.Search(query, new MyCollector(10)); // First page: top 10 hits.
searcher.Search(query, new MyCollector(20)); // Second page: top 20, show hits 11-20.
searcher.Search(query, new MyCollector(30)); // Third page: top 30, show hits 21-30.

Edit for @nokturnal:

public class MyPriorityQueue<TObj, TComp> : Lucene.Net.Util.PriorityQueue<TObj>
                                where TComp : IComparable<TComp>
{
    Func<TObj, TComp> _KeySelector;

    public MyPriorityQueue(int size, Func<TObj, TComp> keySelector) : base()
    {
        _KeySelector = keySelector;
        Initialize(size);
    }

    public override bool LessThan(TObj a, TObj b)
    {
        return _KeySelector(a).CompareTo(_KeySelector(b)) < 0;
    }

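    // Drains the queue: each Pop() removes and returns the current least element,
    // so items come out in ascending order and the queue is empty afterwards.
    public IEnumerable<TObj> Items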
    {
        get
        {
            int size = Size();
            for (int i = 0; i < size; i++)
                yield return Pop();
        }
    }
}


var pq = new MyPriorityQueue<Document, string>(3, doc => doc.GetField("SomeField").StringValue);
foreach (var item in pq.Items)
{
}
