Making Lucene.Net thread safe in the code

Question

I am using Lucene.Net for searching and want to know how to handle the following threading issue.

I have a single instance of class Test, but the searcher is not thread safe in this case: the timer thread can update the index while a request is being served, and I do see exceptions because of that. Any pointers on how to make it thread safe?

public class Test 
{
    private static object syncObj = new object();

    private System.Threading.Timer timer;

    private Searcher searcher;

    private RAMDirectory idx = new RAMDirectory();

    public Test()
    {
        this.timer = new System.Threading.Timer(this.Timer_Elapsed, null, TimeSpan.Zero, TimeSpan.FromMinutes(3));
    }


    private Searcher ESearcher
    {
        get
        {
            return this.searcher;
        }

        set
        {
            lock (syncObj)
            {
                this.searcher = value;
            }
        }
    }

    public Document CreateDocument(string title, string content)
    {
        Document doc = new Document();
        doc.Add(new Field("A", title, Field.Store.YES, Field.Index.NO));
        doc.Add(new Field("B", content, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }

    public List<Document> Search(Searcher searcher, string queryString)
    {
        List<Document> documents = new List<Document>();
        QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "B", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));
        Query query = parser.Parse(queryString);
        int hitsPerPage = 5;
        TopScoreDocCollector collector = TopScoreDocCollector.Create(2 * hitsPerPage, true);
        this.ESearcher.Search(query, collector);

        ScoreDoc[] hits = collector.TopDocs().ScoreDocs;

        int hitCount = collector.TotalHits > 10 ? 10 : collector.TotalHits;
        for (int i = 0; i < hitCount; i++)
        {
            ScoreDoc scoreDoc = hits[i];
            int docId = scoreDoc.Doc;
            float docScore = scoreDoc.Score;
            Document doc = searcher.Doc(docId);
            documents.Add(doc);
        }

        return documents;
    }

    private void Timer_Elapsed(object sender)
    {
        this.Log("Started Updating the Search Indexing");
        // Get New data to Index
        using (IndexWriter writer = new IndexWriter(this.idx, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), true, IndexWriter.MaxFieldLength.LIMITED))
        {
            foreach (var e in es)
            {
                writer.AddDocument(this.CreateDocument(e.Value.ToString(), e.Key));
            }

            writer.Optimize();
        }

        this.ESearcher = new IndexSearcher(this.idx);
        this.Log("Completed Updating the Search Indexing");
    }

    public Result ServeRequest()
    {
        var documents = this.Search(this.EntitySearcher, searchTerm);
        //somelogic
        return result;

    }

}

Recommended answer

A lot of things are "wrong" with this.

As has been mentioned, the locking wasn't safe (you need to lock reads as well as writes).
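To make "lock reads as well as writes" concrete, here is a minimal sketch that does not depend on Lucene.Net. `LockedResource<T>`, `Use`, and `Replace` are hypothetical names invented for this illustration. The point is only that a reader holds the same lock for the whole time it uses the shared object, not just while fetching the reference.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net: every read and every
// replacement of the shared resource goes through the same lock.
public sealed class LockedResource<T> where T : class
{
    private readonly object sync = new object();
    private T current;

    public LockedResource(T initial)
    {
        current = initial;
    }

    // Runs the caller's work while holding the lock, so the resource
    // cannot be swapped out halfway through a read.
    public TResult Use<TResult>(Func<T, TResult> work)
    {
        lock (sync)
        {
            return work(current);
        }
    }

    // Replaces the resource; blocks until in-flight readers have finished.
    public void Replace(T next)
    {
        lock (sync)
        {
            current = next;
        }
    }
}
```

With a searcher as `T`, a request would call something like `holder.Use(s => s.Search(query, 10))` and the timer would call `holder.Replace(newSearcher)`. Note that this serializes all searches behind one lock, which is one reason the approaches described next are usually preferable.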

More significantly, there are better ways of handling this in Lucene. First, IndexWriter is itself thread safe, and it should be the owner of the Directory. It's generally "bad practice" to have different parts of the code opening and closing the directory.

There is a style for NRT (Near Real Time) indexes which involves getting an IndexReader from the IndexWriter, rather than wrapping the Directory.

The style used in your example is only really "good" if the index is essentially read-only and perhaps regenerated in a batch daily or weekly.

I have rewritten the example to show some of this approach. Obviously, as this is just test code, there will be nuances that need refactoring or enhancing depending on the use case...

public class Test
{
    private static object syncObj = new object();

    private System.Threading.Timer timer;

    private Searcher searcher;

    private IndexWriter writer;
    private IndexReader reader;

    public Test()
    {
        writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), true, IndexWriter.MaxFieldLength.LIMITED);
        reader = writer.GetReader();
        searcher = new IndexSearcher(reader);
        timer = new System.Threading.Timer(Timer_Elapsed, null, TimeSpan.Zero, TimeSpan.FromMinutes(3));
    }


    public void CreateDocument(string title, string content)
    {
        var doc = new Document();
        doc.Add(new Field("A", title, Field.Store.YES, Field.Index.NO));
        doc.Add(new Field("B", content, Field.Store.YES, Field.Index.ANALYZED));

        writer.AddDocument(doc);
    }

    public void ReplaceAll(Dictionary<string, string> es)
    {
        // pause timer
        timer.Change(Timeout.Infinite, Timeout.Infinite);

        writer.DeleteAll();
        foreach (var e in es)
        {
            // CreateDocument builds the document and adds it to the writer.
            CreateDocument(e.Value.ToString(), e.Key);
        }

        // restart timer
        timer.Change(TimeSpan.Zero, TimeSpan.FromMinutes(3));
    }

    public List<Document> Search(string queryString)
    {
        var documents = new List<Document>();
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "B", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));
        Query query = parser.Parse(queryString);
        int hitsPerPage = 5;
        var collector = TopScoreDocCollector.Create(2 * hitsPerPage, true);
        searcher.Search(query, collector);

        ScoreDoc[] hits = collector.TopDocs().ScoreDocs;

        int hitCount = collector.TotalHits > 10 ? 10 : collector.TotalHits;
        for (int i = 0; i < hitCount; i++)
        {
            ScoreDoc scoreDoc = hits[i];
            int docId = scoreDoc.Doc;
            float docScore = scoreDoc.Score;
            Document doc = searcher.Doc(docId);
            documents.Add(doc);
        }

        return documents;
    }

    private void Timer_Elapsed(object sender)
    {
        if (reader.IsCurrent())
            return;

        reader = writer.GetReader();
        var newSearcher = new IndexSearcher(reader);
        Interlocked.Exchange(ref searcher, newSearcher);
        Debug.WriteLine("Searcher updated");
    }

    public Result ServeRequest(string searchTerm)
    {
        var documents = Search(searchTerm);
        //somelogic
        var result = new Result();

        return result;

    }
}

Notes:

  • the writer "owns" the Directory
  • if this were a file-based Directory, you would have Open and Close methods to create/dispose the writer (which deals with handling the lock file). A RAMDirectory can just be GC'd
  • uses Interlocked.Exchange instead of lock, so there is zero cost when using the searcher member (here be dragons!)
  • new docs are added directly to the writer
  • IsCurrent() allows for zero cost if no new docs have been added. Depending on how frequently you are adding docs, you may not need the timer at all (just call Timer_Elapsed - renamed obviously - at the top of Search).
  • don't use Optimize(); it's a hangover from previous versions and its use is highly discouraged (perf and disk I/O reasons)
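
The Open and Close methods mentioned for a file-based Directory could look roughly like the sketch below. It assumes Lucene.Net 3.0.x, the version the code above targets. `FileIndex`, its members, and the path argument are hypothetical names for illustration only. Disposing the writer is what releases the index's write lock.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical wrapper: one long-lived writer that owns a file-based index.
public sealed class FileIndex : IDisposable
{
    private readonly string path;
    private FSDirectory directory;
    private IndexWriter writer;

    public FileIndex(string path)
    {
        this.path = path;
    }

    // Opens the directory and creates the single writer that owns it.
    // This constructor overload creates the index if none exists yet
    // and appends to it otherwise.
    public void Open()
    {
        directory = FSDirectory.Open(new DirectoryInfo(path));
        writer = new IndexWriter(
            directory,
            new StandardAnalyzer(LuceneVersion.LUCENE_30),
            IndexWriter.MaxFieldLength.LIMITED);
    }

    // Disposes the writer (committing pending changes and releasing
    // the write lock) and then the directory.
    public void Close()
    {
        if (writer != null) { writer.Dispose(); writer = null; }
        if (directory != null) { directory.Dispose(); directory = null; }
    }

    public void Dispose()
    {
        Close();
    }
}
```

Call Open once at application start and Close at shutdown; opening a second writer on the same path while the first is still open would fail on the lock.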

Lastly, if you're using Lucene.Net v4.8, you should use SearcherManager (as suggested in another answer). Use the constructor that takes the IndexWriter and keep it as a "singleton" (same scope as the writer). It will handle locking and getting new readers for you.
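
A minimal sketch of that SearcherManager setup, assuming Lucene.Net 4.8 with the Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages referenced. `ManagedIndex`, `Add`, and `SearchTitles` are hypothetical names. The field names "A" and "B" follow the example above.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical wrapper: the writer and the SearcherManager share one scope.
public sealed class ManagedIndex : IDisposable
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    private readonly RAMDirectory directory = new RAMDirectory();
    private readonly StandardAnalyzer analyzer = new StandardAnalyzer(MatchVersion);
    private readonly IndexWriter writer;
    private readonly SearcherManager manager;

    public ManagedIndex()
    {
        writer = new IndexWriter(directory, new IndexWriterConfig(MatchVersion, analyzer));
        // Built from the writer, so refreshed searchers are near-real-time.
        // Passing null uses the default SearcherFactory.
        manager = new SearcherManager(writer, true, null);
    }

    public void Add(string title, string content)
    {
        var doc = new Document
        {
            new StoredField("A", title),                 // stored only, not indexed
            new TextField("B", content, Field.Store.YES) // analyzed and stored
        };
        writer.AddDocument(doc);
    }

    public List<string> SearchTitles(string queryString, int maxHits)
    {
        // Reopens the searcher only if the index has changed.
        manager.MaybeRefresh();

        IndexSearcher searcher = manager.Acquire();
        try
        {
            // QueryParser is not thread safe, so create one per call.
            var parser = new QueryParser(MatchVersion, "B", analyzer);
            TopDocs top = searcher.Search(parser.Parse(queryString), maxHits);

            var titles = new List<string>();
            foreach (ScoreDoc hit in top.ScoreDocs)
            {
                titles.Add(searcher.Doc(hit.Doc).Get("A"));
            }
            return titles;
        }
        finally
        {
            // Always release what was acquired; never dispose it directly.
            manager.Release(searcher);
        }
    }

    public void Dispose()
    {
        manager.Dispose();
        writer.Dispose();
        analyzer.Dispose();
        directory.Dispose();
    }
}
```

Calling MaybeRefresh on every search is the simplest option; with a high query rate it may be better to call it from a timer instead, as in the example above.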
