In Lucene, how can I find out whether the IndexSearcher or IndexWriter is being used in another thread?

Question

Lucene documentation states that single instances of IndexSearcher and IndexWriter should be used for each index in the whole application, and across all threads. Also, writes to an index will not be visible until the index is re-opened.

So I'm trying to follow these guidelines in a multi-threaded setup (a few threads writing, multiple user threads searching). I don't want to re-open the index on every change; rather, I want to make sure the searcher instance is never stale for more than a certain amount of time (say, 20 seconds).

A central component is responsible for opening the index readers and writers, keeping the single instances, and synchronizing the threads. I keep track of the last time the IndexSearcher was accessed by any user thread, and of the time it became dirty. If anyone needs to access it more than 20 seconds after the change, I want to close the searcher and re-open it.

The problem is that I can't be sure the previous requests on the searcher (made by other threads) have finished yet, so I don't know whether it is safe to close the IndexSearcher. If I close and re-open the single IndexSearcher instance that is shared among all threads, a search might still be running concurrently in some other thread.

To make matters worse, here is what can theoretically happen: searches may be running at the same time, all the time (suppose you have thousands of users searching the same index). The single IndexSearcher instance may then never become idle, so it can never be closed. Ideally, I want to create another IndexSearcher and direct new requests to it, while the old one stays open and finishes the searches that were already requested. When the searches running on the old instance are complete, I want to close it.

What is the best way to synchronize multiple users of the IndexSearcher (or IndexWriter) so that close() can be called safely? Does Lucene provide any features or facilities for this, or should it be done entirely in user code (for example, by counting the threads using a searcher and incrementing or decrementing the count each time it is used)?

Are there any recommendations or ideas about the design described above?

Solution

Thankfully, recent versions (3.x or late 2.x) added a method that tells you whether anything has been written since the searcher was opened: IndexReader.isCurrent() reports whether the index has changed since that reader was opened. So you can write a simple wrapper class that encapsulates both reading and writing and, with some simple synchronization, manages all of this across all of your threads.

Here is roughly what I do:

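  import java.io.IOException;
  import java.util.List;
  import java.util.concurrent.atomic.AtomicInteger;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.search.IndexSearcher;

  public class ArchiveIndex {
      private IndexSearcher searcher;
      private final AtomicInteger activeSearches = new AtomicInteger(0);
      private IndexWriter writer;
      private final AtomicInteger activeWrites = new AtomicInteger(0);

      public List<Document> search( ... ) throws IOException {
          IndexSearcher current;
          synchronized( this ) {
              // Reopen only when the index has changed and nobody is still searching.
              if( searcher != null && !searcher.getIndexReader().isCurrent() && activeSearches.get() == 0 ) {
                  searcher.close();
                  searcher = null;
              }

              if( searcher == null ) {
                  searcher = new IndexSearcher(...);
              }

              // Count this search while still holding the lock, so that no other
              // thread can close the searcher between the check and the use.
              activeSearches.incrementAndGet();
              current = searcher;
          }

          try {
              // do your searching with `current` and return the results
          } finally {
              activeSearches.decrementAndGet();
          }
      }


      public void addDocuments( List<Document> docs ) throws IOException {
          synchronized( this ) {
              if( writer == null ) {
                  writer = new IndexWriter(...);
              }
              // Register as an active writer under the lock, before another
              // thread can decide it is the last one out and close the writer.
              activeWrites.incrementAndGet();
          }
          try {
              // do your writes here
          } finally {
              synchronized( this ) {
                  // The last writer out closes the writer, which commits the changes.
                  if( activeWrites.decrementAndGet() == 0 ) {
                      writer.close();
                      writer = null;
                  }
              }
          }
      }
  }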

So I have a single class that I use for both reading and writing. Notice that this class allows writing and reading at the same time, and multiple readers can search at the same time. The only synchronization is the quick check to see whether the searcher or writer needs to be reopened. I didn't synchronize at the method level, because that would allow only one reader or writer at a time, which would be bad for performance.

If there are active searches, you can't drop the searcher. So when lots of readers keep coming in, they simply search without seeing the latest changes. Once traffic thins out, the next lone searcher reopens the dirty searcher. This works well for lower-volume sites where there are pauses in traffic, but it can still cause starvation (i.e. you keep reading older and older results). You could add logic that stops and reinitializes the searcher once the time since it was noticed dirty exceeds X, and otherwise stays lazy as it is now. That way searches are guaranteed never to be more than X out of date.
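One way to get the time bound without pausing traffic is to count references per searcher instance rather than per wrapper, so that a stale instance can be retired while in-flight searches finish on it. The following is only a minimal sketch of that idea, not the code I actually run: `RefCounted`, `SwappingHolder` and `markDirty` are hypothetical names, a plain `Closeable` stands in for the IndexSearcher, and the caller passes in the clock value. In a real setup the dirty signal would presumably come from something like `IndexReader.isCurrent()`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: a closeable resource plus a reference count. */
final class RefCounted<T extends Closeable> {
    final T resource;
    // Starts at 1: the holder that created this object owns one reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    /** Takes a reference; returns false if the resource is already closed. */
    boolean tryAcquire() {
        for (;;) {
            int n = refs.get();
            if (n == 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Drops a reference; whoever drops the last one closes the resource. */
    void release() {
        if (refs.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

/** Hypothetical holder: swaps in a fresh resource once the old one has been dirty too long. */
final class SwappingHolder<T extends Closeable> {
    private final Supplier<T> opener;
    private final long maxStaleMillis;
    private RefCounted<T> current;
    private long dirtySinceMillis = -1; // -1 means "not dirty"

    SwappingHolder(Supplier<T> opener, long maxStaleMillis) {
        this.opener = opener;
        this.maxStaleMillis = maxStaleMillis;
    }

    /** Called after a write; remembers only the time of the first unseen change. */
    synchronized void markDirty(long nowMillis) {
        if (dirtySinceMillis < 0) {
            dirtySinceMillis = nowMillis;
        }
    }

    /** Returns a counted reference; the caller must call release() when done. */
    synchronized RefCounted<T> acquire(long nowMillis) {
        boolean tooStale = dirtySinceMillis >= 0
                && nowMillis - dirtySinceMillis >= maxStaleMillis;
        if (current == null || tooStale) {
            RefCounted<T> old = current;
            current = new RefCounted<>(opener.get());
            dirtySinceMillis = -1;
            if (old != null) {
                // Drop only the holder's own reference; searches that are
                // still running keep the old instance open until they release it.
                old.release();
            }
        }
        current.tryAcquire(); // cannot fail: the holder still owns a reference
        return current;
    }
}
```

A search would then call `acquire(System.currentTimeMillis())`, use `resource` inside a try block, and call `release()` in the finally block. New requests always land on the newest instance, and the old one closes itself when its last user lets go.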

Writers can be handled in much the same way. I tend to close the writer periodically so that the changes are committed and readers notice them. I didn't do a very good job of describing that, but it works much the same way as searching: if there are active writers, you can't close the writer, and if you're the last writer out the door, you close it. You get the idea.
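The "last writer out the door" rule can be tried out in isolation with a small stand-in. This is a hedged sketch rather than Lucene code: `SharedWriter` is a hypothetical name, a plain `Closeable` takes the place of the IndexWriter, and the counter is a plain int guarded by the same lock that guards the open and close decisions.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical wrapper: opens a writer on demand; the last active user closes it. */
final class SharedWriter<W extends Closeable> {
    private final Supplier<W> opener;
    private W writer;   // guarded by this
    private int active; // guarded by this

    SharedWriter(Supplier<W> opener) {
        this.opener = opener;
    }

    void write(Consumer<W> work) {
        W w;
        synchronized (this) {
            if (writer == null) {
                writer = opener.get();
            }
            // Register under the lock so nobody can close the writer
            // between the null check and the actual use.
            active++;
            w = writer;
        }
        try {
            work.accept(w); // the write itself runs outside the lock
        } finally {
            synchronized (this) {
                if (--active == 0) {
                    W toClose = writer;
                    writer = null;
                    try {
                        // Closed inside the lock so a new writer is never
                        // opened while the old one is still closing.
                        toClose.close();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            }
        }
    }
}
```

Concurrent callers share one open writer, and it is closed exactly once, by whichever caller finishes last. The next call after that opens a fresh one.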
