In Lucene, how can I find out whether the IndexSearcher or IndexWriter is being used in another thread?


Question

The Lucene documentation states that a single instance of IndexSearcher and of IndexWriter should be used for each index in the whole application, shared across all threads. Also, writes to an index are not visible until the index is re-opened.

So I'm trying to follow these guidelines in a multi-threaded setup (a few threads writing, multiple user threads searching). I don't want to re-open the index on every change; rather, I want to keep the searcher instance no older than a certain amount of time (say, 20 seconds).

A central component is responsible for opening the index readers and writers, keeping the single instances, and synchronizing the threads. I keep track of the last time the IndexSearcher was accessed by any user thread, and of the time it became dirty. If anyone needs to access it more than 20 seconds after the change, I want to close the searcher and re-open it.

The problem is that I can't tell whether the earlier requests on the searcher (made by other threads) have finished yet, so I don't know when it is safe to close the IndexSearcher. If I close and re-open the single IndexSearcher instance that is shared among all threads, a search might still be running concurrently in some other thread.

To make matters worse, here is what can happen in theory: multiple searches may be running at the same time, all the time (suppose thousands of users are searching the same index). The single IndexSearcher instance may never become idle, so it can never be closed. Ideally, I want to create another IndexSearcher and direct new requests to it, while the old one stays open and finishes the searches that were already requested. When the searches running on the old instance are complete, I want to close it.

What is the best way to synchronize multiple users of the IndexSearcher (or IndexWriter) so that close() can be called safely? Does Lucene provide any facilities for this, or must it be done entirely in user code (for example, by counting the threads using a searcher and incrementing/decrementing the count on each use)?

Are there any recommendations or ideas about the design described above?

Recommended answer

Thankfully, in recent versions (3.x or late 2.x) they added a method that tells you whether anything has been written since the searcher was opened. IndexReader.isCurrent() tells you whether any changes have occurred since this reader was opened. So you will probably create a simple wrapper class that encapsulates both reading and writing, and with some simple synchronization you can provide one class that manages all of this across all of the threads.

This is roughly what I do:

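  import java.io.IOException;
  import java.util.List;
  import java.util.concurrent.atomic.AtomicInteger;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.search.IndexSearcher;

  public class ArchiveIndex {
      private IndexSearcher searcher;
      private final AtomicInteger activeSearches = new AtomicInteger(0);
      private IndexWriter writer;
      private final AtomicInteger activeWrites = new AtomicInteger(0);

      public List<Document> search( ... ) throws IOException {
          IndexSearcher current;
          synchronized( this ) {
              // Reopen only if the reader is stale and nobody is still searching on it.
              if( searcher != null && !searcher.getIndexReader().isCurrent() && activeSearches.get() == 0 ) {
                  searcher.close();
                  searcher = null;
              }

              if( searcher == null ) {
                  searcher = new IndexSearcher(...);
              }

              // Register inside the lock so another thread cannot close
              // the searcher between the check above and our use of it.
              current = searcher;
              activeSearches.incrementAndGet();
          }

          try {
              // do your searching with 'current' and return the results
          } finally {
              activeSearches.decrementAndGet();
          }
      }


      public void addDocuments( List<Document> docs ) throws IOException {
          synchronized( this ) {
              if( writer == null ) {
                  writer = new IndexWriter(...);
              }
              // Count this writer inside the lock so the last-one-out
              // check below cannot close the writer while we still need it.
              activeWrites.incrementAndGet();
          }
          try {
              // do your writes here.
          } finally {
              synchronized( this ) {
                  // The last writer out closes the IndexWriter, which commits the changes.
                  if( activeWrites.decrementAndGet() == 0 ) {
                      writer.close();
                      writer = null;
                  }
              }
          }
      }
  }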

So I have a single class that I use for both readers and writers. Notice that this class allows writing and reading at the same time, and multiple readers can search at the same time. The only synchronization is the quick check to see whether the searcher/writer needs to be reopened. I didn't synchronize at the method level, because that would allow only one reader/writer at a time, which would be bad for performance.

If there are active searchers out there, you can't drop the searcher. So if lots of readers keep coming in, they simply search without seeing the changes. Once traffic thins out, the next lone searcher reopens the dirty searcher. This can work well for lower-volume sites where there are pauses in traffic. It can still cause starvation (i.e. you keep reading older and older results). You could add logic to stop and reinitialize if the time since the searcher was first noticed to be dirty exceeds X, and otherwise stay lazy as it is now. That way you are guaranteed that search results are never older than X.
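The "never older than X" rule described above can be sketched as a small decision helper. This is only an illustration under stated assumptions: `ReopenPolicy`, `markDirty`, `markReopened` and `shouldReopen` are hypothetical names, not part of Lucene or of the class above, and timestamps are passed in by the caller (for example from `System.nanoTime()`). Note that when the forced rule fires while searches are still running, the old searcher must still not be closed until those in-flight searches finish.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: decides when a shared searcher should be reopened. */
final class ReopenPolicy {
    private final long maxStaleNanos;
    private boolean dirty;          // true once the reader was seen to be out of date
    private long dirtySinceNanos;   // timestamp of the first time it was seen dirty

    ReopenPolicy(long maxStale, TimeUnit unit) {
        this.maxStaleNanos = unit.toNanos(maxStale);
    }

    /** Record the first moment the reader was observed to be out of date. */
    synchronized void markDirty(long nowNanos) {
        if (!dirty) {
            dirty = true;
            dirtySinceNanos = nowNanos;
        }
    }

    /** Call after a fresh searcher has been installed. */
    synchronized void markReopened() {
        dirty = false;
    }

    /**
     * Lazy rule: reopen when dirty and no search is running.
     * Forced rule: reopen when dirty for at least the limit, even if busy.
     */
    synchronized boolean shouldReopen(long nowNanos, int activeSearches) {
        if (!dirty) {
            return false;
        }
        if (activeSearches == 0) {
            return true;
        }
        return nowNanos - dirtySinceNanos >= maxStaleNanos;
    }
}
```

A caller would invoke `markDirty` whenever `isCurrent()` returns false, and consult `shouldReopen` inside the same synchronized section that guards the searcher field.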

Writers can be handled in much the same way. I tend to close the writer periodically so that the reader notices the change (closing commits it). I didn't do a very good job of describing that, but it works much like searching: if there are active writers out there, you can't close the writer; if you're the last writer out the door, close the writer. You get the idea.
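The "last one out the door closes it" rule is plain reference counting, and it can be factored into a small holder that works for a searcher or a writer alike. The following is a minimal sketch, not the answer's own code: `RefCounted`, `tryAcquire` and `release` are hypothetical names, and the resource is assumed to implement `AutoCloseable`. The owner holds the first reference, so dropping the owner's reference while users are still active defers the close until the last of them releases.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper: closes the resource when the last user releases it. */
final class RefCounted<T extends AutoCloseable> {
    private final T resource;
    // Starts at 1: the owner (the central component) holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    /**
     * Takes a reference. Returns false if the resource was already closed;
     * the caller should then fetch the current holder and try again.
     */
    boolean tryAcquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    T get() {
        return resource;
    }

    /** Drops a reference; the caller that brings the count to zero closes the resource. */
    void release() {
        if (refs.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("close failed", e);
            }
        }
    }
}
```

With this shape, the central component can install a new holder for fresh requests and call `release()` on the old one right away; searches already running on the old instance keep it open until each of them calls `release()` in a `finally` block.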
