Searchable index gets locked on manual update (LockObtainFailedException)


Problem description



We have a Grails project that runs behind a load balancer. There are three instances of the Grails application running on the server (using separate Tomcat instances). Each instance has its own searchable index. Because the indexes are separate, automatic updates are not enough to keep the indexes consistent between the application instances. For this reason we have disabled searchable index mirroring, and updates to the index are done manually in a scheduled Quartz job. As far as we understand, no other part of the application should modify the index.

The Quartz job runs once a minute. It checks in the database which rows have been updated by the application and re-indexes those objects. The job also checks whether the same job is already running, so it never does any concurrent indexing. The application runs fine for a few hours after startup, and then suddenly, as the job is starting, a LockObtainFailedException is thrown:

22.10.2012 11:20:40 [xxxx.ReindexJob] ERROR Could not update searchable index, class org.compass.core.engine.SearchEngineException: Failed to open writer for sub index [product]; nested exception is org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/home/xxx/tomcat/searchable-index/index/product/lucene-a7bbc72a49512284f5ac54f5d7d32849-write.lock

According to the log, the last time the job was executed, re-indexing completed without any errors and the job finished successfully. Still, this time the re-index operation throws the locking exception, as if the previous operation were unfinished and the lock had not been released. The lock is not released until the application is restarted.

We tried to solve the problem by manually opening the locked index, which causes the following error to be printed to the log:

22.10.2012 11:21:30 [manager.IndexWritersManager ] ERROR Illegal state, marking an index writer as open, while another is marked as open for sub index [product]

After this the job seems to work correctly and doesn't get stuck in a locked state again. However, this causes the application to constantly use 100% of the CPU. Below is a shortened version of the Quartz job code.

Any help would be appreciated to solve the problem, thanks in advance.

class ReindexJob {

    def compass
    ...

    static Calendar lastIndexed

    static triggers = {
        // Every day every minute (at xx:xx:30), start delay 2 min
        // cronExpression:                           "s  m h D M W [Y]"
        cron name: "ReindexTrigger", cronExpression: "30 * * * * ?", startDelay: 120000
    }

    def execute() {
        if (ConcurrencyHelper.isLocked(ConcurrencyHelper.Locks.LUCENE_INDEX)) {
            log.error("Search index has been locked, not doing anything.")
            return
        }

        try {
            boolean acquiredLock = ConcurrencyHelper.lock(ConcurrencyHelper.Locks.LUCENE_INDEX, "ReindexJob")
            if (!acquiredLock) {
                log.warn("Could not lock search index, not doing anything.")
                return
            }

            Calendar reindexDate = lastIndexed
            Calendar newReindexDate = Calendar.instance
            if (!reindexDate) {
                reindexDate = Calendar.instance
                reindexDate.add(Calendar.MINUTE, -3)
                lastIndexed = reindexDate
            }

            log.debug("+++ Starting ReindexJob, last indexed ${TextHelper.formatDate("yyyy-MM-dd HH:mm:ss", reindexDate.time)} +++")
            Long start = System.currentTimeMillis()

            String reindexMessage = ""

            // Retrieve the ids of products that have been modified since the job last ran
            String productQuery = "select p.id from Product ..."
            List<Long> productIds = Product.executeQuery(productQuery, ["lastIndexedDate": reindexDate.time, "lastIndexedCalendar": reindexDate])

            if (productIds) {
                reindexMessage += "Found ${productIds.size()} product(s) to reindex. "

                final int BATCH_SIZE = 10
                Long time = TimeHelper.timer {
                    for (int inserted = 0; inserted < productIds.size(); inserted += BATCH_SIZE) {
                        log.debug("Indexing from ${inserted + 1} to ${Math.min(inserted + BATCH_SIZE, productIds.size())}: ${productIds.subList(inserted, Math.min(inserted + BATCH_SIZE, productIds.size()))}")
                        Product.reindex(productIds.subList(inserted, Math.min(inserted + BATCH_SIZE, productIds.size())))
                        Thread.sleep(250)
                    }
                }

                reindexMessage += " (${time / 1000} s). "
            } else {
                reindexMessage += "No products to reindex. "
            }

            log.debug(reindexMessage)

            // Re-index brands
            Brand.reindex()

            lastIndexed = newReindexDate

            log.debug("+++ Finished ReindexJob (${(System.currentTimeMillis() - start) / 1000} s) +++")
        } catch (Exception e) {
            log.error("Could not update searchable index, ${e.class}: ${e.message}")
            if (e instanceof org.apache.lucene.store.LockObtainFailedException || e instanceof org.compass.core.engine.SearchEngineException) {
                log.info("This is a Lucene index locking exception.")
                for (String subIndex in compass.searchEngineIndexManager.getSubIndexes()) {
                    if (compass.searchEngineIndexManager.isLocked(subIndex)) {
                        log.info("Releasing Lucene index lock for sub index ${subIndex}")
                        compass.searchEngineIndexManager.releaseLock(subIndex)
                    }
                }
            }
        } finally {
            ConcurrencyHelper.unlock(ConcurrencyHelper.Locks.LUCENE_INDEX, "ReindexJob")
        }
    }
}

Based on JMX CPU samples, it seems that Compass is doing some scheduling behind the scenes. Comparing 1-minute CPU samples from a normal instance and a 100%-CPU instance, a few things stand out:

  • org.apache.lucene.index.IndexWriter.doWait() is using most of the CPU time.
  • A Compass Scheduled Executor Thread appears in the thread list; it was not seen in the normal situation.
  • One Compass Executor Thread is doing commitMerge; in the normal situation none of these threads was doing commitMerge.

Solution

You can try increasing the 'compass.transaction.lockTimeout' setting. The default is 10 (seconds).

Another option is to disable concurrency in Compass and make it synchronous. This is controlled with the 'compass.transaction.processor.read_committed.concurrentOperations': 'false' setting. You might also have to set 'compass.transaction.processor' to 'read_committed'.
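
As a minimal sketch, both suggestions can be applied through the Grails Searchable plugin's compassSettings map; the 30-second timeout here is only an illustrative value:

compassSettings = [
    // Wait longer for the Lucene write lock (the default is 10 seconds)
    'compass.transaction.lockTimeout': '30',
    // Use the read_committed processor and run its operations synchronously
    'compass.transaction.processor': 'read_committed',
    'compass.transaction.processor.read_committed.concurrentOperations': 'false'
]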

These are the compass settings we are currently using:

compassSettings = [
    'compass.engine.optimizer.schedule.period': '300',
    'compass.engine.mergeFactor': '1000',
    'compass.engine.maxBufferedDocs': '1000',
    'compass.engine.ramBufferSize': '128',
    'compass.engine.useCompoundFile': 'false',
    'compass.transaction.processor': 'read_committed',
    'compass.transaction.processor.read_committed.concurrentOperations': 'false',
    'compass.transaction.lockTimeout': '30',
    'compass.transaction.lockPollInterval': '500',
    'compass.transaction.readCommitted.translog.connection': 'ram://'
]

This has concurrency switched off. You can make it asynchronous again by changing the 'compass.transaction.processor.read_committed.concurrentOperations' setting to 'true' (or by removing the entry).

Compass configuration reference: http://static.compassframework.org/docs/latest/core-configuration.html

Documentation for the concurrency of read_committed processor: http://www.compass-project.org/docs/latest/reference/html/core-searchengine.html#core-searchengine-transaction-read_committed

If you want to keep asynchronous operations, you can also control the number of threads they use. The compass.transaction.processor.read_committed.concurrencyLevel=1 setting would allow asynchronous operations but use just one thread (the default is 5 threads). There are also the compass.transaction.processor.read_committed.backlog and compass.transaction.processor.read_committed.addTimeout settings; a sketch of such a configuration follows.
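
As an untested sketch of that asynchronous variant (the backlog and addTimeout values are placeholders, not recommendations; check them against the configuration reference above):

compassSettings = [
    'compass.transaction.processor': 'read_committed',
    // Keep asynchronous (concurrent) operations enabled
    'compass.transaction.processor.read_committed.concurrentOperations': 'true',
    // ...but process with a single thread instead of the default 5
    'compass.transaction.processor.read_committed.concurrencyLevel': '1',
    // Placeholder values: how many operations may queue up, and how long
    // to wait to enqueue when the backlog is full
    'compass.transaction.processor.read_committed.backlog': '100',
    'compass.transaction.processor.read_committed.addTimeout': '10000'
]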

I hope this helps.

