Too many open files Error on Lucene


Problem description

The project I'm working on indexes a certain amount of data (with long texts) and compares it against a list of words at each interval (about 15 to 30 minutes).

After some time, say the 35th round, this error occurred while starting to index a new set of data for the 36th round:

    [ERROR] (2011-06-01 10:08:59,169) org.demo.service.LuceneService.countDocsInIndex(?:?) : Exception on countDocsInIndex: 
    java.io.FileNotFoundException: /usr/share/demo/index/tag/data/_z.tvd (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
        at org.apache.lucene.index.TermVectorsReader.<init>(TermVectorsReader.java:81)
        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:299)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:580)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:556)
        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:736)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:274)
        at org.demo.service.LuceneService.countDocsInIndex(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.indexTweets(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.processTweets(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:636)

I've already tried raising the maximum number of open files with:

        ulimit -n <number>

But after some time, when an interval has about 1050 rows of long texts, the same error occurs. It has only occurred once so far, though.

Should I follow the advice on modifying the Lucene IndexWriter's mergeFactor from (Too many open files) - SOLR, or is this an issue with the amount of data being indexed?
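
For context, here is a minimal sketch of how mergeFactor (and the related compound-file setting) can be adjusted on a Lucene 3.0 IndexWriter. The index path and the value 10 are illustrative assumptions, not settings from my actual code:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TuningSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("/usr/share/demo/index/tag"));
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);

            // A lower mergeFactor keeps fewer segments on disk before merging,
            // so an open reader needs fewer file descriptors.
            writer.setMergeFactor(10);
            // The compound file format packs each segment into a single .cfs file,
            // further cutting the number of files that must be held open.
            writer.setUseCompoundFile(true);

            // ... addDocument(...) calls go here ...

            writer.close();
            dir.close();
        }
    }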

I've also read that it's a choice between batch indexing and interactive indexing. How would one determine whether indexing is interactive, just by the frequency of updates? Should I categorize this project under interactive indexing?

UPDATE: I'm adding a snippet of my IndexWriter:

        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);

Seems like the max field length (the MaxFieldLength argument), at least, is already set to unlimited.

Answer

I had already used ulimit, but the error still showed up. I then inspected the customized core adapters for the Lucene functions, and it turned out that too many directories opened via IndexWriter.open were LEFT OPEN.

Note that after processing, closing the opened directory should always be called.
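
For illustration, a minimal sketch of that pattern, reusing the countDocsInIndex name from the stack trace (its body here is an assumption, not the original code): every Directory and IndexReader opened for a round is closed in a finally block, so descriptors cannot leak across rounds:

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class LuceneServiceSketch {
        // Counts documents in the index, ensuring everything opened is closed again.
        public static int countDocsInIndex(String indexPath) throws Exception {
            Directory dir = FSDirectory.open(new File(indexPath));
            IndexReader reader = null;
            try {
                reader = IndexReader.open(dir, true); // read-only reader
                return reader.numDocs();
            } finally {
                if (reader != null) {
                    reader.close();
                }
                dir.close();
            }
        }
    }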

