Too many open files Error on Lucene


Problem Description

The project I'm working on indexes a certain amount of data (with long texts) and compares it against a list of words at each interval (about every 15 to 30 minutes).

After some time, say the 35th round, this error occurred while starting to index a new set of data in the 36th round:

    [ERROR] (2011-06-01 10:08:59,169) org.demo.service.LuceneService.countDocsInIndex(?:?) : Exception on countDocsInIndex: 
    java.io.FileNotFoundException: /usr/share/demo/index/tag/data/_z.tvd (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
        at org.apache.lucene.index.TermVectorsReader.<init>(TermVectorsReader.java:81)
        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:299)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:580)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:556)
        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:736)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:274)
        at org.demo.service.LuceneService.countDocsInIndex(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.indexTweets(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.processTweets(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:636)

I've already tried raising the maximum number of open files with:

        ulimit -n <number>

But after some time, when the interval has about 1,050 rows of long texts, the same error occurs. It has only happened once, though.

Should I follow the advice of modifying Lucene IndexWriter's mergeFactor from "(Too many open files) - SOLR", or is this an issue with the amount of data being indexed?
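
For reference, that advice amounts to roughly the following in the Lucene 3.0 API (a sketch only; writer is the IndexWriter from the update below, and LogMergePolicy is org.apache.lucene.index.LogMergePolicy):

        // A lower merge factor keeps fewer segments (and thus fewer open
        // files) on disk, at the cost of more merge work while indexing.
        LogMergePolicy mergePolicy = (LogMergePolicy) writer.getMergePolicy();
        mergePolicy.setMergeFactor(5); // the Lucene 3.0 default is 10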

I've also read that it's a choice between batch indexing and interactive indexing. How would one determine whether indexing is interactive, just by the frequency of updates? Should I categorize this project under interactive indexing, then?

UPDATE: I'm adding a snippet of my IndexWriter:

        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);

It seems like maxMerge (or is it field length...?) is already set to unlimited.

Answer

I had already used ulimit but the error still showed up. Then I inspected the customized core adapters for the Lucene functions. It turned out there were too many IndexWriter.open directories that were LEFT OPEN.

Note that after processing, the code should always close the directory that was opened.
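
A minimal sketch of that fix, assuming the Lucene 3.0 API used in the question; countDocsInIndex and indexRound are hypothetical stand-ins for the poster's adapter methods. Every Directory, IndexReader, and IndexWriter is closed in a finally block, so file descriptors are released even when a round fails:

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class LuceneCloseSketch {

        // Hypothetical counterpart of the poster's countDocsInIndex: the
        // Directory and IndexReader opened for a round are always closed,
        // even if reading throws.
        public static int countDocsInIndex(File indexDir) throws IOException {
            Directory dir = FSDirectory.open(indexDir);
            try {
                IndexReader reader = IndexReader.open(dir, true); // read-only
                try {
                    return reader.numDocs();
                } finally {
                    reader.close(); // releases the segment files the reader holds open
                }
            } finally {
                dir.close(); // releases the directory's own handles
            }
        }

        // Same pattern for writing: close the IndexWriter (which commits)
        // and the Directory when the round is done.
        public static void indexRound(File indexDir) throws IOException {
            Directory dir = FSDirectory.open(indexDir);
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            try {
                // ... add this round's documents here ...
            } finally {
                writer.close(); // flushes, commits, and releases open segment files
                dir.close();
            }
        }
    }

With this pattern each round uses a bounded number of descriptors, so the ulimit no longer needs to grow with the number of rounds.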
