Stanford NLP - OpenIE out of memory when processing list of files


Problem description

I'm trying to extract information from several files using the OpenIE tool from Stanford CoreNLP. It gives an out-of-memory error when several files are passed as input, instead of just one.

All files have been queued; awaiting termination...
java.lang.OutOfMemoryError: GC overhead limit exceeded
at edu.stanford.nlp.graph.DirectedMultiGraph.outgoingEdgeIterator(DirectedMultiGraph.java:508)
at edu.stanford.nlp.semgraph.SemanticGraph.outgoingEdgeIterator(SemanticGraph.java:165)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$GOVERNER$1.advance(GraphRelation.java:267)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.initialize(GraphRelation.java:1102)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.<init>(GraphRelation.java:1083)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$GOVERNER$1.<init>(GraphRelation.java:257)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$GOVERNER.searchNodeIterator(GraphRelation.java:257)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChildIter(NodePattern.java:320)
at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern$CoordinationMatcher.matches(CoordinationPattern.java:211)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.matchChild(NodePattern.java:514)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.matches(NodePattern.java:542)
at edu.stanford.nlp.naturalli.RelationTripleSegmenter.segmentVerb(RelationTripleSegmenter.java:541)
at edu.stanford.nlp.naturalli.RelationTripleSegmenter.segment(RelationTripleSegmenter.java:850)
at edu.stanford.nlp.naturalli.OpenIE.relationInFragment(OpenIE.java:354)
at edu.stanford.nlp.naturalli.OpenIE.lambda$relationsInFragments$2(OpenIE.java:366)
at edu.stanford.nlp.naturalli.OpenIE$$Lambda$76/1438896944.apply(Unknown Source)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1540)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at edu.stanford.nlp.naturalli.OpenIE.relationsInFragments(OpenIE.java:366)
at edu.stanford.nlp.naturalli.OpenIE.annotateSentence(OpenIE.java:486)
at edu.stanford.nlp.naturalli.OpenIE.lambda$annotate$3(OpenIE.java:554)
at edu.stanford.nlp.naturalli.OpenIE$$Lambda$25/606198361.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at edu.stanford.nlp.naturalli.OpenIE.annotate(OpenIE.java:554)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:71)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:499)
at edu.stanford.nlp.naturalli.OpenIE.processDocument(OpenIE.java:630)
DONE processing files. 1 exceptions encountered.

I pass the files as input using this call:

java -mx3g -cp stanford-corenlp-3.6.0.jar:stanford-corenlp-3.6.0-models.jar:CoreNLP-to-HTML.xsl:slf4j-api.jar:slf4j-simple.jar edu.stanford.nlp.naturalli.OpenIE file1 file2 file3 etc.

I tried increasing the memory with -mx3g and other variants, and although the number of files processed increases, it's not by much (from 5 to 7, for example). Each file is processed correctly on its own, so I'm ruling out a single file with very long sentences or too many lines as the cause.

Is there an option I'm not considering, some OpenIE or Java flag, that I can use to force a dump to an output, a cleanup, or garbage collection between each file that is processed?

Thanks in advance.

Recommended answer

From the comments above: I suspect this is an issue with too much parallelism and too little memory. OpenIE is a bit memory-hungry, especially with long sentences, so running many files in parallel can take up a fair amount of memory.

An easy fix is to force the program to run single-threaded by setting the -threads 1 flag. If possible, increasing memory should help as well.
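For example, the invocation from the question could then look something like this (the 6g heap below is only an illustration; use whatever your machine allows):

java -mx6g -cp stanford-corenlp-3.6.0.jar:stanford-corenlp-3.6.0-models.jar:CoreNLP-to-HTML.xsl:slf4j-api.jar:slf4j-simple.jar edu.stanford.nlp.naturalli.OpenIE -threads 1 file1 file2 file3 etc.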

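If you drive the pipeline from Java instead of the command line, the same idea can be applied by building the pipeline once and annotating the files one at a time, so each document's annotations can be garbage-collected before the next file is read. A minimal sketch, assuming the standard openie annotator chain and the CoreNLP 3.6 API (the class name and output format are just placeholders):

import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.Properties;

public class OpenIEPerFile {
  public static void main(String[] args) throws Exception {
    // Build the pipeline once; openie needs pos, lemma, depparse and natlog.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Annotate one file at a time so each document can be reclaimed by the GC
    // before the next one is read.
    for (String path : args) {
      String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
      Annotation doc = new Annotation(text);
      pipeline.annotate(doc);
      for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
        Collection<RelationTriple> triples =
            sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
        for (RelationTriple triple : triples) {
          System.out.println(path + "\t" + triple.subjectGloss() + "\t"
              + triple.relationGloss() + "\t" + triple.objectGloss());
        }
      }
      // doc goes out of scope here, so its memory is eligible for collection.
    }
  }
}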
