CoreNLP on Apache Spark


Question

I'm not sure whether this is a Spark problem or an NLP problem, so please help. I'm trying to run the Stanford CoreNLP library on Apache Spark, and when I run it on multiple cores I get the exception below. I'm using the latest version of the NLP library, which is thread-safe.

It happens during the map phase, on this line:

 pipeline.annotate(document);

java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:463)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.<init>(GrammaticalStructure.java:201)
    at edu.stanford.nlp.trees.EnglishGrammaticalStructure.<init>(EnglishGrammaticalStructure.java:89)
    at edu.stanford.nlp.semgraph.SemanticGraphFactory.makeFromTree(SemanticGraphFactory.java:139)
    at edu.stanford.nlp.pipeline.DeterministicCorefAnnotator.annotate(DeterministicCorefAnnotator.java:89)
    at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:68)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:412)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.process(StanfordCoreNLP.java:441)
    at sampleApp.WordProcessor$2.call(WordProcessor.java:69)
    at sampleApp.WordProcessor$2.call(WordProcessor.java:1)


Recommended answer

I think it is a CoreNLP issue.
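
For context, the top frames of the trace (ArrayList$Itr.checkForComodification, reached from GrammaticalStructure.analyzeNode) mean that an ArrayList was structurally modified while an iterator over it was still in use. The minimal sketch below reproduces that mechanism in a single thread so that it is deterministic. In the Spark job the modification presumably comes from another task thread in the same JVM, which is an assumption based on the trace rather than on the CoreNLP source. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

final class ComodificationDemo {

    /** Returns true if changing the list during iteration triggers the exception. */
    static boolean failsOnModificationDuringIteration() {
        List<String> shared = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String item : shared) {
                // Structural change while the iterator is live. The iterator
                // notices the changed modification count on its next call to
                // next() and throws.
                shared.add(item + "!");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnModificationDuringIteration()); // prints "true"
    }
}
```

Nothing in the user code from the question touches that list directly, which is consistent with the problem sitting inside the library rather than in Spark.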

See also: Concurrent processing using Stanford CoreNLP (3.5.2)

I had the same problem, and switching to a build from the latest GitHub revision (as of today) solved it. In short, I think there was a bug in CoreNLP 3.5.2 that has since been fixed.
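
If moving to a newer build is not possible right away, one possible stopgap (my own assumption, not something the CoreNLP maintainers recommend) is to funnel every call into the non-thread-safe code through a single JVM-wide lock. Spark runs the tasks of one executor as threads in the same JVM, so a static lock serializes them, at the cost of losing parallelism inside each executor. The sketch below uses a plain ArrayList as a stand-in for the library; SerializedCalls, callSerialized and appendFromManyThreads are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class SerializedCalls {
    // Static, so every thread in this JVM contends for the same lock.
    private static final Object LOCK = new Object();

    /** Applies a function that is not thread-safe while holding the shared lock. */
    static <T, R> R callSerialized(Function<T, R> notThreadSafe, T input) {
        synchronized (LOCK) {
            return notThreadSafe.apply(input);
        }
    }

    /** Appends to an unsynchronized list from several threads, going through the lock. */
    static int appendFromManyThreads(int threads, int perThread) {
        List<Integer> shared = new ArrayList<>(); // deliberately not thread-safe
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    callSerialized((Integer x) -> shared.add(x), i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join also makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.size();
    }
}
```

In the job from the question, the guarded call would look something like SerializedCalls.callSerialized(text -> pipeline.process(text), text), where pipeline is the StanfordCoreNLP instance. This only hides the symptom, so upgrading remains the real fix.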
