Stanford CoreNLP pipeline coref: parsing some short strings (with few mentions) throws an IndexOutOfBoundsException

Problem description

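Background: I am importing the Stanford CoreNLP library into my Clojure project. I had been using version 3.5.1, but recently jumped straight to version 3.6.0, skipping 3.5.2. Because I had been using the dcoref annotator to get coreference information, this update required minor modifications so that my program uses the coref annotator instead.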

In the past (v3.5.1), when I created a pipeline with the following annotators

tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref, quote, entitymentions

I could parse a sentence such as the following without error:

I eat bread.

If I remember correctly, extracting the coreference chains from the resulting annotated document would just return a null value, or maybe an empty array. But that is inconsequential, because at least the annotated document would be created without error.

Now, when I create a pipeline with the following annotators:

tokenize, ssplit, pos, lemma, ner, parse, depparse, mention, coref, quote, entitymentions
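
For reference, the two annotator lists can be written as pipeline properties. This is a minimal sketch using only java.util.Properties; the class and method names (CorefPipelineConfig, propsFor) are hypothetical and not part of CoreNLP. With CoreNLP on the classpath, the resulting Properties object is what would normally be handed to the StanfordCoreNLP constructor.

```java
import java.util.Properties;

// Hypothetical helper class; only the "annotators" key and the annotator
// names come from the question.
final class CorefPipelineConfig {
    // Annotator list used with v3.5.1 (dcoref).
    static final String DCOREF_ANNOTATORS =
        "tokenize,ssplit,pos,lemma,ner,parse,depparse,dcoref,quote,entitymentions";

    // Annotator list used with v3.6.0 (mention + coref replace dcoref).
    static final String COREF_ANNOTATORS =
        "tokenize,ssplit,pos,lemma,ner,parse,depparse,mention,coref,quote,entitymentions";

    // Builds the Properties object for a given comma-separated annotator list.
    static Properties propsFor(String annotators) {
        Properties props = new Properties();
        props.setProperty("annotators", annotators);
        return props;
    }

    public static void main(String[] args) {
        Properties props = propsFor(COREF_ANNOTATORS);
        // Assumption: with CoreNLP available, this would be followed by
        // something like: new StanfordCoreNLP(props)
        System.out.println(props.getProperty("annotators"));
    }
}
```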

and then try to parse that same sentence (or any other sentence with only 1 or 0 "mentions"), I get an IndexOutOfBoundsException with the following trace:

actual: java.lang.RuntimeException: Error annotating document with coref
 at edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:79)
    edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:62)
    edu.stanford.nlp.pipeline.CorefAnnotator.annotate (CorefAnnotator.java:100)
    edu.stanford.nlp.pipeline.AnnotationPipeline.annotate (AnnotationPipeline.java:68)
    edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate (StanfordCoreNLP.java:491)
    nlp.core$parse_text.invoke (core.clj:199)
    nlp.focus_scorer.process$lexchain_features.invoke (process.clj:63)
    nlp.focus_scorer.process_test/fn (process_test.clj:49)
    clojure.test$test_var$fn__7670.invoke (test.clj:704)
    clojure.test$test_var.invoke (test.clj:704)
    clojure.test$test_vars$fn__7692$fn__7697.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars$fn__7692.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars.invoke (test.clj:718)
    clojure.test$test_all_vars.invoke (test.clj:728)
    clojure.test$test_ns.invoke (test.clj:747)
    clojure.core$map$fn__4553.invoke (core.clj:2624)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.Cons.next (Cons.java:39)
    clojure.lang.RT.boundedLength (RT.java:1735)
    clojure.lang.RestFn.applyTo (RestFn.java:130)
    clojure.core$apply.invoke (core.clj:632)
    clojure.test$run_tests.doInvoke (test.clj:762)
    clojure.lang.RestFn.invoke (RestFn.java:408)
    user$eval13163.invoke (form-init7737210093072696705.clj:1)
    clojure.lang.Compiler.eval (Compiler.java:6782)
    clojure.lang.Compiler.eval (Compiler.java:6745)
    clojure.core$eval.invoke (core.clj:3081)
    clojure.main$repl$read_eval_print__7099$fn__7102.invoke (main.clj:240)
    clojure.main$repl$read_eval_print__7099.invoke (main.clj:240)
    clojure.main$repl$fn__7108.invoke (main.clj:258)
    clojure.main$repl.doInvoke (main.clj:258)
    clojure.lang.RestFn.invoke (RestFn.java:1523)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__909.invoke (interruptible_eval.clj:58)
    clojure.lang.AFn.applyToHelper (AFn.java:152)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invoke (core.clj:630)
    clojure.core$with_bindings_STAR_.doInvoke (core.clj:1868)
    clojure.lang.RestFn.invoke (RestFn.java:425)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke (interruptible_eval.clj:56)
    clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__951$fn__954.invoke (interruptible_eval.clj:191)
    clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__946.invoke (interruptible_eval.clj:159)
    clojure.lang.AFn.run (AFn.java:22)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
    java.lang.Thread.run (Thread.java:745)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList$SubList.rangeCheck (ArrayList.java:1217)
    java.util.ArrayList$SubList.get (ArrayList.java:1034)
    edu.stanford.nlp.scoref.Clusterer$State.setClusters (Clusterer.java:349)
    edu.stanford.nlp.scoref.Clusterer$State.<init> (Clusterer.java:322)
    edu.stanford.nlp.scoref.Clusterer.getClusterMerges (Clusterer.java:58)
    edu.stanford.nlp.scoref.ClusteringCorefSystem.runCoref (ClusteringCorefSystem.java:63)
    edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:68)
    edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate (StatisticalCorefSystem.java:62)
    edu.stanford.nlp.pipeline.CorefAnnotator.annotate (CorefAnnotator.java:100)
    edu.stanford.nlp.pipeline.AnnotationPipeline.annotate (AnnotationPipeline.java:68)
    edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate (StanfordCoreNLP.java:491)
    nlp.core$parse_text.invoke (core.clj:199)
    nlp.focus_scorer.process$lexchain_features.invoke (process.clj:63)
    nlp.focus_scorer.process_test/fn (process_test.clj:49)
    clojure.test$test_var$fn__7670.invoke (test.clj:704)
    clojure.test$test_var.invoke (test.clj:704)
    clojure.test$test_vars$fn__7692$fn__7697.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars$fn__7692.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars.invoke (test.clj:718)
    clojure.test$test_all_vars.invoke (test.clj:728)
    clojure.test$test_ns.invoke (test.clj:747)
    clojure.core$map$fn__4553.invoke (core.clj:2624)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.Cons.next (Cons.java:39)
    clojure.lang.RT.boundedLength (RT.java:1735)
    clojure.lang.RestFn.applyTo (RestFn.java:130)
    clojure.core$apply.invoke (core.clj:632)
    clojure.test$run_tests.doInvoke (test.clj:762)
    clojure.lang.RestFn.invoke (RestFn.java:408)
    user$eval13163.invoke (form-init7737210093072696705.clj:1)
    clojure.lang.Compiler.eval (Compiler.java:6782)
    clojure.lang.Compiler.eval (Compiler.java:6745)
    clojure.core$eval.invoke (core.clj:3081)
    clojure.main$repl$read_eval_print__7099$fn__7102.invoke (main.clj:240)
    clojure.main$repl$read_eval_print__7099.invoke (main.clj:240)
    clojure.main$repl$fn__7108.invoke (main.clj:258)
    clojure.main$repl.doInvoke (main.clj:258)
    clojure.lang.RestFn.invoke (RestFn.java:1523)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__909.invoke (interruptible_eval.clj:58)
    clojure.lang.AFn.applyToHelper (AFn.java:152)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invoke (core.clj:630)
    clojure.core$with_bindings_STAR_.doInvoke (core.clj:1868)
    clojure.lang.RestFn.invoke (RestFn.java:425)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke (interruptible_eval.clj:56)
    clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__951$fn__954.invoke (interruptible_eval.clj:191)
    clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__946.invoke (interruptible_eval.clj:159)
    clojure.lang.AFn.run (AFn.java:22)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
    java.lang.Thread.run (Thread.java:745)
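
The "Caused by" part of the trace shows a get call on an ArrayList sublist failing with "Index: 0, Size: 0", meaning the first element of an empty view was requested. The sketch below reproduces only that general failure mode with the standard library. It is not CoreNLP's code, and the names (EmptySubListDemo, firstElementAccessFails, mentions) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: shows how reading index 0 of an empty subList view
// raises IndexOutOfBoundsException. Not taken from CoreNLP.
final class EmptySubListDemo {
    // Returns true if reading the first element of a subList view fails.
    static boolean firstElementAccessFails(List<String> mentions) {
        List<String> view = mentions.subList(0, mentions.size());
        try {
            view.get(0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> none = new ArrayList<>();
        List<String> one = new ArrayList<>();
        one.add("I");
        System.out.println(firstElementAccessFails(none));
        System.out.println(firstElementAccessFails(one));
    }
}
```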

Am I possibly doing something wrong? I realize that using Clojure instead of Java might be causing some issue, but I never had a problem with version 3.5.1. The error appears to be thrown from the annotation step in edu.stanford.nlp.scoref.StatisticalCorefSystem.annotate, but I'm not sure what I can do about that, other than keeping two pipeline objects (one with the coref annotator and one without), parsing the sentence without coref, counting the mentions, and then parsing with coref only if I see more than one mention, which seems like too much work.

Recommended answer

3.6.0 features major changes to coreference. This issue is a bug in Stanford CoreNLP 3.6.0. If you re-download the distribution, the bug should be fixed in the version that is on the site now. It should also be fixed in the upcoming Maven release.
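
If updating is not immediately possible, one stopgap is to guard the annotate call instead of counting mentions up front. The following is a rough sketch under stated assumptions, not a recommended or official approach: CorefGuard, causedByIndexOutOfBounds, and annotateOrFallback are hypothetical names, and the two suppliers stand in for "annotate with the coref pipeline" and "annotate with a pipeline that omits coref". It relies only on what the trace shows, namely a RuntimeException whose cause chain contains an IndexOutOfBoundsException.

```java
import java.util.function.Supplier;

// Hypothetical guard; the suppliers are placeholders for real pipeline calls.
final class CorefGuard {
    // Walks the cause chain looking for an IndexOutOfBoundsException.
    static boolean causedByIndexOutOfBounds(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof IndexOutOfBoundsException) {
                return true;
            }
        }
        return false;
    }

    // Tries the coref-enabled call first; falls back only for the failure
    // signature seen in the trace and rethrows anything else.
    static <T> T annotateOrFallback(Supplier<T> withCoref, Supplier<T> withoutCoref) {
        try {
            return withCoref.get();
        } catch (RuntimeException e) {
            if (causedByIndexOutOfBounds(e)) {
                return withoutCoref.get();
            }
            throw e;
        }
    }
}
```

The fallback is deliberately narrow: any other exception propagates unchanged, so unrelated failures are not silently hidden.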
