Stanford Core NLP - understanding coreference resolution

Views: 100 Published: 2018/12/5 9:44:24 Tags: java nlp stanford-nlp


Question

I'm having some trouble understanding the changes made to the coref resolver in the last version of the Stanford NLP tools. As an example, below is a sentence and the corresponding CorefChainAnnotation:

The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.

{1=[1 1, 1 2], 5=[1 3], 7=[1 4], 9=[1 5]}

I am not sure I understand the meaning of these numbers. Looking at the source doesn't really help either.

Thanks

Recommended answer

The first number is a cluster id (a cluster groups the mentions that stand for the same entity); see the source code of SieveCoreferenceSystem#coref(Document). The number pairs are the output of CorefChain#toString():

public String toString(){
    return position.toString();
}

where position is the set of position pairs of the entity's mentions (to get the mentions themselves, use CorefChain.getCorefMentions()). Here is a complete example (in Groovy) that walks from each chain to its mentions and prints them:

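// Package names as in the dcoref-era CoreNLP releases; adjust for your version.
import edu.stanford.nlp.dcoref.CorefChain
import edu.stanford.nlp.dcoref.CorefChain.CorefMention
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation
import edu.stanford.nlp.pipeline.Annotation
import edu.stanford.nlp.pipeline.StanfordCoreNLP

class Example {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        props.put("dcoref.score", true);
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String aText = "The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.";
        Annotation document = new Annotation(aText);

        pipeline.annotate(document);
        // Maps each cluster id to its coreference chain.
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);

        println aText

        for (Map.Entry<Integer, CorefChain> entry : graph.entrySet()) {
          CorefChain c = entry.getValue();
          println "ClusterId: " + entry.getKey();
          CorefMention cm = c.getRepresentativeMention();
          // NOTE: startIndex and endIndex are token indices within the sentence
          // (1-based, end exclusive), not character offsets, so subSequence on the
          // raw text prints misaligned fragments.
          println "Representative Mention: " + aText.subSequence(cm.startIndex, cm.endIndex);

          Collection<CorefMention> cms = c.getCorefMentions();
          print "Mentions: ";
          cms.each { it ->
              print aText.subSequence(it.startIndex, it.endIndex) + "|";
          }
          println();
        }
    }
}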

Output (I do not understand where 's' comes from):

The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.
ClusterId: 1
Representative Mention: he
Mentions: he|atom |s|
ClusterId: 6
Representative Mention:  basic unit 
Mentions:  basic unit |
ClusterId: 8
Representative Mention:  unit 
Mentions:  unit |
ClusterId: 10
Representative Mention: it 
Mentions: it |
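
The stray 's' (and the other odd fragments) can be explained if startIndex and endIndex are token indices rather than character offsets. The following is a minimal sketch in plain Java, without CoreNLP, that applies the same spans both ways. The class name TokenSpanDemo, the hand-written token list, and the three spans for cluster 1 are assumptions: the spans are inferred from the output above, not read from the library.

```java
import java.util.Arrays;
import java.util.List;

class TokenSpanDemo {
    static final String TEXT =
        "The atom is a basic unit of matter, it consists of a dense central "
        + "nucleus surrounded by a cloud of negatively charged electrons.";

    // Hand-written tokenization of the first clause (assumption: the comma
    // is a separate token, as a typical tokenizer would produce).
    static final List<String> TOKENS = Arrays.asList(
        "The", "atom", "is", "a", "basic", "unit", "of", "matter", ",", "it");

    // Joins the tokens of a span whose indices are 1-based with an exclusive end.
    static String tokenSpan(List<String> tokens, int startIndex, int endIndex) {
        return String.join(" ", tokens.subList(startIndex - 1, endIndex - 1));
    }

    public static void main(String[] args) {
        // Spans assumed for cluster 1, inferred from the printed fragments.
        int[][] spans = {{1, 3}, {4, 9}, {10, 11}};
        for (int[] s : spans) {
            // Same numbers, read once as character offsets and once as token indices.
            String asChars = TEXT.substring(s[0], s[1]);
            String asTokens = tokenSpan(TOKENS, s[0], s[1]);
            System.out.println("chars: '" + asChars + "' tokens: '" + asTokens + "'");
        }
    }
}
```

Read as character offsets, the spans give "he", "atom " and "s", matching the output above; read as token indices, they give "The atom", "a basic unit of matter" and "it", which is a plausible coreference cluster. In real code the tokens would come from the annotated sentence rather than from a hand-written list.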
