Stanford Core NLP - understanding coreference resolution
Question
I'm having some trouble understanding the changes made to the coref resolver in the latest version of the Stanford NLP tools. As an example, below is a sentence and the corresponding CorefChainAnnotation:
The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.
{1=[1 1, 1 2], 5=[1 3], 7=[1 4], 9=[1 5]}
I am not sure I understand the meaning of these numbers. Looking at the source doesn't really help either.
Thanks.
Recommended answer
The first number is a cluster id (a cluster groups the tokens that stand for the same entity); see the source code of SieveCoreferenceSystem#coref(Document). The pairs of numbers are the output of CorefChain#toString():
public String toString() {
    return position.toString();
}
where position is a set of position pairs of entity mentions (use CorefChain.getCorefMentions() to get them). Here is a complete code example (in Groovy) that shows how to get from positions to tokens:
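// Package names below are those of the dcoref-era CoreNLP releases.
import edu.stanford.nlp.dcoref.CorefChain
import edu.stanford.nlp.dcoref.CorefChain.CorefMention
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation
import edu.stanford.nlp.pipeline.Annotation
import edu.stanford.nlp.pipeline.StanfordCoreNLP

class Example {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        props.put("dcoref.score", true);
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // The raw text is kept in a variable because it is sliced again below.
        String aText = "The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.";
        Annotation document = new Annotation(aText);
        pipeline.annotate(document);
        // Maps each cluster id to its chain of coreferent mentions.
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
        println aText
        for (Map.Entry<Integer, CorefChain> entry : graph.entrySet()) {
            CorefChain c = entry.getValue();
            println "ClusterId: " + entry.getKey();
            CorefMention cm = c.getRepresentativeMention();
            // Caveat: startIndex/endIndex are used here as character offsets into aText.
            // They appear to be token indices within the sentence, which would explain
            // the odd fragments in the output below.
            println "Representative Mention: " + aText.subSequence(cm.startIndex, cm.endIndex);
            List<CorefMention> cms = c.getCorefMentions();
            println "Mentions: ";
            cms.each { it ->
                print aText.subSequence(it.startIndex, it.endIndex) + "|";
            }
        }
    }
}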
Output (I do not understand where 's' comes from):
The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.
ClusterId: 1
Representative Mention: he
Mentions: he|atom |s|
ClusterId: 6
Representative Mention: basic unit
Mentions: basic unit |
ClusterId: 8
Representative Mention: unit
Mentions: unit |
ClusterId: 10
Representative Mention: it
Mentions: it |
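A likely explanation for the stray 's' and the truncated "he": as far as I can tell, CorefMention.startIndex and CorefMention.endIndex are 1-based token positions inside the sentence (end exclusive), not character offsets, so calling subSequence on the raw text slices the wrong thing. The sketch below is plain Java with no CoreNLP dependency. The token list is written by hand, the helper tokenSpan is a hypothetical name, and the three spans are an assumption inferred from the output of cluster 1 above. In a real pipeline the tokens would come from the sentence's TokensAnnotation, selected by the mention's sentNum.

```java
import java.util.Arrays;
import java.util.List;

public class MentionSpans {

    // Joins tokens in [startIndex, endIndex), where both indices are 1-based
    // token positions within one sentence (assumed to match CorefMention's fields).
    static String tokenSpan(List<String> tokens, int startIndex, int endIndex) {
        return String.join(" ", tokens.subList(startIndex - 1, endIndex - 1));
    }

    public static void main(String[] args) {
        String text = "The atom is a basic unit of matter, it consists of a dense central "
                + "nucleus surrounded by a cloud of negatively charged electrons.";
        // Hand-written tokenization of the start of the sentence, for illustration only.
        List<String> tokens = Arrays.asList(
                "The", "atom", "is", "a", "basic", "unit", "of", "matter", ",", "it", "consists");

        // Assumed (startIndex, endIndex) pairs for the three mentions of cluster 1.
        int[][] spans = { {1, 3}, {4, 9}, {10, 11} };
        for (int[] s : spans) {
            // Reading the indices as character offsets reproduces the fragments above.
            String asChars = text.subSequence(s[0], s[1]).toString();
            // Reading them as token positions yields the actual mentions.
            String asTokens = tokenSpan(tokens, s[0], s[1]);
            System.out.println("[" + asChars + "] vs [" + asTokens + "]");
        }
    }
}
```

Under these assumptions the program prints [he] vs [The atom], [atom ] vs [a basic unit of matter], and [s] vs [it], which matches the "he|atom |s|" line for cluster 1 and suggests that the cluster actually links "The atom", "a basic unit of matter" and "it".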