Stanford NLP named entities of more than one token
Question
I'm experimenting with Stanford Core NLP for named entity recognition.
Some named entities consist of more than one token, for example, Person: "Bill Smith". I can't figure out what API calls to use to determine when "Bill" and "Smith" should be considered a single entity, and when they should be two different entities.
Is there some decent documentation somewhere which explains this?
Here is my current code:
    InputStream is = getClass().getResourceAsStream(MODEL_NAME);
    if (MODEL_NAME.endsWith(".gz")) {
        is = new GZIPInputStream(is);
    }
    is = new BufferedInputStream(is);
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(is);
    is.close();
    String text = "Hello, Bill Smith, how are you?";
    List<List<CoreLabel>> sentences = classifier.classify(text);
    for (List<CoreLabel> sentence : sentences) {
        for (CoreLabel word : sentence) {
            String type = word.get(CoreAnnotations.AnswerAnnotation.class);
            System.out.println(word + " is of type " + type);
        }
    }
Also, it isn't clear to me why the "PERSON" annotation is coming back as AnswerAnnotation, instead of CoreAnnotations.EntityClassAnnotation, EntityTypeAnnotation, or something else.
Recommended answer
You should use the "entitymentions" annotator, which marks each contiguous sequence of tokens sharing the same NER tag as a single entity. The list of entities for each sentence is stored under the CoreAnnotations.MentionsAnnotation.class key. Each entity mention is itself a CoreMap.
Looking over this code could help:
Some sample code:
    import java.io.*;
    import java.util.*;
    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.util.*;

    public class EntityMentionsExample {
        public static void main(String[] args) throws IOException {
            // "entitymentions" runs after "ner" and groups tagged tokens into mentions
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            String text = "Joe Smith is from Florida.";
            Annotation annotation = new Annotation(text);
            pipeline.annotate(annotation);
            System.out.println("---");
            System.out.println("text: " + text);
            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
                // each mention is a CoreMap covering one or more tokens
                for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
                    // print the mention text and its NER tag, tab-separated
                    System.out.print(entityMention.get(CoreAnnotations.TextAnnotation.class));
                    System.out.print("\t");
                    System.out.print(
                        entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
                    System.out.println();
                }
            }
        }
    }
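The grouping rule described in the answer (contiguous tokens with the same NER tag form one entity) can also be sketched in plain Java, which may help if you stay with per-token output like the CRFClassifier loop in the question. This is only a simplified illustration, not CoreNLP's actual implementation: the class name MentionGrouper, the group method, and the hand-written tag array are all hypothetical, and it assumes the background label for non-entity tokens is "O".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: merges runs of adjacent tokens that share a tag.
public class MentionGrouper {
    // Assumed background label for tokens that are not part of any entity.
    static final String OUTSIDE = "O";

    // Returns one "text<TAB>tag" string per run of same-tagged entity tokens.
    static List<String> group(String[] tokens, String[] tags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = OUTSIDE;
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            if (!tag.equals(currentTag)) {
                // The tag changed, so any open mention ends here.
                if (!currentTag.equals(OUTSIDE)) {
                    mentions.add(current + "\t" + currentTag);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals(OUTSIDE)) {
                // Extend the open mention with this token.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens[i]);
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals(OUTSIDE)) {
            mentions.add(current + "\t" + currentTag);
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Tags written by hand for illustration, not real classifier output.
        String[] tokens = {"Hello", ",", "Bill", "Smith", ",", "how", "are", "you", "?"};
        String[] tags = {"O", "O", "PERSON", "PERSON", "O", "O", "O", "O", "O"};
        for (String mention : group(tokens, tags)) {
            System.out.println(mention);
        }
    }
}
```

Note that under this simple rule two different people who appear side by side with no intervening token would be merged into one mention, so it cannot by itself decide when "Bill" and "Smith" are separate entities.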