Is it possible to get a set of a specific named entity tokens that comprise a phrase

Views: 97  Posted: 2020/8/6 3:08:37  Tags: stanford-nlp named-entity-recognition

Problem description

I'm using the Stanford CoreNLP parsers to run through some text and there are date phrases, such as 'the second Monday in October' and 'the past year'. The library will appropriately tag each token as a DATE named entity, but is there a way to programmatically get this whole date phrase? And it's not just dates, ORGANIZATION named entities will do the same ("The International Olympic Committee", for example, could be one identified in a given text example).

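import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

String content = "Thanksgiving, or Thanksgiving Day (Canadian French: Jour de"
        + " l'Action de grâce), occurring on the second Monday in October, is"
        + " an annual Canadian holiday which celebrates the harvest and other"
        + " blessings of the past year.";

// Configure the pipeline; the "ner" annotator assigns a named-entity tag to each token.
Properties p = new Properties();
p.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(p);

// Run every annotator over the text.
Annotation document = new Annotation(content);
pipeline.annotate(document);

// Walk the tokens of each sentence and print those tagged DATE.
for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {

        String word = token.get(CoreAnnotations.TextAnnotation.class);
        String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);

        // Constant-first comparison avoids a NullPointerException if the tag is missing.
        if ("DATE".equals(ne)) {
            System.out.println("DATE: " + word);
        }

    }
}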

Once the Stanford annotators and classifiers have loaded, this yields the following output:

DATE: Thanksgiving
DATE: Thanksgiving
DATE: the
DATE: second
DATE: Monday
DATE: in
DATE: October
DATE: the
DATE: past
DATE: year

I feel like the library must be recognizing the phrases and using them for the named entity tagging, so the question is: is that data kept, and is it available somehow through the API?

Thanks, Kevin
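
One way to approximate the phrases from the per-token tags alone is to merge runs of adjacent tokens that share the same tag. The sketch below is only an illustration under stated assumptions: `EntityPhraseGrouper` and `groupPhrases` are hypothetical names, not part of CoreNLP, and the tags in `main` are written by hand rather than produced by the pipeline. It also assumes that adjacent tokens with the same tag belong to one phrase, which would wrongly merge two distinct entities of the same type that sit next to each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: merges runs of adjacent tokens
// that carry the same named-entity tag into single phrases.
class EntityPhraseGrouper {

    // words.get(i) is a token's text and tags.get(i) its named-entity tag,
    // i.e. the values read from TextAnnotation and NamedEntityTagAnnotation.
    static List<String> groupPhrases(List<String> words, List<String> tags, String targetTag) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> phrases = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if (targetTag.equals(tags.get(i))) {
                // Still inside a run of target-tagged tokens: extend the phrase.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            } else if (current.length() > 0) {
                // The run just ended: emit the phrase and reset the buffer.
                phrases.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            // The last run reached the end of the input.
            phrases.add(current.toString());
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration; in the pipeline they come from the tokens.
        List<String> words = Arrays.asList("on", "the", "second", "Monday", "in",
                "October", ",", "blessings", "of", "the", "past", "year", ".");
        List<String> tags = Arrays.asList("O", "DATE", "DATE", "DATE", "DATE",
                "DATE", "O", "O", "O", "DATE", "DATE", "DATE", "O");
        for (String phrase : groupPhrases(words, tags, "DATE")) {
            System.out.println("DATE: " + phrase);
        }
        // prints:
        // DATE: the second Monday in October
        // DATE: the past year
    }
}
```

In the loop from the question, the two lists would be filled per sentence so that a phrase never spans a sentence boundary. Joining with single spaces does not reproduce the original spacing around punctuation. CoreNLP also ships an `entitymentions` annotator intended for grouping entity tokens into mentions; whether it is available, and how it is configured, depends on the version in use, so its documentation is worth checking before relying on a hand-rolled grouping like this one.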
