Stanford CoreNLP Java output

Question

I'm new to Java and the Stanford NLP toolkit and am trying to use them for a project. Specifically, I'm trying to use the Stanford CoreNLP toolkit to annotate a text (from NetBeans, not the command line), and I tried the code provided at http://nlp.stanford.edu/software/corenlp.shtml#Usage (Using the Stanford CoreNLP API). My question is: how can I get the output into a file so that I can process it further?

I've tried printing the graphs and the sentences to the console, just to see the content. That works. Basically, what I need is to return the annotated document so that I can call it from my main class and write it to a text file (if that's possible). I've been looking through the Stanford CoreNLP API, but given my lack of experience I don't really know the best way to return this kind of information.

Here is the code:

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "the quick fox jumps over the lazy dog";

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for(CoreMap sentence: sentences) {
      // traversing the words in the current sentence
      // a CoreLabel is a CoreMap with additional token-specific methods
      for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);       
      }

      // this is the parse tree of the current sentence
      Tree tree = sentence.get(TreeAnnotation.class);

      // this is the Stanford dependency graph of the current sentence
      SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    }

    // This is the coreference link graph
    // Each chain stores a set of mentions that link to each other,
    // along with a method for getting the most representative mention
    // Both sentence and token offsets start at 1!
    Map<Integer, CorefChain> graph = 
      document.get(CorefChainAnnotation.class);


Recommended answer

Once you have any or all of the natural language analyses shown in your code example, all you need to do is send them to a file in the normal Java fashion, e.g., with a FileWriter for text format output. Concretely, here's a simple complete example that shows output sent to files (if you give it appropriate command-line arguments):

import java.io.*;
import java.util.*;

import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

public class StanfordCoreNlpDemo {

  public static void main(String[] args) throws IOException {
    PrintWriter out;
    if (args.length > 1) {
      out = new PrintWriter(args[1]);
    } else {
      out = new PrintWriter(System.out);
    }
    PrintWriter xmlOut = null;
    if (args.length > 2) {
      xmlOut = new PrintWriter(args[2]);
    }

    StanfordCoreNLP pipeline = new StanfordCoreNLP();
    Annotation annotation;
    if (args.length > 0) {
      annotation = new Annotation(IOUtils.slurpFileNoExceptions(args[0]));
    } else {
      annotation = new Annotation("Kosgi Santosh sent an email to Stanford University. He didn't get a reply.");
    }

    pipeline.annotate(annotation);
    pipeline.prettyPrint(annotation, out);
    if (xmlOut != null) {
      pipeline.xmlPrint(annotation, xmlOut);
    }
    // An Annotation is a Map and you can get and use the various analyses individually.
    // For instance, this gets the parse tree of the first sentence in the text.
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    if (sentences != null && sentences.size() > 0) {
      CoreMap sentence = sentences.get(0);
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      out.println();
      out.println("The first sentence parsed is:");
      tree.pennPrint(out);
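    }
    // Close the writers so buffered output is flushed; without this,
    // the output files can end up empty or truncated.
    out.close();
    if (xmlOut != null) {
      xmlOut.close();
    }
  }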

}
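
If you would rather write your own per-token format than rely on prettyPrint, the same "normal Java fashion" applies: collect the values you read from each token and print them through a writer. The following is only a minimal sketch in plain Java, not part of CoreNLP. TokenRow, writeTsv and TokenTsvWriter are hypothetical names made up for illustration, and the rows in main are hand-written stand-ins for the word, POS and NER strings that the token loop in the question obtains via token.get(...).

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class TokenTsvWriter {

  /** Hypothetical holder for the per-token values read from a CoreLabel. */
  static final class TokenRow {
    final String word;
    final String pos;
    final String ner;

    TokenRow(String word, String pos, String ner) {
      this.word = word;
      this.pos = pos;
      this.ner = ner;
    }
  }

  /** Writes one token per line as word<TAB>pos<TAB>ner. */
  static void writeTsv(List<TokenRow> rows, Path file) throws IOException {
    // try-with-resources closes (and therefore flushes) the writer, even on error
    try (PrintWriter out =
        new PrintWriter(Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
      for (TokenRow row : rows) {
        out.println(row.word + "\t" + row.pos + "\t" + row.ner);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in values for illustration only; in real code, add a TokenRow
    // inside the token loop instead of hard-coding them.
    List<TokenRow> rows = Arrays.asList(
        new TokenRow("Stanford", "NNP", "ORGANIZATION"),
        new TokenRow("replied", "VBD", "O"));

    Path file = Files.createTempFile("tokens", ".tsv");
    writeTsv(rows, file);

    // Read the file back to confirm what was written
    for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
      System.out.println(line);
    }
  }
}
```

A tab-separated file like this is easy to load in a later processing step. The annotation method can return the list of rows (or the Annotation itself) to the main class, which then decides where to write it.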
