Lazy parsing with Stanford CoreNLP to get sentiment only of specific sentences

Question

I am looking for a way to optimize the performance of the Stanford CoreNLP sentiment pipeline. I want to get the sentiment of sentences, but only of those containing specific keywords given as input.

I tried two approaches:

Approach 1: StanfordCoreNLP pipeline annotating the entire text with sentiment

I defined a pipeline of annotators: tokenize, ssplit, parse, sentiment. I ran it on the entire article, then looked for keywords in each sentence and, if they were present, ran a method returning the keyword value. I was not satisfied, though, that processing took a couple of seconds.

Here is the code:

List<String> keywords = ...;
String text = ...;
Map<Integer,Integer> sentenceSentiment = new HashMap<>();

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
props.setProperty("parse.maxlen", "20");
props.setProperty("tokenize.options", "untokenizable=noneDelete");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Annotation annotation = pipeline.process(text); // takes 2 seconds!!!!
List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
for (int i=0; i<sentences.size(); i++) {
    CoreMap sentence = sentences.get(i);
    if (sentenceContainsKeywords(sentence, keywords)) {
        int sentiment = RNNCoreAnnotations.getPredictedClass(sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class));
        sentenceSentiment.put(i, sentiment); // key by sentence index, matching Map<Integer,Integer>
    }
}

Approach 2: StanfordCoreNLP pipeline annotating the entire text with sentence splitting, separate annotators run on the sentences of interest

Because of the weak performance of the first solution, I defined a second one: a pipeline with only the tokenize and ssplit annotators. I looked for keywords in each sentence and, if they were present, created an annotation for that sentence alone and ran the next annotators on it: ParserAnnotator, BinarizerAnnotator and SentimentAnnotator.

The results were really unsatisfying because of the ParserAnnotator, even though I initialized it with the same properties. Sometimes it took even more time than the entire pipeline run on a document in Approach 1.

List<String> keywords = ...;
String text = ...;
Map<Integer,Integer> sentenceSentiment = new HashMap<>();

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit"); // parsing, sentiment removed
props.setProperty("parse.maxlen", "20");
props.setProperty("tokenize.options", "untokenizable=noneDelete");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// initialization of the annotators to be run on individual sentences
ParserAnnotator parserAnnotator = new ParserAnnotator("pa", props);
BinarizerAnnotator binarizerAnnotator = new BinarizerAnnotator("ba", props);
SentimentAnnotator sentimentAnnotator = new SentimentAnnotator("sa", props);

Annotation annotation = pipeline.process(text); // takes <100 ms
List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
for (int i=0; i<sentences.size(); i++) {
    CoreMap sentence = sentences.get(i);
    if (sentenceContainsKeywords(sentence, keywords)) {
        // code required to perform annotation on one sentence
        List<CoreMap> listWithSentence = new ArrayList<CoreMap>();
        listWithSentence.add(sentence);
        Annotation sentenceAnnotation  = new Annotation(listWithSentence);

        parserAnnotator.annotate(sentenceAnnotation); // takes 50 ms up to 2 seconds!!!!
        binarizerAnnotator.annotate(sentenceAnnotation);
        sentimentAnnotator.annotate(sentenceAnnotation);
        sentence = sentenceAnnotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);

        int sentiment = RNNCoreAnnotations.getPredictedClass(sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class));
        sentenceSentiment.put(i, sentiment); // key by sentence index, matching Map<Integer,Integer>
    }
}

Questions

  1. I wonder why parsing in CoreNLP is not "lazy"? (In my example, that would mean: performed only when the sentiment of a sentence is requested.) Is it for performance reasons?

  2. How come a parser for one sentence can take almost as long as a parser for an entire article (my article had 7 sentences)? Is it possible to configure it so that it works faster?


Answer

If you're looking to speed up constituency parsing, the single best improvement is to use the new shift-reduce constituency parser. It is orders of magnitude faster than the default PCFG parser.
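
A minimal configuration sketch (assuming the separate shift-reduce models jar, which is not bundled with the default CoreNLP distribution, is on the classpath):

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
// assumption: englishSR.ser.gz comes from the separate SR-parser models jar
props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
props.setProperty("parse.binaryTrees", "true"); // sentiment works on binarized trees
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);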

Answers to your later questions:


  1. Why is CoreNLP parsing not lazy? This is certainly possible, but not something that we've implemented yet in the pipeline. We likely haven't seen many use cases in-house where this is necessary. We will happily accept a contribution of a "lazy annotator wrapper" if you're interested in making one!

  2. How come a parser for one sentence can take almost as long as a parser for an entire article? The default Stanford PCFG parser has cubic time complexity with respect to sentence length, which is why we usually recommend restricting the maximum sentence length for performance reasons. The shift-reduce parser, on the other hand, runs in linear time with respect to sentence length.
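
Putting the two together, a sketch of Approach 2 with the shift-reduce model loaded into the per-sentence parser (again assuming the SR models jar is available):

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit");
props.setProperty("parse.maxlen", "20");
// assumption: shift-reduce model from the separate SR models jar
props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
// the annotator name doubles as the property prefix, so the name "parse"
// lets the annotator pick up the parse.* properties set above
ParserAnnotator parserAnnotator = new ParserAnnotator("parse", props);

With the linear-time shift-reduce model behind the same ParserAnnotator interface, the per-sentence annotate calls should no longer dominate the runtime.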
