How to get a parse in a bracketed format (without POS tags)?

Question

I want to parse a sentence into a binary parse of the following form (the format used in the SNLI corpus):

Sentence: "A person on a horse jumps over a broken down airplane."

Parse: ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )

I have been unable to find a parser that produces this output.

Note: This question has been asked before ("How to get a binary parse in Python"), but the answers there were not helpful, and I could not comment on them because I do not have the required reputation.

Recommended answer

Here is some sample code that erases the label of every non-leaf node in the tree (both phrase labels and POS tags) and then prints the tree on one line.

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class PrintTreeWithoutLabelsExample {

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,parse");
    // use faster shift reduce parser
    props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
    props.setProperty("parse.maxlen", "100");
    props.setProperty("parse.binaryTrees", "true");
    // set up Stanford CoreNLP pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // build annotation for text
    Annotation annotation = new Annotation("The red car drove on the highway.");
    // run the pipeline on the text
    pipeline.annotate(annotation);
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      Tree sentenceConstituencyParse = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      for (Tree subTree : sentenceConstituencyParse.subTrees()) {
        if (!subTree.isLeaf())
          subTree.setLabel(CoreLabel.wordFromString(""));
      }
      TreePrint treePrint = new TreePrint("oneline");
      treePrint.printTree(sentenceConstituencyParse);
    }
  }
}
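
The code above blanks the labels but keeps a pair of brackets around every word, whereas the SNLI example in the question shows bare words. As a rough sketch of one way to close that gap, the helper below post-processes a Penn Treebank style bracketed string and drops the labels, collapsing any node that has a single child. The class name BracketedParseStripper and the method stripLabels are hypothetical names made up for this illustration; they are not part of Stanford CoreNLP. It assumes every node has a non-empty label and that literal parentheses in the text are escaped (for example as -LRB- and -RRB-). It does not binarize anything, so the result is binary only if the input tree already is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of CoreNLP): strips labels from a
// Penn Treebank style bracketed parse such as "(ROOT (S (NP ...) ...))".
final class BracketedParseStripper {

  /** Minimal tree node: a leaf holds a word, an internal node holds children. */
  private static final class Node {
    final String word;                       // non-null only for leaves
    final List<Node> children = new ArrayList<>();
    Node(String word) { this.word = word; }
  }

  private final String[] tokens;
  private int pos = 0;

  private BracketedParseStripper(String parse) {
    // Pad brackets with spaces so they become separate tokens.
    this.tokens = parse.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
  }

  // Parses "( LABEL child ... )" and discards LABEL.
  // Assumption: every node carries a non-empty label.
  private Node parseNode() {
    pos++;                                   // skip "("
    pos++;                                   // skip the label
    Node node = new Node(null);
    while (!tokens[pos].equals(")")) {
      if (tokens[pos].equals("(")) {
        node.children.add(parseNode());
      } else {
        node.children.add(new Node(tokens[pos++]));
      }
    }
    pos++;                                   // skip ")"
    return node;
  }

  // Leaves print as bare words; single-child nodes (POS tags, unary
  // chains) are collapsed; everything else gets one pair of brackets.
  private static String render(Node n) {
    if (n.word != null) return n.word;
    if (n.children.size() == 1) return render(n.children.get(0));
    StringBuilder sb = new StringBuilder("(");
    for (Node child : n.children) sb.append(' ').append(render(child));
    return sb.append(" )").toString();
  }

  static String stripLabels(String parse) {
    return render(new BracketedParseStripper(parse).parseNode());
  }

  public static void main(String[] args) {
    String ptb = "(ROOT (S (NP (DT The) (NN car)) (VP (VBD drove)) (. .)))";
    System.out.println(stripLabels(ptb));    // prints: ( ( The car ) drove . )
  }
}
```

One way to use this would be to print the parser's tree in its usual labeled one-line form, capture that string, and pass it to stripLabels. Note that the sample input here has a three-child S node, so the output is not binary; a binarized input tree would be needed for a strictly binary result.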
