Extracting clauses from Penn Treebank-formatted text


Question

Say I have a sentence:

After he had eaten the cheese, Bill went to the grocery.

In my program, I get the following output:

---PARSE TREE---
(ROOT
  (S
    (SBAR (IN After)
      (S
        (NP (PRP he))
        (VP (VBD had)
          (VP (VBN eaten)
            (NP (DT the) (NN cheese))))))
    (, ,)
    (NP (NNP Bill))
    (VP (VBD went)
      (PP (TO to)
        (NP (DT the) (NN grocery))))
    (. .)))

How would I merge the parts that are not inside a clause so that they become an independent clause? Like this:

S Clause {
    SBAR Clause {
         After he had eaten the cheese,
    }

    S Clause {
        Bill went to the grocery.
    }
}

I may not be explaining this clearly, but basically I want to extract the independent and dependent clauses of the sentence, and the subclauses of those clauses.

Recommended answer

Here is some demonstration code from the NLTK guide (it doesn't explicitly show how to extract a clause): http://nltk.googlecode.com/svn/trunk/doc/howto/tree.html
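Since the linked guide does not show clause extraction itself, here is a minimal sketch of one way it could be done, using only the Python standard library. It is not taken from the NLTK guide. The helper names (`parse`, `extract`, `show`) and the choice of clause labels (`S`, `SBAR`, `SBARQ`, `SINV`, `SQ`) are assumptions made for illustration. The idea: words that sit inside a nested clause go to that subclause, and every other word stays with the enclosing clause, which gives the "merged" independent clause.

```python
import re

# Assumption: these Penn Treebank labels are treated as clause-level nodes.
CLAUSE_LABELS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}

PARSE = """
(ROOT
  (S
    (SBAR (IN After)
      (S
        (NP (PRP he))
        (VP (VBD had)
          (VP (VBN eaten)
            (NP (DT the) (NN cheese))))))
    (, ,)
    (NP (NNP Bill))
    (VP (VBD went)
      (PP (TO to)
        (NP (DT the) (NN grocery))))
    (. .)))
"""


def parse(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                  # skip ")"
        return (label, children)

    return node()


def extract(clause):
    """Split a clause node into its own words and its nested clauses."""
    words, subclauses = [], []

    def walk(tree):
        for child in tree[1]:
            if isinstance(child, str):
                words.append(child)
            elif child[0] in CLAUSE_LABELS:
                subclauses.append(extract(child))  # words go to the subclause
            else:
                walk(child)       # NP, VP, PP, ...: keep collecting words

    walk(clause)
    return {"label": clause[0], "text": " ".join(words), "subclauses": subclauses}


def show(clause, indent=0):
    """Render the result as lines in the brace layout used in the question."""
    pad = " " * indent
    lines = [f"{pad}{clause['label']} Clause {{"]
    if clause["text"]:
        lines.append(f"{pad}    {clause['text']}")
    for sub in clause["subclauses"]:
        lines.extend(show(sub, indent + 4))
    lines.append(f"{pad}}}")
    return lines


root = parse(PARSE)
result = extract(root[1][0])      # the S node directly under ROOT
print("\n".join(show(result)))
```

With this sketch, the top-level S keeps the words outside any nested clause ("Bill went to the grocery" plus the surrounding punctuation), while the SBAR keeps "After" and contains the inner S "he had eaten the cheese". Note that the comma ends up with the main clause rather than with the SBAR as in the question's layout, so punctuation would need extra handling. If NLTK is available, `nltk.Tree.fromstring` could replace the hand-written `parse`, with the traversal adapted to `Tree` objects.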
