How to read a constituency-based parse tree


Question

I have a corpus of sentences that were preprocessed by Stanford's CoreNLP system. One of the things it provides is each sentence's parse tree (constituency-based). While I can understand a parse tree when it's drawn (like a tree), I'm not sure how to read it in this format:

          (ROOT
            (FRAG
              (NP (NN sent28))
              (: :)
              (S
                (NP (NNP Rome))
                (VP (VBZ is)
                  (PP (IN in)
                    (NP
                      (NP (NNP Lazio) (NN province))
                      (CC and)
                      (NP
                        (NP (NNP Naples))
                        (PP (IN in)
                          (NP (NNP Campania))))))))
              (. .)))

The original sentence is:

sent28: Rome is in Lazio province and Naples in Campania .

How am I supposed to read this tree, or alternatively, is there code (in Python) that does it properly? Thanks.

Recommended answer

NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is called fromstring. Once the string is parsed, you can iterate over the tree's subtrees, leaves, etc.
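As a rough illustration of that approach, here is a minimal sketch that feeds the bracketed string from the question to Tree.fromstring and then walks the result. It assumes NLTK is installed; the variable names (parse, words, tagged, noun_phrases) are just illustrative choices, not anything CoreNLP or NLTK prescribes.

```python
from nltk.tree import Tree

# The bracketed parse exactly as CoreNLP printed it in the question.
parse = """
(ROOT
  (FRAG
    (NP (NN sent28))
    (: :)
    (S
      (NP (NNP Rome))
      (VP (VBZ is)
        (PP (IN in)
          (NP
            (NP (NNP Lazio) (NN province))
            (CC and)
            (NP
              (NP (NNP Naples))
              (PP (IN in)
                (NP (NNP Campania))))))))
    (. .)))
"""

# Build a Tree object from the bracketed string.
tree = Tree.fromstring(parse)

# The leaves are the tokens of the sentence, in order.
words = tree.leaves()

# pos() pairs each token with the tag of its preterminal node.
tagged = tree.pos()

# subtrees() accepts an optional filter; here we keep only NP constituents.
noun_phrases = [
    " ".join(subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]

print(tree.label())
print(words)
print(tagged)
print(noun_phrases)

# Draw the tree as ASCII art in the terminal.
tree.pretty_print()
```

Each opening parenthesis starts a constituent whose first token is its label (ROOT, NP, VP, ...), and everything up to the matching closing parenthesis is its children, so the nesting of the brackets is the nesting of the tree.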

As an aside: you might want to remove the "sent28:" prefix, as it confuses the parser (it's also not part of the sentence). Because of it, you are not getting a full parse tree for the sentence, just a fragment (note the FRAG node).
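One possible way to drop that prefix before sending the text to the parser is a small regular expression. This is only a sketch: it assumes every line in the corpus starts with an identifier shaped like "sent28:", and strip_sentence_id is a hypothetical helper name, not part of CoreNLP or NLTK.

```python
import re

# Assumption: identifiers look like "sent" + digits + ":" at the start of a line.
ID_PREFIX = re.compile(r"^\s*sent\d+\s*:\s*")

def strip_sentence_id(line):
    """Return the line without a leading 'sentNN:' identifier, if present."""
    return ID_PREFIX.sub("", line)

raw = "sent28: Rome is in Lazio province and Naples in Campania ."
clean = strip_sentence_id(raw)
print(clean)
```

Lines without such a prefix pass through unchanged, so the helper can be applied to the whole corpus before parsing.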
