How to read a constituency-based parse tree
Question
I have a corpus of sentences that were preprocessed by Stanford's CoreNLP system. One of the things it provides is each sentence's constituency-based parse tree. While I can understand a parse tree when it is drawn as a tree, I'm not sure how to read it in this format:
For example:
(ROOT
  (FRAG
    (NP (NN sent28))
    (: :)
    (S
      (NP (NNP Rome))
      (VP (VBZ is)
        (PP (IN in)
          (NP
            (NP (NNP Lazio) (NN province))
            (CC and)
            (NP
              (NP (NNP Naples))
              (PP (IN in)
                (NP (NNP Campania))))))))
    (. .)))
The original sentence is:
sent28: Rome is in Lazio province and Naples in Campania .
How am I supposed to read this tree? Alternatively, is there code (in Python) that does this properly? Thanks.
Recommended answer
NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is called fromstring. You can then iterate over its subtrees, leaves, etc.
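As a minimal sketch of that suggestion, the snippet below loads the tree from the question with Tree.fromstring and reads a few things off it. It assumes NLTK 3.x is installed (a third-party package); the variable names (parse, root_label, words, tagged, noun_phrases) are illustrative only.

```python
from nltk.tree import Tree  # third-party package: pip install nltk

# The bracketed parse from the question, as printed by CoreNLP.
parse = """
(ROOT
  (FRAG
    (NP (NN sent28))
    (: :)
    (S
      (NP (NNP Rome))
      (VP (VBZ is)
        (PP (IN in)
          (NP
            (NP (NNP Lazio) (NN province))
            (CC and)
            (NP
              (NP (NNP Naples))
              (PP (IN in)
                (NP (NNP Campania))))))))
    (. .)))
"""

tree = Tree.fromstring(parse)

root_label = tree.label()  # label of the top node
words = tree.leaves()      # the tokens, left to right
tagged = tree.pos()        # (token, part-of-speech tag) pairs

# Collect the text of every noun phrase by filtering subtrees on their label.
noun_phrases = [
    " ".join(subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]

print(root_label)
print(words)
for phrase in noun_phrases:
    print(phrase)
```

Each "(" opens a constituent whose first token is its label (NP, VP, PP, ...), and the innermost pairs such as (NNP Rome) are a part-of-speech tag followed by the word itself.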
As an aside: you might want to remove the "sent28:" prefix, since it confuses the parser and is not part of the sentence. With it, you are not getting a full parse tree, just a sentence fragment (the FRAG node in the output).
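If it helps to see how the bracketed format itself is structured, here is a small standard-library sketch that reads it by hand. This is not part of the answer above and not how NLTK is implemented; read_tree and leaves are hypothetical helper names, and the reader assumes well-formed input (no error handling). The example string is the "Naples in Campania" subtree from the question.

```python
import re


def read_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples.

    Leaves (the words) are kept as plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1             # skip the opening "("
        label = tokens[pos]  # first token after "(" is the constituent label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1             # skip the closing ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words under a node, left to right."""
    _label, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


subtree = read_tree("(NP (NP (NNP Naples)) (PP (IN in) (NP (NNP Campania))))")
print(subtree[0])       # label of the outermost constituent
print(leaves(subtree))  # the words it spans
```

For real work, nltk.tree.Tree is the more practical choice; the sketch only illustrates that the nesting of parentheses is the tree structure.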