Stanford NLP parse tree format
Question
This may be a silly question, but how does one iterate through a parse tree produced by an NLP parser (like Stanford NLP)? It's all nested brackets, which is neither an array nor a dictionary nor any other collection type I've used.
(ROOT
  (S
    (PP (IN As)
      (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP (VBP want)
      (S
        (VP (TO to)
          (VP (VB make)
            (NP (DT a) (NN payment))))))))
Recommended answer
This particular output format of the Stanford Parser is called a "bracketed parse (tree)". It is meant to be read as a graph with:
- words as nodes (e.g. As, an, accountant)
- phrases/clauses as labels (e.g. S, NP, VP)
- edges linked hierarchically, and
- typically an artificial ROOT (or TOP) node added above the parse as its root
(In this case you can read it as a Directed Acyclic Graph (DAG) since it's unidirectional and non-cyclic)
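To make the graph reading above concrete, here is a minimal sketch of how the brackets map onto a nested structure using only plain Python. The names `parse_bracketed`, `leaves` and `iter_nodes` are hypothetical helpers written for this illustration, not part of the Stanford tools or any library. The sketch assumes well-formed input in which every `(` is followed by a label.

```python
def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Children are either further (label, children) tuples or plain strings (words).
    Assumes balanced brackets; malformed input will raise an error.
    """
    # Pad the brackets with spaces so that split() yields "(", ")", labels and words.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1                      # consume the label
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested phrase or tag
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]


def iter_nodes(node, depth=0):
    """Yield (depth, label, words) for every labelled node, depth-first."""
    if isinstance(node, str):
        return
    label, children = node
    yield depth, label, leaves(node)
    for child in children:
        yield from iter_nodes(child, depth + 1)


sample = "(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))"
tree = parse_bracketed(sample)
for depth, label, words in iter_nodes(tree):
    print("  " * depth + label, words)
```

Each labelled node becomes a `(label, children)` pair, so iterating the parse is an ordinary recursive walk over nested tuples and lists.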
There are libraries out there that read bracketed parses, e.g. NLTK's nltk.tree.Tree (http://www.nltk.org/howto/tree.html):
>>> from nltk.tree import Tree
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
>>> parsetree = Tree.fromstring(output)
>>> print(parsetree)
(ROOT
  (S
    (PP (IN As) (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP
      (VBP want)
      (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print()
                        ROOT
                          |
                          S
    ______________________|________
    |                |           VP
    |                |    ________|____
    |                |    |           S
    |                |    |           |
    |                |    |          VP
    |                |    |   ________|___
   PP                |    |   |         VP
 ___|___             |    |   |  ________|___
 |    NP            NP    |   |  |         NP
 |  ___|______       |    |   |  |       ___|_____
IN DT       NN      PRP  VBP TO VB      DT      NN
 |  |        |       |    |   |  |       |       |
As an   accountant   I  want to make     a    payment
>>> parsetree.leaves()
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
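Beyond `leaves()`, the same `Tree` object can be iterated node by node. The following is a short sketch, assuming a recent NLTK 3.x install under Python 3; `walk` is a hypothetical helper written for this example, while `subtrees()`, `label()`, `leaves()` and `pos()` are methods of `nltk.tree.Tree`.

```python
from nltk.tree import Tree

output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
parsetree = Tree.fromstring(output)

# Visit every labelled node (phrases and POS tags) depth-first,
# printing its label and the words it covers.
for subtree in parsetree.subtrees():
    print(subtree.label(), subtree.leaves())

# Keep only the noun phrases by passing a filter function to subtrees().
noun_phrases = [' '.join(st.leaves())
                for st in parsetree.subtrees(lambda t: t.label() == 'NP')]

# (word, tag) pairs for the whole sentence.
tagged = parsetree.pos()


def walk(node, depth=0):
    """Hypothetical helper: recurse manually; a Tree behaves like a list of children."""
    if isinstance(node, Tree):
        print('  ' * depth + node.label())
        for child in node:
            walk(child, depth + 1)
    else:
        print('  ' * depth + node)   # a leaf is a plain string


walk(parsetree)
```

Since a `Tree` is a list subclass, indexing also works: `parsetree[0]` is the `S` node and `parsetree[0][1]` is the `(NP (PRP I))` subtree.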