Stanford NLP parse tree format

Question

This may be a silly question, but how does one iterate through a parse tree produced by an NLP parser (like Stanford NLP)? It's all nested brackets, which is neither an array nor a dictionary nor any other collection type I've used.

(ROOT
  (S
    (PP (IN As)
      (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP (VBP want)
      (S
        (VP (TO to)
          (VP (VB make)
            (NP (DT a) (NN payment))))))))

Solution

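This output format of the Stanford Parser is called a "bracketed parse (tree)". It is meant to be read as a tree with

  • words as the leaf nodes (e.g. As, an, accountant)
  • part-of-speech tags and phrase/clause categories as the labels of the nodes above them (e.g. IN, NN; S, NP, VP)
  • edges that link the nodes hierarchically: each opening bracket starts a node, and everything up to its matching closing bracket belongs to that node, and
  • typically an artificial ROOT (or TOP) node added by the parser as the root of the parse

(A tree is a special case of a Directed Acyclic Graph (DAG), so you can also read it as a DAG: the edges run in one direction, from parent to child, and there are no cycles.)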
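
To make the structure concrete, here is a minimal sketch of reading the bracketed format by hand, using only the Python standard library. The helper names `parse_bracketed` and `walk` are made up for this illustration and are not part of the Stanford Parser or any library. The sketch also assumes well-formed input in which words contain no spaces or brackets.

```python
import re

def parse_bracketed(text):
    """Turn a bracketed parse string into nested (label, children) tuples.

    Words (leaves) are kept as plain strings.
    Hypothetical helper: assumes balanced, well-formed input.
    """
    # Tokens are "(", ")", or any run of characters without spaces/brackets.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with an opening bracket"
        pos += 1
        label = tokens[pos]          # first token after "(" is the label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested phrase or tag
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                     # consume the closing bracket
        return (label, children)

    return read_node()

def walk(node, depth=0):
    """Yield (depth, label_or_word, is_leaf), depth-first, left to right."""
    if isinstance(node, str):
        yield depth, node, True
    else:
        label, children = node
        yield depth, label, False
        for child in children:
            yield from walk(child, depth + 1)

parse = (
    "(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) "
    "(NP (PRP I)) "
    "(VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))"
)
tree = parse_bracketed(parse)

# Iterate over the whole tree and keep only the words.
words = [item for _, item, is_leaf in walk(tree) if is_leaf]
print(words)
```

Once the string is a nested structure like this, iterating over it is ordinary recursion. In practice an existing library is usually a better choice than a hand-written reader.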

There are libraries that can read bracketed parses, e.g. NLTK's nltk.tree.Tree (http://www.nltk.org/howto/tree.html):

>>> from nltk.tree import Tree
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
>>> parsetree = Tree.fromstring(output)
>>> print(parsetree)
(ROOT
  (S
    (PP (IN As) (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP
      (VBP want)
      (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print()
                           ROOT                             
                            |                                
                            S                               
      ______________________|________                        
     |                  |            VP                     
     |                  |    ________|____                   
     |                  |   |             S                 
     |                  |   |             |                  
     |                  |   |             VP                
     |                  |   |     ________|___               
     PP                 |   |    |            VP            
  ___|___               |   |    |    ________|___           
 |       NP             NP  |    |   |            NP        
 |    ___|______        |   |    |   |         ___|_____     
 IN  DT         NN     PRP VBP   TO  VB       DT        NN  
 |   |          |       |   |    |   |        |         |    
 As  an     accountant  I  want  to make      a      payment

>>> parsetree.leaves()
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
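
As for actually iterating over the tree, the following is a small sketch that assumes NLTK is installed and uses its `Tree.subtrees()`, `Tree.label()`, `Tree.leaves()` and `Tree.pos()` methods. The variable names are only for illustration.

```python
from nltk.tree import Tree

output = (
    "(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) "
    "(NP (PRP I)) "
    "(VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))"
)
parsetree = Tree.fromstring(output)

# Visit every subtree and keep the noun phrases as plain strings.
noun_phrases = [
    " ".join(subtree.leaves())
    for subtree in parsetree.subtrees()
    if subtree.label() == "NP"
]
print(noun_phrases)  # ['an accountant', 'I', 'a payment']

# (word, part-of-speech tag) pairs in sentence order.
tagged_words = parsetree.pos()
print(tagged_words)

# A Tree behaves like a list of its children, so indexing and for-loops work.
sentence = parsetree[0]  # the S node directly under ROOT
for child in sentence:
    print(child.label(), child.leaves())
```

Because a `Tree` is a list of its children, the usual list operations (`len`, indexing, `for` loops) apply at every level, and recursion over the children reaches the whole parse.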
