Finding the head of a noun phrase in NLTK and Stanford parse according to the rules for finding the head of an NP

Question

Generally, the head of a noun phrase is the noun that is rightmost in the NP. In the tree shown below, "tree" is the head of the parent NP.


            ROOT                             
             |                                
             S                               
          ___|________________________        
         NP                           |      
      ___|_____________               |       
     |                 PP             VP     
     |             ____|____      ____|___    
     NP           |         NP   |       PRT 
  ___|_______     |         |    |        |   
 DT  JJ  NN  NN   IN       NNP  VBD       RP 
 |   |   |   |    |         |    |        |   
The old oak tree from     India fell     down

Out[40]: Tree('S', [Tree('NP', [Tree('NP', [Tree('DT', ['The']), Tree('JJ', ['old']), Tree('NN', ['oak']), Tree('NN', ['tree'])]), Tree('PP', [Tree('IN', ['from']), Tree('NP', [Tree('NNP', ['India'])])])]), Tree('VP', [Tree('VBD', ['fell']), Tree('PRT', [Tree('RP', ['down'])])])])

The following code, based on a Java implementation, uses a simplistic rule to find the head of the NP, but I need it to be based on the head-finding rules:

from nltk.tree import Tree

parsestr = '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'

def traverse(t):
    try:
        t.label()
    except AttributeError:
        # Leaves are plain strings and have no label(); stop here.
        return
    if t.label() == 'NP':
        print('NP:' + str(t.leaves()))
        # Simplistic rule: treat the last leaf as the head.
        print('NPhead:' + str(t.leaves()[-1]))
    # Visit the children whether or not this node is an NP.
    for child in t:
        traverse(child)


tree = Tree.fromstring(parsestr)
traverse(tree)

The above code gives this output:

NP:['The', 'old', 'oak', 'tree', 'from', 'India']
NPhead:India
NP:['The', 'old', 'oak', 'tree']
NPhead:tree
NP:['India']
NPhead:India

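Although it currently gives the correct output for the given sentence, I need to add a condition so that only the rightmost noun is extracted as the head. At present the code does not check whether the last word is a noun (NN):

print('NPhead:' + str(t.leaves()[-1]))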

So I need something like the following in the NP head condition of the code above:

t.leaves().getrightmostnoun() 

Michael Collins' dissertation (Appendix A) includes head-finding rules for the Penn Treebank, under which the head is not necessarily the rightmost noun. The condition above should therefore handle such cases as well.

Consider the following example, given in one of the answers:

(NP (NP the person) that gave (NP the talk)) went home

The head noun of the subject is "person", but the last leaf node of the NP "the person that gave the talk" is "talk".

Recommended answer

NLTK has a built-in way to convert a string into a Tree object (http://www.nltk.org/_modules/nltk/tree.html); see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L541.

>>> from nltk.tree import Tree
>>> parsestr='(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...         print(i)
... 
(NP
  (NP (DT The) (JJ old) (NN oak) (NN tree))
  (PP (IN from) (NP (NNP India))))
(NP (DT The) (JJ old) (NN oak) (NN tree))
(NP (NNP India))


>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...         print(i.leaves())
... 
['The', 'old', 'oak', 'tree', 'from', 'India']
['The', 'old', 'oak', 'tree']
['India']
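
To check the tag as the question asks, rather than taking the last leaf blindly, one option is to go through `Tree.pos()`, which returns `(word, tag)` pairs. The helper name `rightmost_noun` below is made up for illustration, and the sketch assumes Penn Treebank tags, where every noun tag starts with `NN`:

```python
from nltk.tree import Tree


def rightmost_noun(np):
    """Return the last word in `np` whose POS tag starts with 'NN', or None."""
    # pos() yields (word, tag) pairs in sentence order, so scan from the end.
    for word, tag in reversed(np.pos()):
        if tag.startswith('NN'):
            return word
    return None


parsestr = '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
tree = Tree.fromstring(parsestr)

# Collect the rightmost noun of every NP, outermost NP first.
heads = [rightmost_noun(np) for np in tree.subtrees(lambda t: t.label() == 'NP')]
for np, head in zip(tree.subtrees(lambda t: t.label() == 'NP'), heads):
    print('%s -> %s' % (np.leaves(), head))
```

This only guarantees that the result is a noun. For the outermost NP it still picks India, because the scan looks at every word under the NP and not only at its direct children.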

Note that the rightmost noun is not always the head noun of an NP, e.g.

>>> s = '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP (DT a) (NN talk)))))'
>>> Tree.fromstring(s)
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['Carnac']), Tree('DT', ['the']), Tree('NN', ['Magnificent'])]), Tree('VP', [Tree('VBD', ['gave']), Tree('NP', [Tree('DT', ['a']), Tree('NN', ['talk'])])])])])
>>> for i in Tree.fromstring(s).subtrees():
...     if i.label() == 'NP':
...         print(i.leaves()[-1])
... 
Magnificent
talk

Arguably, Magnificent can still be the head noun. Another example is when the NP includes a relative clause:

(NP (NP the person) that gave (NP the talk)) went home

The head noun of the subject is "person", but the last leaf node of the NP "the person that gave the talk" is "talk".
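
A rule-based finder can handle this case by looking at the direct children of the NP and not at all of its leaves. The following is a rough sketch only: the priority list is the NP rule set as it is commonly summarized from Appendix A of Collins' dissertation, so it should be checked against the original before being relied on. The names `NP_RULES`, `np_head_child` and `np_head_word` are made up for illustration, the bracketing of the relative clause is an assumed full parse of the abbreviated example above, and the code assumes every word sits under its own POS tag. Non-NP constituents would need their own rules; here they simply fall back to their last child.

```python
from nltk.tree import Tree

# (search direction, labels) pairs, tried in order on the NP's direct children.
# Assumed summary of Collins' NP rules; verify against the dissertation.
NP_RULES = [
    ('right', {'NN', 'NNP', 'NNPS', 'NNS', 'NX', 'POS', 'JJR'}),
    ('left', {'NP'}),
    ('right', {'$', 'ADJP', 'PRN'}),
    ('right', {'CD'}),
    ('right', {'JJ', 'JJS', 'RB', 'QP'}),
]


def is_preterminal(t):
    """True for a POS node such as (NN tree): one child, and it is a string."""
    return isinstance(t, Tree) and len(t) == 1 and not isinstance(t[0], Tree)


def np_head_child(np):
    """Pick the head child of an NP using the priority list above."""
    children = [c for c in np if isinstance(c, Tree)]
    last = children[-1]
    # Special case: a possessive marker in final position is the head.
    if is_preterminal(last) and last.label() == 'POS':
        return last
    for direction, labels in NP_RULES:
        ordered = children if direction == 'left' else reversed(children)
        for child in ordered:
            if child.label() in labels:
                return child
    return last  # nothing matched: default to the last child


def np_head_word(np):
    """Follow head children down from an NP until a word is reached."""
    node = np
    while not is_preterminal(node):
        if node.label() == 'NP':
            node = np_head_child(node)
        else:
            # Simplification: no rules for other categories, take the last child.
            node = [c for c in node if isinstance(c, Tree)][-1]
    return node[0]


rel = Tree.fromstring(
    '(NP (NP (DT the) (NN person))'
    ' (SBAR (WHNP (WDT that)) (S (VP (VBD gave) (NP (DT the) (NN talk))))))')
print(np_head_word(rel))

oak = Tree.fromstring('(NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India))))')
print(np_head_word(oak))
```

Because the search is restricted to direct children, the SBAR and PP are skipped and the inner NP is chosen, which yields person for the first tree and tree for the second.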
