Python and NLTK: How to analyze sentence grammar?


Question

I have this code, which should show the syntactic structure of the sentence according to the defined grammar. However, it returns an empty list []. What am I missing or doing wrong?

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP
VP -> V NP | VP PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' | 'cheered'
P -> 'or' | 'and'
""")

def main():
    sent = "Kim arrived or Dana left and everyone cheered".split()
    parser = nltk.ChartParser(grammar)
    # parse() yields every tree the grammar licenses for the sentence
    trees = list(parser.parse(sent))
    for tree in trees:
        print(tree)

if __name__ == '__main__':
    main()

Recommended answer

Let's do some reverse engineering:

>>> import nltk
>>> grammar = nltk.CFG.fromstring("""
... NP -> Det N | Det N PP
... N -> 'Kim' | 'Dana' | 'everyone'
... """)
>>> sent = "Kim".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[]

It seems the rules can't even recognize the first word as an NP: every NP rule requires a Det, and the grammar defines no Det. So let's try adding NP -> N:

>>> import nltk
>>> grammar = nltk.CFG.fromstring("""
... NP -> Det N | Det N PP | N
... N -> 'Kim' | 'Dana' | 'everyone'
... """)
>>> sent = "Kim".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[Tree('NP', [Tree('N', ['Kim'])])]
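
The empty result has a structural cause that can be checked without running a parser: in the grammar from the question, every NP rule starts with Det, and Det has no rules of its own. The sketch below is plain Python, not an NLTK API; the dictionary encoding of the grammar and the helper name productive_nonterminals are assumptions made for illustration. It computes which nonterminals can derive at least one string of words.

```python
def productive_nonterminals(grammar):
    """Return the nonterminals that can derive at least one string of words."""
    productive = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in productive:
                continue
            for rhs in alternatives:
                # An alternative is usable when each symbol is a quoted word
                # or a nonterminal already known to be productive.
                if all(sym.startswith("'") or sym in productive for sym in rhs):
                    productive.add(lhs)
                    changed = True
                    break
    return productive


# The grammar from the question; Det appears on right-hand sides but has no rules.
original = {
    "S": [["NP", "VP"]],
    "PP": [["P", "NP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["VP", "PP"]],
    "N": [["'Kim'"], ["'Dana'"], ["'everyone'"]],
    "V": [["'arrived'"], ["'left'"], ["'cheered'"]],
    "P": [["'or'"], ["'and'"]],
}

# The same grammar with the extra alternative NP -> N.
patched = dict(original, NP=original["NP"] + [["N"]])

print(sorted(productive_nonterminals(original)))
print(sorted(productive_nonterminals(patched)))
```

With the original rules only N, V and P are productive, so no input can ever reach S. Once NP -> N is added, every nonterminal that has rules becomes productive, which is a necessary (but not sufficient) condition for the full sentence to parse.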

Now that works, so let's move on to the next words, "Kim arrived" and "Kim arrived or":

>>> import nltk
>>> grammar = nltk.CFG.fromstring("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | VP PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' | 'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[]
>>>
>>> sent = "Kim arrived or".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[]
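
One way to see that "Kim arrived" can never be an S under this grammar is to compute the shortest word sequence each nonterminal can produce. The following is a hand-rolled sketch in plain Python, not part of NLTK; the dictionary encoding and the name min_yield_lengths are assumptions for illustration.

```python
import math


def min_yield_lengths(grammar):
    """Map each nonterminal to the length of the shortest word string it derives."""
    lengths = {nt: math.inf for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                # A quoted word counts as 1; a nonterminal without rules (Det) stays infinite.
                total = sum(
                    1 if sym.startswith("'") else lengths.get(sym, math.inf)
                    for sym in rhs
                )
                if total < lengths[lhs]:
                    lengths[lhs] = total
                    changed = True
    return lengths


# The grammar from the transcript above (with NP -> N added).
grammar = {
    "S": [["NP", "VP"]],
    "PP": [["P", "NP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"], ["N"]],
    "VP": [["V", "NP"], ["VP", "PP"]],
    "N": [["'Kim'"], ["'Dana'"], ["'everyone'"]],
    "V": [["'arrived'"], ["'left'"], ["'cheered'"]],
    "P": [["'or'"], ["'and'"]],
}

lengths = min_yield_lengths(grammar)
print(lengths["VP"], lengths["S"])
```

The shortest VP needs two words (a verb plus an object NP), so the shortest S needs three, and the two-word input "Kim arrived" cannot parse. The length check is only a necessary condition: "Kim arrived or" has three words and still fails, because the word after the verb must start an NP.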

It seems there is no way to get a VP, with or without the P: a V must be followed by an NP, or it must first go up the tree and become a VP before it can take a PP. So let's relax the rules and say VP -> V PP instead of VP -> VP PP:

>>> import nltk
>>> grammar = nltk.CFG.fromstring("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | V PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' | 'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived or Dana".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[Tree('S', [Tree('NP', [Tree('N', ['Kim'])]), Tree('VP', [Tree('V', ['arrived']), Tree('PP', [Tree('P', ['or']), Tree('NP', [Tree('N', ['Dana'])])])])])]

Okay, we are getting closer, but it seems the next word breaks the CFG rules again:

>>> import nltk
>>> grammar = nltk.CFG.fromstring("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | V PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' | 'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived or Dana left".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[]
>>> sent = "Kim arrived or Dana left and".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[]
>>>
>>> sent = "Kim arrived or Dana left and everyone".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[]
>>>
>>> sent = "Kim arrived or Dana left and everyone cheered".split()
>>> parser = nltk.ChartParser(grammar)
>>> print(list(parser.parse(sent)))
[]

So I hope the examples above show that patching the rules word by word, from left to right, to accommodate each new language phenomenon is hard.

Instead of working from left to right to achieve

[[[[[[[[Kim] arrived] or] Dana] left] and] everyone] cheered]

why don't you try to write more linguistically sound rules that achieve:

  1. [[[Kim arrived] or [Dana left]] and [everyone cheered]]
  2. [[Kim arrived] or [[Dana left] and [everyone cheered]]]

Try this:

import nltk

grammar = nltk.CFG.fromstring("""
S -> CP | VP
CP -> VP C VP | CP C VP | VP C CP
VP -> NP V
NP -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' | 'cheered'
C -> 'or' | 'and'
""")

# One parser can be reused for every sentence.
parser = nltk.ChartParser(grammar)

print("======= Kim arrived =========")
sent = "Kim arrived".split()
for t in parser.parse(sent):
    print(t)

print("\n======= Kim arrived or Dana left =========")
sent = "Kim arrived or Dana left".split()
for t in parser.parse(sent):
    print(t)

print("\n=== Kim arrived or Dana left and everyone cheered ====")
sent = "Kim arrived or Dana left and everyone cheered".split()
for t in parser.parse(sent):
    print(t)

[out]:

======= Kim arrived =========
(S (VP (NP Kim) (V arrived)))

======= Kim arrived or Dana left =========
(S (CP (VP (NP Kim) (V arrived)) (C or) (VP (NP Dana) (V left))))

=== Kim arrived or Dana left and everyone cheered ====
(S
  (CP
    (CP (VP (NP Kim) (V arrived)) (C or) (VP (NP Dana) (V left)))
    (C and)
    (VP (NP everyone) (V cheered))))
(S
  (CP
    (VP (NP Kim) (V arrived))
    (C or)
    (CP
      (VP (NP Dana) (V left))
      (C and)
      (VP (NP everyone) (V cheered)))))
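
The two trees for the full sentence correspond to the two bracketings listed earlier, one grouping "or" first and one grouping "and" first. As a cross-check that does not depend on NLTK, the sketch below counts the derivations this grammar allows for each sentence. It is a hand-written, memoized span counter in plain Python; the dictionary encoding and the name count_parses are assumptions for illustration, and it relies on the grammar having no empty productions and no cycles of unit rules.

```python
from functools import lru_cache

# The coordination grammar from the answer; symbols without an entry are words.
GRAMMAR = {
    "S": [("CP",), ("VP",)],
    "CP": [("VP", "C", "VP"), ("CP", "C", "VP"), ("VP", "C", "CP")],
    "VP": [("NP", "V")],
    "NP": [("Kim",), ("Dana",), ("everyone",)],
    "V": [("arrived",), ("left",), ("cheered",)],
    "C": [("or",), ("and",)],
}


def count_parses(grammar, tokens, start="S"):
    """Count the distinct parse trees of `tokens` rooted in `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in grammar:  # a word: must match exactly one token
            return int(j - i == 1 and tokens[i] == symbol)
        return sum(match(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def match(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        total = 0
        # The first symbol covers tokens[i:k]; every remaining symbol
        # needs at least one token, which also keeps left recursion finite.
        for k in range(i + 1, j - len(rhs) + 2):
            left = derive(rhs[0], i, k)
            if left:
                total += left * match(rhs[1:], k, j)
        return total

    return derive(start, 0, len(tokens))


for sentence in (
    "Kim arrived",
    "Kim arrived or Dana left",
    "Kim arrived or Dana left and everyone cheered",
):
    print(sentence, "->", count_parses(GRAMMAR, sentence.split()))
```

The counts are 1, 1 and 2, matching the number of trees printed above: the third sentence is genuinely ambiguous under these rules, because both CP -> CP C VP and CP -> VP C CP apply.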

The solution above shows that your CFG rules need to be robust enough to capture not only the full sentence but also its parts.
