NLTK - Chunk grammar doesn't read commas
Question
import nltk
from nltk.chunk.util import tagstr2tree
from nltk import word_tokenize, pos_tag

text = "John Rose Center is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike ,Reebok Center."
# Tokenize on whitespace, then assign a part-of-speech tag to each token
tagged_text = pos_tag(text.split())
# Chunk one or more consecutive proper nouns (NNP) into a single NP
grammar = "NP:{<NNP>+}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged_text)
print(result)
Output:
(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin./NNP)
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas/NNP ,Nike/NNP ,Reebok/NNP Center./NNP))
The grammar I use for chunking matches only NNP tags, but proper nouns that are separated by commas still end up in the same chunk. I want my chunks to look like this:
(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin./NNP)
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas,/NNP)
  (NP Nike,/NNP)
  (NP Reebok/NNP Center./NNP))
What should I write in "grammar =", or can I edit the output so that it looks like the above? As you can see, I only parse proper nouns for my named-entity project. Please help me out.
Recommended answer
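Use word_tokenize(string) instead of string.split(). split() breaks the text only on whitespace, so ",Nike" and "Center." stay single tokens with the punctuation attached, and each of them gets one tag. word_tokenize() emits commas and periods as separate tokens with their own tags, which ends a run of NNP tokens: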
>>> import nltk
>>> from nltk.chunk.util import tagstr2tree
>>> from nltk import word_tokenize, pos_tag
>>> text = "John Rose Center is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike ,Reebok Center."
>>> tagged_text = pos_tag(word_tokenize(text))
>>>
>>> grammar = "NP:{<NNP>+}"
>>>
>>> cp = nltk.RegexpParser(grammar)
>>> result = cp.parse(tagged_text)
>>>
>>> print(result)
(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin/NNP)
  ./.
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas/NNP)
  ,/,
  (NP Nike/NNP)
  ,/,
  (NP Reebok/NNP Center/NNP)
  ./.)
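To see why the tokenizer matters, here is a minimal sketch that uses only the standard library. It is not NLTK's implementation: simple_tokenize is a hypothetical regex stand-in for word_tokenize, chunk_nnp is a hypothetical helper that imitates the effect of NP:{<NNP>+} by grouping consecutive NNP tokens, and the tags are copied by hand from the output above rather than produced by pos_tag.

```python
import re
from itertools import groupby

text = "Also there are stores like Adidas ,Nike ,Reebok Center."

# Whitespace splitting leaves punctuation glued to the neighbouring word.
whitespace_tokens = text.split()
print(whitespace_tokens)


def simple_tokenize(s):
    """Hypothetical stand-in for word_tokenize (NOT NLTK's algorithm):
    return runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", s)


tokens = simple_tokenize(text)
print(tokens)


def chunk_nnp(tagged):
    """Hypothetical helper: collect maximal runs of consecutive NNP tokens,
    imitating the effect of the grammar NP:{<NNP>+}."""
    return [
        [word for word, _ in group]
        for is_nnp, group in groupby(tagged, key=lambda pair: pair[1] == "NNP")
        if is_nnp
    ]


# Tags written by hand, copied from the answer's output (assumption:
# pos_tag is not called here, so no NLTK models are needed).
tagged = [
    ("Adidas", "NNP"), (",", ","),
    ("Nike", "NNP"), (",", ","),
    ("Reebok", "NNP"), ("Center", "NNP"), (".", "."),
]
print(chunk_nnp(tagged))
```

Because each comma is its own token with a non-NNP tag, it interrupts the run of proper nouns, so Adidas, Nike and Reebok Center come out as three separate groups. With whitespace splitting, ",Nike" is one token, so nothing separates it from its neighbours.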