How to obtain better results using NLTK pos_tag

Question

I am just learning NLTK using Python. I tried running pos_tag on various sentences, but the results are not accurate. How can I improve them?

broke = NN
flimsy = NN
crap = NN

I am also getting a lot of extra words categorized as NN. How can I filter these out to get better results?

Recommended answer

Please give the context in which you obtained these results. As an example, I get different results from pos_tag when the words appear in the phrase "They broke flimsy crap":

import nltk
text=nltk.word_tokenize("They broke flimsy crap")
nltk.pos_tag(text)

[('They', 'PRP'), ('broke', 'VBP'), ('flimsy', 'JJ'), ('crap', 'NN')]
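To see which words end up tagged 'NN' so that you can check them, you can filter the (word, tag) pairs that pos_tag returns. A small sketch, using the output above written out as a literal so it runs without NLTK's model data; the helper name `words_with_tag` is made up for illustration:

```python
def words_with_tag(tagged, tag="NN"):
    """Return the words in a list of (word, tag) pairs that carry the given tag."""
    return [word for word, word_tag in tagged if word_tag == tag]


# The pos_tag output shown above, written out as a literal.
tagged = [("They", "PRP"), ("broke", "VBP"), ("flimsy", "JJ"), ("crap", "NN")]
nouns = words_with_tag(tagged)
print(nouns)
```
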

In any case, if you find that many words are wrongly categorized as 'NN', you can apply another technique specifically to the words tagged 'NN'. For instance, you can take a suitable tagged corpus and train a trigram tagger on it (in the same way the authors do with bigrams at http://nltk.googlecode.com/svn/trunk/doc/book/ch05.html).

Something like this:

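import nltk

pos_tag_results = nltk.pos_tag(your_text)  # your_text: a list of tokens, tagged with pos_tag
trigram_tagger = nltk.TrigramTagger(tagged_corpora)  # train a trigram tagger on your tagged sentences
trigram_tag_results = trigram_tagger.tag(your_text)  # tag the same tokens with the trigram tagger
for i, (word, tag) in enumerate(pos_tag_results):
    trigram_tag = trigram_tag_results[i][1]
    # for 'NN', take the trigram tag instead (it is None when the context was never seen in training)
    if tag == 'NN' and trigram_tag is not None:
        pos_tag_results[i] = (word, trigram_tag)  # tuples are immutable, so replace the whole pair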
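One caveat: an n-gram tagger trained without a backoff returns None for any context it never saw in training. A minimal sketch of a backoff chain, trained on a tiny hand-made corpus so it runs without downloading anything. The sentences in `train_sents`, the helper `replace_nn`, and the hard-coded stand-in for the pos_tag output are all made up for illustration; in practice you would train on a real tagged corpus and call nltk.pos_tag yourself.

```python
import nltk

# Tiny hand-made training corpus, for illustration only.
train_sents = [
    [("They", "PRP"), ("broke", "VBD"), ("the", "DT"), ("flimsy", "JJ"), ("chair", "NN")],
    [("The", "DT"), ("flimsy", "JJ"), ("table", "NN"), ("broke", "VBD")],
]

# Each tagger falls back to the next one when it has no answer.
default_tagger = nltk.DefaultTagger("NN")
unigram_tagger = nltk.UnigramTagger(train_sents, backoff=default_tagger)
bigram_tagger = nltk.BigramTagger(train_sents, backoff=unigram_tagger)
trigram_tagger = nltk.TrigramTagger(train_sents, backoff=bigram_tagger)


def replace_nn(primary, secondary):
    """Return primary's (word, tag) pairs, taking secondary's tag wherever primary says 'NN'."""
    merged = []
    for (word, tag), (_, other_tag) in zip(primary, secondary):
        if tag == "NN" and other_tag is not None:
            tag = other_tag
        merged.append((word, tag))
    return merged


tokens = ["They", "broke", "flimsy", "crap"]
# Hand-written stand-in for nltk.pos_tag(tokens), with the NN tags reported in the question.
pos_tag_results = [("They", "PRP"), ("broke", "NN"), ("flimsy", "NN"), ("crap", "NN")]
trigram_tag_results = trigram_tagger.tag(tokens)
merged = replace_nn(pos_tag_results, trigram_tag_results)
print(merged)
```

With the default tagger at the end of the chain, every token gets some tag, so the merge step never has to deal with None. Unseen words such as "crap" simply fall through to the default 'NN'.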

Let me know if it improves your results.
