NLTK使用的实际示例 [英] Practical examples of NLTK use
本文介绍了NLTK使用的实际示例的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在玩自然语言工具包(NLTK).
I'm playing around with the Natural Language Toolkit (NLTK).
它的文档(书和
Its documentation (Book and HOWTO) are quite bulky and the examples are sometimes slightly advanced.
是否有NLTK的使用/应用的良好但基本的示例?我正在考虑类似 Stream Hacker NTLK文章 >博客.
Are there any good but basic examples of uses/applications of NLTK? I'm thinking of things like the NTLK articles on the Stream Hacker blog.
推荐答案
这是我自己的实际示例,它可以使其他任何人都受益于查找此问题(对不起示例文本,这是我在维基百科):
Here's my own practical example for the benefit of anyone else looking this question up (excuse the sample text, it was the first thing I found on Wikipedia):
import nltk
import pprint
tokenizer = None
tagger = None
def init_nltk():
global tokenizer
global tagger
tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+|[^\w\s]+')
tagger = nltk.UnigramTagger(nltk.corpus.brown.tagged_sents())
def tag(text):
global tokenizer
global tagger
if not tokenizer:
init_nltk()
tokenized = tokenizer.tokenize(text)
tagged = tagger.tag(tokenized)
tagged.sort(lambda x,y:cmp(x[1],y[1]))
return tagged
def main():
text = """Mr Blobby is a fictional character who featured on Noel
Edmonds' Saturday night entertainment show Noel's House Party,
which was often a ratings winner in the 1990s. Mr Blobby also
appeared on the Jamie Rose show of 1997. He was designed as an
outrageously over the top parody of a one-dimensional, mute novelty
character, which ironically made him distinctive, absurd and popular.
He was a large pink humanoid, covered with yellow spots, sporting a
permanent toothy grin and jiggling eyes. He communicated by saying
the word "blobby" in an electronically-altered voice, expressing
his moods through tone of voice and repetition.
There was a Mrs. Blobby, seen briefly in the video, and sold as a
doll.
However Mr Blobby actually started out as part of the 'Gotcha'
feature during the show's second series (originally called 'Gotcha
Oscars' until the threat of legal action from the Academy of Motion
Picture Arts and Sciences[citation needed]), in which celebrities
were caught out in a Candid Camera style prank. Celebrities such as
dancer Wayne Sleep and rugby union player Will Carling would be
enticed to take part in a fictitious children's programme based around
their profession. Mr Blobby would clumsily take part in the activity,
knocking over the set, causing mayhem and saying "blobby blobby
blobby", until finally when the prank was revealed, the Blobby
costume would be opened - revealing Noel inside. This was all the more
surprising for the "victim" as during rehearsals Blobby would be
played by an actor wearing only the arms and legs of the costume and
speaking in a normal manner.[citation needed]"""
tagged = tag(text)
l = list(set(tagged))
l.sort(lambda x,y:cmp(x[1],y[1]))
pprint.pprint(l)
if __name__ == '__main__':
main()
输出:
[('rugby', None),
('Oscars', None),
('1990s', None),
('",', None),
('Candid', None),
('"', None),
('blobby', None),
('Edmonds', None),
('Mr', None),
('outrageously', None),
('.[', None),
('toothy', None),
('Celebrities', None),
('Gotcha', None),
(']),', None),
('Jamie', None),
('humanoid', None),
('Blobby', None),
('Carling', None),
('enticed', None),
('programme', None),
('1997', None),
('s', None),
("'", "'"),
('[', '('),
('(', '('),
(']', ')'),
(',', ','),
('.', '.'),
('all', 'ABN'),
('the', 'AT'),
('an', 'AT'),
('a', 'AT'),
('be', 'BE'),
('were', 'BED'),
('was', 'BEDZ'),
('is', 'BEZ'),
('and', 'CC'),
('one', 'CD'),
('until', 'CS'),
('as', 'CS'),
('This', 'DT'),
('There', 'EX'),
('of', 'IN'),
('inside', 'IN'),
('from', 'IN'),
('around', 'IN'),
('with', 'IN'),
('through', 'IN'),
('-', 'IN'),
('on', 'IN'),
('in', 'IN'),
('by', 'IN'),
('during', 'IN'),
('over', 'IN'),
('for', 'IN'),
('distinctive', 'JJ'),
('permanent', 'JJ'),
('mute', 'JJ'),
('popular', 'JJ'),
('such', 'JJ'),
('fictional', 'JJ'),
('yellow', 'JJ'),
('pink', 'JJ'),
('fictitious', 'JJ'),
('normal', 'JJ'),
('dimensional', 'JJ'),
('legal', 'JJ'),
('large', 'JJ'),
('surprising', 'JJ'),
('absurd', 'JJ'),
('Will', 'MD'),
('would', 'MD'),
('style', 'NN'),
('threat', 'NN'),
('novelty', 'NN'),
('union', 'NN'),
('prank', 'NN'),
('winner', 'NN'),
('parody', 'NN'),
('player', 'NN'),
('actor', 'NN'),
('character', 'NN'),
('victim', 'NN'),
('costume', 'NN'),
('action', 'NN'),
('activity', 'NN'),
('dancer', 'NN'),
('grin', 'NN'),
('doll', 'NN'),
('top', 'NN'),
('mayhem', 'NN'),
('citation', 'NN'),
('part', 'NN'),
('repetition', 'NN'),
('manner', 'NN'),
('tone', 'NN'),
('Picture', 'NN'),
('entertainment', 'NN'),
('night', 'NN'),
('series', 'NN'),
('voice', 'NN'),
('Mrs', 'NN'),
('video', 'NN'),
('Motion', 'NN'),
('profession', 'NN'),
('feature', 'NN'),
('word', 'NN'),
('Academy', 'NN-TL'),
('Camera', 'NN-TL'),
('Party', 'NN-TL'),
('House', 'NN-TL'),
('eyes', 'NNS'),
('spots', 'NNS'),
('rehearsals', 'NNS'),
('ratings', 'NNS'),
('arms', 'NNS'),
('celebrities', 'NNS'),
('children', 'NNS'),
('moods', 'NNS'),
('legs', 'NNS'),
('Sciences', 'NNS-TL'),
('Arts', 'NNS-TL'),
('Wayne', 'NP'),
('Rose', 'NP'),
('Noel', 'NP'),
('Saturday', 'NR'),
('second', 'OD'),
('his', 'PP$'),
('their', 'PP$'),
('him', 'PPO'),
('He', 'PPS'),
('more', 'QL'),
('However', 'RB'),
('actually', 'RB'),
('also', 'RB'),
('clumsily', 'RB'),
('originally', 'RB'),
('only', 'RB'),
('often', 'RB'),
('ironically', 'RB'),
('briefly', 'RB'),
('finally', 'RB'),
('electronically', 'RB-HL'),
('out', 'RP'),
('to', 'TO'),
('show', 'VB'),
('Sleep', 'VB'),
('take', 'VB'),
('opened', 'VBD'),
('played', 'VBD'),
('caught', 'VBD'),
('appeared', 'VBD'),
('revealed', 'VBD'),
('started', 'VBD'),
('saying', 'VBG'),
('causing', 'VBG'),
('expressing', 'VBG'),
('knocking', 'VBG'),
('wearing', 'VBG'),
('speaking', 'VBG'),
('sporting', 'VBG'),
('revealing', 'VBG'),
('jiggling', 'VBG'),
('sold', 'VBN'),
('called', 'VBN'),
('made', 'VBN'),
('altered', 'VBN'),
('based', 'VBN'),
('designed', 'VBN'),
('covered', 'VBN'),
('communicated', 'VBN'),
('needed', 'VBN'),
('seen', 'VBN'),
('set', 'VBN'),
('featured', 'VBN'),
('which', 'WDT'),
('who', 'WPS'),
('when', 'WRB')]
这篇关于NLTK使用的实际示例的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文