POS-Tagger is incredibly slow
Problem description
I am using nltk to generate n-grams from sentences by first removing given stop words. However, nltk.pos_tag() is extremely slow, taking up to 0.6 sec per sentence on my CPU (Intel i7).
Output:
['The first time I went, and was completely taken by the live jazz band and atmosphere, I ordered the Lobster Cobb Salad.']
0.620481014252
["It's simply the best meal in NYC."]
0.640982151031
['You cannot go wrong at the Red Eye Grill.']
0.644664049149
Code:
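import time
import nltk
from nltk import word_tokenize
from nltk.util import ngrams

# source, stop_words and n are defined elsewhere in the original program
for sentence in source:
    nltk_ngrams = None
    if stop_words is not None:
        start = time.time()
        sentence_pos = nltk.pos_tag(word_tokenize(sentence))
        print(time.time() - start)
        # stop_words holds POS tags here: drop every word whose tag is listed
        filtered_words = [word for (word, pos) in sentence_pos if pos not in stop_words]
    else:
        filtered_words = ngrams(sentence.split(), n)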
Is this really that slow or am I doing something wrong here?
Recommended answer
Use pos_tag_sents to tag multiple sentences:
>>> import time
>>> from nltk.corpus import brown
>>> from nltk import pos_tag
>>> from nltk import pos_tag_sents
>>> sents = brown.sents()[:10]
>>> start = time.time(); pos_tag(sents[0]); print(time.time() - start)
0.934092998505
>>> start = time.time(); [pos_tag(s) for s in sents]; print(time.time() - start)
9.5061340332
>>> start = time.time(); pos_tag_sents(sents); print(time.time() - start)
0.939551115036
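The timings point to a large fixed cost per pos_tag call: ten separate calls take about ten times as long as one, while pos_tag_sents tags all ten sentences in roughly the time of a single call. A plausible explanation, assuming an NLTK 3.x release where pos_tag builds a fresh tagger (reloading its model) on every call while pos_tag_sents builds one per batch, is that the model load, not the tagging itself, dominates. Applied to the loop in the question, that means tokenizing every sentence first and tagging them all with one pos_tag_sents call. The sketch below is a hypothetical restructuring, not the asker's code: filter_by_pos and stop_tags are illustrative names, stop_tags plays the role of the question's stop_words (which holds POS tags, since it is compared against pos), and the n-gram else branch is left out. The tokenizer and tagger are parameters that default to NLTK's word_tokenize and pos_tag_sents, imported lazily, so a stub can stand in for them.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple

Tagged = List[Tuple[str, str]]


def filter_by_pos(
    sentences: Iterable[str],
    stop_tags: Set[str],
    tokenize: Optional[Callable[[str], List[str]]] = None,
    tag_sents: Optional[Callable[[Sequence[List[str]]], List[Tagged]]] = None,
) -> List[List[str]]:
    """Drop words whose POS tag is in stop_tags, tagging all sentences in one batch."""
    # Default to NLTK's tokenizer and batch tagger (needs nltk plus its data packages)
    if tokenize is None:
        from nltk import word_tokenize
        tokenize = word_tokenize
    if tag_sents is None:
        from nltk import pos_tag_sents
        tag_sents = pos_tag_sents

    # Tokenize everything up front so the tagger receives the whole batch at once
    tokenized = [tokenize(sentence) for sentence in sentences]
    # A single tag_sents call replaces one pos_tag call per sentence
    tagged = tag_sents(tokenized)
    # Keep only the words whose tag is not listed in stop_tags
    return [[word for word, pos in sent if pos not in stop_tags] for sent in tagged]
```

With NLTK and its tokenizer and tagger data installed, filter_by_pos(source, stop_words) would return one filtered word list per sentence, and each list could then be passed to nltk.util.ngrams(words, n). If sentences arrive one at a time and cannot be batched, the analogous fix under the same assumption about NLTK internals is to construct a nltk.tag.perceptron.PerceptronTagger once and call its tag method for each sentence, so the model is loaded only once.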