POS-Tagger is incredibly slow

Problem description

I am using nltk to generate n-grams from sentences after first removing the given stop words. However, nltk.pos_tag() is extremely slow, taking up to 0.6 seconds per sentence on my CPU (Intel i7).

The output:

['The first time I went, and was completely taken by the live jazz band and atmosphere, I ordered the Lobster Cobb Salad.']
0.620481014252
["It's simply the best meal in NYC."]
0.640982151031
['You cannot go wrong at the Red Eye Grill.']
0.644664049149

The code:

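import time

import nltk
from nltk import word_tokenize
from nltk.util import ngrams

# `source`, `stop_words`, and `n` are defined elsewhere in the program.
for sentence in source:

    nltk_ngrams = None

    if stop_words is not None:
        # Time a single pos_tag() call on one tokenized sentence.
        start = time.time()
        sentence_pos = nltk.pos_tag(word_tokenize(sentence))
        print(time.time() - start)

        filtered_words = [word for (word, pos) in sentence_pos if pos not in stop_words]
    else:
        filtered_words = ngrams(sentence.split(), n)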

Is this really that slow or am I doing something wrong here?

Solution

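Use pos_tag_sents to tag multiple sentences in a single call instead of calling pos_tag once per sentence. In the timings below, tagging one sentence with pos_tag takes about as long as tagging all ten with pos_tag_sents, which suggests that nearly all of the time is a fixed per-call cost (most likely loading the tagger model) rather than the tagging itself: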

>>> import time
>>> from nltk.corpus import brown
>>> from nltk import pos_tag
>>> from nltk import pos_tag_sents
>>> sents = brown.sents()[:10]
>>> start = time.time(); pos_tag(sents[0]); print(time.time() - start)
0.934092998505
>>> start = time.time(); [pos_tag(s) for s in sents]; print(time.time() - start)
9.5061340332
>>> start = time.time(); pos_tag_sents(sents); print(time.time() - start)
0.939551115036
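
Applied to the loop from the question, the same idea could look roughly like the sketch below: tokenize every sentence first, tag the whole batch with one pos_tag_sents call, then filter. The helper name filter_by_pos and its parameters (excluded_tags, tokenize, tag_sents) are hypothetical and not part of nltk. The sketch also assumes that stop_words in the question holds POS tags, since the original code compares pos against it.

```python
def filter_by_pos(sentences, excluded_tags, tokenize=None, tag_sents=None):
    """Tag all sentences in one batch and drop tokens whose POS tag is excluded.

    `tokenize` and `tag_sents` are hypothetical hooks; when omitted they
    default to nltk.word_tokenize and nltk.pos_tag_sents.
    """
    if tokenize is None or tag_sents is None:
        # Imported lazily so the helper can also be tried with stand-ins.
        import nltk
        tokenize = tokenize or nltk.word_tokenize
        tag_sents = tag_sents or nltk.pos_tag_sents

    # Tokenize everything up front so the tagger sees the whole batch.
    tokenized = [tokenize(sentence) for sentence in sentences]

    # One tagging call for all sentences instead of one call per sentence.
    tagged = tag_sents(tokenized)

    # Keep only the words whose tag is not in the excluded set.
    return [
        [word for (word, tag) in sentence if tag not in excluded_tags]
        for sentence in tagged
    ]


# Possible usage with the names from the question (requires the nltk
# tokenizer and tagger data to be installed):
#     filtered = filter_by_pos(source, stop_words)
```

Each inner list of the result corresponds to one input sentence, so the n-grams can still be built per sentence afterwards.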
