Extracting all Nouns from a text file using nltk


Question

Is there a more efficient way of doing this? My code reads a text file and extracts all nouns.

import nltk

with open(fileName) as f:  # fileName is the path to the input text file
    text = f.read()  # read the whole file as one string
sentences = nltk.sent_tokenize(text)  # split the text into sentences
nouns = []  # empty list to hold all nouns

for sentence in sentences:
    for word, pos in nltk.pos_tag(nltk.word_tokenize(sentence)):
        if pos in ('NN', 'NNP', 'NNS', 'NNPS'):
            nouns.append(word)

How do I reduce the time complexity of this code? Is there a way to avoid using the nested for loops?

Thanks!

Recommended answer

If you are open to options other than NLTK, check out TextBlob. It extracts noun phrases easily (individual nouns are available through its part-of-speech tags):

>>> from textblob import TextBlob
>>> txt = """Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages."""
>>> blob = TextBlob(txt)
>>> print(blob.noun_phrases)
[u'natural language processing', 'nlp', u'computer science', u'artificial intelligence', u'computational linguistics']
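If you would rather stay with NLTK, the following is a minimal sketch rather than a definitive answer. The nested loops are not really the cost here: the total work is proportional to the number of tokens either way, and tagging dominates the run time. What may help is tagging all sentences in one `nltk.pos_tag_sents` call instead of calling `nltk.pos_tag` once per sentence, and testing tags against a set. The names `NOUN_TAGS`, `extract_nouns`, and `nouns_from_text` are made up for this illustration, and the sketch assumes NLTK's tokenizer and tagger data have already been downloaded.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # the four Penn Treebank noun tags


def extract_nouns(tagged_tokens):
    """Return the words whose POS tag is one of the four noun tags."""
    return [word for word, tag in tagged_tokens if tag in NOUN_TAGS]


def nouns_from_text(text):
    """Tokenize and tag `text` with NLTK, then keep only the nouns.

    Assumption: NLTK and its tokenizer/tagger data are installed.
    """
    import nltk  # imported lazily so extract_nouns also works without NLTK

    tokenized = [nltk.word_tokenize(s) for s in nltk.sent_tokenize(text)]
    # Tag every sentence in a single call instead of one call per sentence.
    tagged_sentences = nltk.pos_tag_sents(tokenized)
    return [
        word
        for sentence in tagged_sentences
        for word in extract_nouns(sentence)
    ]


if __name__ == "__main__":
    # Hand-tagged tokens, so this demo runs without any NLTK data.
    sample = [("The", "DT"), ("cats", "NNS"), ("chased", "VBD"),
              ("a", "DT"), ("mouse", "NN"), ("in", "IN"), ("Paris", "NNP")]
    print(extract_nouns(sample))  # ['cats', 'mouse', 'Paris']
```

For a file, something like `nouns_from_text(open(fileName).read())` would replace the original loop. Whether this is measurably faster depends on the NLTK version and the size of the text, so it is worth timing both variants on your own data.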
