How to extract nouns using NLTK pos_tag()?
Problem description
I am fairly new to Python and cannot figure out a bug in my code. I want to extract nouns using NLTK.
I have written the following code:
import nltk
sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
length = len(tagged) - 1
a = list()
for i in (0,length):
    log = (tagged[i][1][0] == 'N')
    if log == True:
        a.append(tagged[i][0])
When I run this, 'a' contains only one element:
a
['detail']
I don't understand why.
When I run the loop body on its own, without the for loop, it works:
log = (tagged[i][1][0] == 'N')
if log == True:
    a.append(tagged[i][0])
If I change the value of 'i' manually from 0 to 'length', I get the correct output, but with the for loop only the last element is returned. Can someone tell me what is wrong with the for loop?
After the code runs, 'a' should be:
['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']
Recommended answer
for i in (0,length):
This iterates over only two elements, zero and length. If you want to iterate over every number between zero and length, use range. Note that range excludes its stop value, and length here is len(tagged) - 1, so the stop must be length + 1 to reach the last element:
for i in range(0, length + 1):
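To make the difference concrete, the sketch below lists the indices each loop header visits. The `tagged` list is a hand-written sample in the (word, tag) shape that pos_tag returns; its tags are illustrative assumptions, not real tagger output.

```python
# Hand-written sample in the (word, tag) shape returned by nltk.pos_tag.
# The tags are assumptions for illustration, not real tagger output.
tagged = [("At", "IN"), ("Thursday", "NNP"), ("film", "NN"),
          ("best", "JJS"), ("design", "NN")]
length = len(tagged) - 1  # index of the last element, as in the question

# (0, length) is a 2-tuple, so the loop sees only the first and last index.
tuple_indices = [i for i in (0, length)]

# range excludes its stop value, so range(0, length) skips the last index.
short_indices = list(range(0, length))

# range(len(tagged)) visits every valid index.
all_indices = list(range(len(tagged)))

print(tuple_indices)  # [0, 4]
print(short_indices)  # [0, 1, 2, 3]
print(all_indices)    # [0, 1, 2, 3, 4]
```

This also explains why the original loop returned a single word: of the two indices it visited, only the last one pointed at a noun.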
Better yet, it is more idiomatic to iterate directly over the elements of a sequence rather than over its indices. This reduces the likelihood of mistakes like the one above.
for item in tagged:
    if item[1][0] == 'N':
        a.append(item[0])
If you prefer compact code, a one-line list comprehension does the same thing:
a = [item[0] for item in tagged if item[1][0] == 'N']
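As a possible refinement, each (word, tag) pair can be unpacked into named variables, which avoids the numeric indexing altogether. The sketch below assumes the Penn Treebank tags that pos_tag uses by default, where all noun tags (NN, NNS, NNP, NNPS) begin with "N". The `tagged` list is again hand-written for illustration rather than real tagger output, and the name `nouns` is a placeholder.

```python
# Hand-written sample in the (word, tag) shape returned by nltk.pos_tag.
# The tags are assumptions for illustration, not real tagger output.
tagged = [("At", "IN"), ("Thursday", "NNP"), ("film", "NN"),
          ("best", "JJS"), ("design", "NN")]

# Unpack each pair into word and tag, keeping words whose tag starts with "N".
nouns = [word for word, tag in tagged if tag.startswith("N")]

print(nouns)  # ['Thursday', 'film', 'design']
```

Using tag.startswith("N") instead of tag[0] == 'N' also behaves safely if a tag were ever an empty string.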