How to extract nouns using NLTK pos_tag()?


Problem description

I am fairly new to Python and cannot figure out a bug in my code. I want to extract nouns using NLTK.

I have written the following code:

import nltk

sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"

tokens = nltk.word_tokenize(sentence)

tagged = nltk.pos_tag(tokens)


length = len(tagged) - 1

a = list()

for i in (0,length):
    log = (tagged[i][1][0] == 'N')
    if log == True:
      a.append(tagged[i][0])

When I run this, 'a' has only one element:

a
['detail']

I don't understand why.

When I run the loop body on its own, without the for loop, it works:

log = (tagged[i][1][0] == 'N')
if log == True:
    a.append(tagged[i][0])

If I change the value of 'i' manually from 0 to 'length', I get the correct output, but with the for loop only the last element is returned. Can someone tell me what is going wrong with the for loop?

After the code runs, 'a' should be as follows:

['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']

Recommended answer

for i in (0,length):

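This iterates over just two elements, 0 and length, because (0, length) is a two-item tuple. To iterate over every index from 0 through length, use range. Note that range excludes its end value, and length is len(tagged) - 1, so the end must be length + 1 (that is, len(tagged)) or the last token is skipped:

for i in range(0, length + 1):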
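The difference between looping over a tuple and looping over a range can be seen in a small sketch. It uses plain Python with no NLTK, and the value of length is an arbitrary stand-in for len(tagged) - 1:

```python
# A stand-in for len(tagged) - 1; the value 4 is an arbitrary assumption.
length = 4

# (0, length) is a tuple with exactly two items, so a loop over it runs twice.
tuple_indices = [i for i in (0, length)]

# range(0, length + 1) yields every index from 0 through length inclusive.
range_indices = [i for i in range(0, length + 1)]

print(tuple_indices)  # [0, 4]
print(range_indices)  # [0, 1, 2, 3, 4]
```

Because range stops before its end value, range(0, length) would leave out the final index.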

Better yet, it is more idiomatic to iterate directly over the elements of a sequence rather than over its indices. This reduces the likelihood of mistakes like the one above.

for item in tagged:
    if item[1][0] == 'N':
      a.append(item[0])

If you prefer compact code, a one-line list comprehension does the same thing:

a = [item[0] for item in tagged if item[1][0] == 'N']
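As a self-contained sketch of the same filter, the example below unpacks each pair into word and tag, which may read more clearly than indexing. The tagged list here is hand-written for illustration; it stands in for nltk.pos_tag() output and is an assumption, not real tagger results:

```python
# Hand-written (word, tag) pairs standing in for nltk.pos_tag() output.
# These tags are illustrative assumptions, not real tagger results.
tagged = [
    ("At", "IN"),
    ("eight", "CD"),
    ("Thursday", "NNP"),
    ("film", "NN"),
    ("beautiful", "JJ"),
    ("design", "NN"),
]

# Unpack each pair and keep words whose tag starts with "N"
# (the Penn Treebank noun tags are NN, NNS, NNP and NNPS).
nouns = [word for word, tag in tagged if tag.startswith("N")]

print(nouns)  # ['Thursday', 'film', 'design']
```

With real tagger output, which words are kept depends on the tags NLTK assigns, so the result may differ from what you expect by eye.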
