How to extract nouns using NLTK pos_tag()?

Question

I am fairly new to Python and cannot figure out the bug. I want to extract nouns using NLTK.

I wrote the following code:

import nltk

sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"

tokens = nltk.word_tokenize(sentence)

tagged = nltk.pos_tag(tokens)


length = len(tagged) - 1

a = list()

for i in (0,length):
    log = (tagged[i][1][0] == 'N')
    if log == True:
      a.append(tagged[i][0])

When I run this, 'a' has only one element:

a
['detail']

I don't understand why.

When I run the body without the for loop, it works:

log = (tagged[i][1][0] == 'N')
if log == True:
    a.append(tagged[i][0])

By changing the value of 'i' manually from 0 to 'length', I get the correct output, but with the for loop only the last element is returned. Can someone tell me what is wrong with the for loop?

After the code runs, 'a' should be:

['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']

Recommended answer

for i in (0,length):

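This iterates over only two elements, zero and length, because (0, length) is a two-element tuple. If you want to iterate over every number from zero to length, use range. Note that range excludes its upper bound, and length is len(tagged) - 1 here, so the bound must be length + 1 to reach the last token.

for i in range(0, length + 1):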
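
The difference between the two loop headers can be checked without NLTK. The sketch below is only an illustration: the value 5 is an assumed stand-in for len(tagged) - 1 and does not come from the question. It also shows that range stops before its upper bound, so the bound has to be length + 1 if the last index should be included.

```python
# Hypothetical stand-in for len(tagged) - 1; the value 5 is illustrative only.
length = 5

# A parenthesized pair is a tuple, so a loop over it runs exactly twice.
tuple_indices = [i for i in (0, length)]

# range(0, n) stops before n, so add 1 to include the index `length` itself.
range_indices = [i for i in range(0, length + 1)]

print(tuple_indices)  # [0, 5]
print(range_indices)  # [0, 1, 2, 3, 4, 5]
```

Since range starts at zero by default, range(len(tagged)) would cover the same indices without the intermediate length variable.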

Better yet, it is more idiomatic to iterate directly over the elements of a sequence rather than over its indices. This reduces the likelihood of mistakes like the one above.

for item in tagged:
    if item[1][0] == 'N':
      a.append(item[0])

Users who prefer compact code may even like the one-line list comprehension:

a = [item[0] for item in tagged if item[1][0] == 'N']
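
As a minimal sketch of the same comprehension, the example below runs on a hand-written list of (word, tag) pairs so that it works without downloading NLTK data. The tags are assumptions chosen for illustration, not real output from nltk.pos_tag, and the name nouns is hypothetical. Unpacking each pair into word and tag is a stylistic variation on item[0] and item[1].

```python
# Hand-written (word, tag) pairs standing in for nltk.pos_tag output.
# These tags are assumptions for illustration, not real tagger results.
tagged = [
    ("At", "IN"),
    ("eight", "CD"),
    ("o'clock", "NN"),
    ("on", "IN"),
    ("Thursday", "NNP"),
    ("morning", "NN"),
    ("beautiful", "JJ"),
]

# Unpack each pair and keep the words whose tag starts with "N"
# (in the Penn Treebank tag set: NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tagged if tag.startswith("N")]

print(nouns)  # ["o'clock", 'Thursday', 'morning']
```

With real data, tagged would come from nltk.pos_tag(nltk.word_tokenize(sentence)) as in the question, and the tags the tagger assigns may differ from the ones assumed here.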
