Note-taking program with NLTK and WordNet doesn't work; the error message says it's because of WordNet


Question

I am trying to make a program in Python that will take notes on a passage that I input. It will pick out the first and last sentences of the paragraph and the sentences with dates and numbers. It would then replace some words with synonyms and get rid of useless adjectives. I know the basics of Python, but I am new to NLTK and WordNet. I've started a prototype program that replaces the words in a sentence with random synonyms, but I keep getting an error that says there is something wrong with WordNet. I think I installed it right, but I might be wrong. Here is my code:

import random
import sys
from nltk.corpus import wordnet

print('Enter your passage')
Passage = sys.stdin.readline()
PassageList = Passage.split(' ')
wordCounter = 0
syns = []

def maxInt(list):
    i = 0
    for x in list:
    i += 1
return i



for x in PassageList:
    syns = wordnet.synsets(PassageList[wordCounter])
    synLength = maxInt(syns)
    PassageList[wordCounter] == syns[0]
    print(PassageList[wordCounter])
    wordCounter += 1

Here is the error I keep getting:

Traceback (most recent call last):
  File "C:\Users\shoob\Documents\Programs\Python\Programs\NoteTake.py", line 22, in <module>
    PassageList[wordCounter] == syns[0]
  File "C:\Users\shoob\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nltk\corpus\reader\wordnet.py", line 198, in __eq__
    return self._name == other._name
AttributeError: 'str' object has no attribute '_name'

If you can help in any way, it would help me out a lot. :-D

Recommended answer

In Longer

The other answer was more on the NLP side of things, but here's a walkthrough of the code in your question to see what's happening.

First, some Python code conventions. CamelCase names are conventionally used for classes rather than ordinary variables, so avoid variable names such as Passage and use lowercase names such as passage instead.

Better variable names also help: instead of PassageList, you can call the list words.

E.g.

import random
import sys
from nltk.corpus import wordnet

print('Enter your passage')
passage = sys.stdin.readline()

# The passage.split() is aka word tokenization
# note you've skipped sentence tokenization, 
# so it doesn't fit the goal of getting first and last sentence 
# that you've described in the OP
words = passage.split(' ') 
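
The comment above notes that sentence tokenization was skipped. As a rough sketch of that missing step, the snippet below splits a passage into sentences with a naive regular expression and picks the first and last ones. The helper names split_sentences and first_and_last are made up for illustration, and the regex is an assumption that will mis-split abbreviations such as "Dr."; NLTK's nltk.tokenize.sent_tokenize is the more robust option if its tokenizer data is installed.

```python
import re


def split_sentences(passage):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', passage.strip())
    return [part for part in parts if part]


def first_and_last(passage):
    """Return the first and last sentence (hypothetical helper)."""
    sentences = split_sentences(passage)
    if not sentences:
        return None, None
    return sentences[0], sentences[-1]


passage = "The cat sat down. It was late in the day. The dog barked!"
first, last = first_and_last(passage)
print(first)
print(last)
```

Splitting into sentences before splitting into words keeps the "first and last sentence" goal separate from the per-word synonym lookup.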

Collections is your friend

Next, Python's standard library has Counter objects (in the collections module) that you can use for counting, which makes the code shorter and more readable. E.g.

from collections import Counter
word_counter = Counter()

Take a look at https://docs.python.org/3/library/collections.html
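
To make the Counter suggestion concrete, here is a minimal sketch that counts the words of a passage. The sample passage is made up for illustration; in the original program it would come from sys.stdin.readline().

```python
from collections import Counter

# Made-up sample text standing in for the user's input.
passage = "the cat saw the other cat near the door"
words = passage.split()

# Counter tallies how many times each word occurs.
word_counter = Counter(words)

print(word_counter["the"])
print(word_counter.most_common(2))
```

A Counter returns 0 for words it has never seen, so no explicit initialisation per word is needed.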

As explained in the other answer, WordNet is indexed by meanings (aka synsets), and synsets are not synonyms. To get synonyms, you can use the Synset.lemma_names() function. But they are really limited, and you would have to go through word sense disambiguation (WSD) before knowing which synset's lemma_names to pick for an ambiguous word.
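
As a hedged sketch of the lemma_names() route described above: the helper names collect_lemma_names and wordnet_lemma_names are hypothetical, and the second one assumes that nltk and its 'wordnet' data are installed (for example via nltk.download('wordnet')). The import is done lazily so the first helper can be used with any objects that offer a lemma_names() method.

```python
def collect_lemma_names(synsets):
    """Gather unique lemma names, in order, from Synset-like objects."""
    names = []
    for synset in synsets:
        for name in synset.lemma_names():
            if name not in names:
                names.append(name)
    return names


def wordnet_lemma_names(word):
    """Look the word up in WordNet; needs nltk and its 'wordnet' data."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose
    return collect_lemma_names(wn.synsets(word))
```

Calling wordnet_lemma_names on an ambiguous word returns lemma names from all of its senses mixed together, which is exactly why WSD is needed before any of them can be treated as a drop-in synonym.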

Also, explicit is better than implicit. Human-readable variable names help a lot in understanding and improving the code, so instead of syns = [], use synonyms = [].

Otherwise, it's really unclear what syns is storing.

Disregarding the wrong indentation, it's unclear what the maxInt function is trying to achieve. It simply adds 1 for each item in the list, which is essentially the length function, so you could simply use len(x).

def maxInt(list):
    i = 0
    for x in list:
        i += 1
    return i

x = [1,2,3,4,5]
maxInt(x) == len(x)

To access an item from a list sequentially, simply loop

Moving on, we see that you're looping through each word in the passage's word list in a strange way: the loop variable x is never used, and a separate counter indexes into the list.

To simplify what you're doing, instead of:

Passage = sys.stdin.readline()
PassageList = Passage.split(' ')
wordCounter = 0

for x in PassageList:
    syns = wordnet.synsets(PassageList[wordCounter])

you can simply do:

import sys
from nltk.corpus import wordnet as wn

passage = sys.stdin.readline()
words = passage.split(' ')
for word in words:
    synsets_per_word = wn.synsets(word)

Simply use len()

To check the number of synsets for the given word, instead of

synLength = maxInt(syns)

you can do:

import sys
from nltk.corpus import wordnet as wn

passage = sys.stdin.readline()
words = passage.split(' ')
for word in words:
    synsets_per_word = wn.synsets(word)
    num_synsets_per_word = len(synsets_per_word)
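
One detail the loop above glosses over: sys.stdin.readline() keeps the trailing newline, and split(' ') splits only on single spaces, so the last word still carries "\n" and repeated spaces produce empty strings. A minimal sketch of the difference, using a hard-coded string in place of stdin (an assumption made so the snippet runs on its own):

```python
# Stand-in for what sys.stdin.readline() would return.
line = "the  quick fox\n"

# Splitting on a single space keeps the newline and an empty string.
by_space = line.split(' ')

# Splitting with no argument splits on any run of whitespace.
by_whitespace = line.split()

print(by_space)
print(by_whitespace)
```

Using split() with no argument is usually the safer choice before looking the words up.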

Now to the troublesome line

The line:

PassageList[wordCounter] == syns[0]

With the better variable names, we have:

word == synsets_per_word[0]

Now here's the confusing part: the left-hand side is word, which is of type str, and you are comparing it to synsets_per_word[0], which is of type nltk.corpus.reader.wordnet.Synset.

Because the two operands have different types, the comparison ends up in Synset.__eq__, which reads other._name. A str has no such attribute, so the AttributeError pops up...

The bigger question is: what are you trying to achieve here? My assumption is that you think the synset is a str object, but as explained above, it's a Synset object and not a string. Even if you get the lemma_names from the Synset, the result is a list of strings, not a single str that can be compared for equality with another str.
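
The failure can be reproduced without NLTK. In the sketch below, FakeSynset is a hypothetical stand-in whose __eq__ copies the expression shown in the traceback, and its name and lemma names are illustrative values rather than real WordNet output. It also shows one way to compare like with like, by testing membership in the list of lemma names. If the original intent was to replace the word, that would be an assignment (=) rather than a comparison (==); that reading of the intent is an assumption.

```python
class FakeSynset:
    """Hypothetical stand-in for nltk's Synset; __eq__ mirrors the traceback."""

    def __init__(self, name, lemma_names):
        self._name = name
        self._lemma_names = lemma_names

    def __eq__(self, other):
        # Same expression as the line shown in the traceback.
        return self._name == other._name

    def lemma_names(self):
        return list(self._lemma_names)


word = "dog"
synset = FakeSynset("dog.n.01", ["dog", "domestic_dog"])

try:
    word == synset  # str gives up, then FakeSynset.__eq__ receives a str
except AttributeError as error:
    message = str(error)
    print(message)

# Comparing like with like: is the word one of the synset's lemma names?
is_lemma = word in synset.lemma_names()
print(is_lemma)
```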

First, read up on NLP, Python, and what the WordNet API in NLTK can do.

Then redefine the task, since WordNet is not going to help you much with ambiguous words.
