Note-taking program with NLTK and WordNet doesn't work; error message says it's because of WordNet
Problem description
I am trying to make a program in Python that will take notes on a passage that I input. It will pick out the first and last sentence of the paragraph and the sentences with dates and numbers. It would then replace some words with synonyms and get rid of useless adjectives. I know the general stuff in Python, but I am new to NLTK and WordNet. I've started a prototype program that will replace the words in a sentence with random synonyms, but I keep getting an error that says there is something wrong with WordNet. I think I installed it right, but I might be wrong. Here is my code:
import random
import sys
from nltk.corpus import wordnet

print('Enter your passage')
Passage = sys.stdin.readline()
PassageList = Passage.split(' ')
wordCounter = 0
syns = []

def maxInt(list):
    i = 0
    for x in list:
        i += 1
    return i

for x in PassageList:
    syns = wordnet.synsets(PassageList[wordCounter])
    synLength = maxInt(syns)
    PassageList[wordCounter] == syns[0]
    print(PassageList[wordCounter])
    wordCounter += 1
Here is the error I keep getting:
Traceback (most recent call last):
  File "C:\Users\shoob\Documents\Programs\Python\Programs\NoteTake.py", line 22, in <module>
    PassageList[wordCounter] == syns[0]
  File "C:\Users\shoob\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nltk\corpus\reader\wordnet.py", line 198, in __eq__
    return self._name == other._name
AttributeError: 'str' object has no attribute '_name'
If you can help in any way, it would help me out a lot. :-D
Recommended answer
In Longer
The other answer was more on the NLP side of things, but here's a walkthrough of the code in the OP to see what's happening.
Firstly, some Python code conventions. By convention (PEP 8), CamelCase names are used for classes rather than ordinary variables, so avoid variable names such as Passage.
Also, better variable names help: instead of PassageList, you can simply call it words.
E.g.
import random
import sys
from nltk.corpus import wordnet
print('Enter your passage')
passage = sys.stdin.readline()
# The passage.split() is aka word tokenization
# note you've skipped sentence tokenization,
# so it doesn't fit the goal of getting first and last sentence
# that you've described in the OP
words = passage.split(' ')
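The sentence tokenization step mentioned in the comments above could be sketched roughly as follows. This is a naive regex split, not NLTK's tokenizer (nltk.sent_tokenize is the more robust option but needs extra data downloaded); the helper name split_sentences and the sample passage are made up for illustration, and the pattern will mis-split abbreviations such as "e.g.".

```python
import re

def split_sentences(passage):
    """Naively split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r'(?<=[.!?])\s+', passage.strip())
    return [part for part in parts if part]

# Illustrative input only
passage = "The meeting starts at 9. Bring your notes. We finish on 12 May."
sentences = split_sentences(passage)

first_sentence = sentences[0]
last_sentence = sentences[-1]
# Rough proxy for "sentences with dates and numbers": any sentence containing a digit
with_numbers = [s for s in sentences if re.search(r'\d', s)]

print(first_sentence)
print(last_sentence)
print(with_numbers)
```

Splitting into sentences first, and only then into words, keeps the "first and last sentence" part of the task separate from the synonym replacement.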
Collections is your friend
Next, there are counter objects in native Python that you can make use of; they'll help you with some optimization and make the code more readable. E.g.
from collections import Counter
word_counter = Counter()
Take a look at https://docs.python.org/3/library/collections.html
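As a small illustration of what a Counter gives you, here is a sketch that counts the words of a passage. The sample passage is made up; only collections.Counter and its most_common method come from the standard library.

```python
from collections import Counter

# Illustrative input only
passage = "the cat sat on the mat and the dog sat too"
words = passage.split()

word_counter = Counter(words)

print(word_counter["the"])          # 3
print(word_counter["unicorn"])      # 0, missing keys count as zero
print(word_counter.most_common(2))  # [('the', 3), ('sat', 2)]
```

Note that split() with no argument splits on any whitespace, so the trailing newline that sys.stdin.readline() returns does not end up glued to the last word, which is what happens with split(' ').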
As explained in the other answer, WordNet is indexed by meanings (aka synsets), and synsets are not synonyms. To get synonyms, you can use the Synset.lemma_names() function. But they are quite limited, and you would have to go through word sense disambiguation (WSD) before knowing which synset's lemma_names to pick for an ambiguous word.
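A minimal sketch of collecting synonyms from lemma_names() might look like this. So that it runs without the WordNet corpus installed, it uses a hypothetical stand-in class (FakeSynset) with hand-written lemma names; the helper name synonyms_from_synsets is also made up. No WSD is attempted, so lemmas from every sense are mixed together.

```python
def synonyms_from_synsets(synsets, word):
    """Collect unique lemma names across all synsets, excluding the word itself."""
    synonyms = []
    for synset in synsets:
        for name in synset.lemma_names():
            # WordNet joins multi-word lemmas with underscores
            candidate = name.replace('_', ' ')
            if candidate.lower() != word.lower() and candidate not in synonyms:
                synonyms.append(candidate)
    return synonyms


class FakeSynset:
    """Hypothetical stand-in for a Synset; only lemma_names() is mimicked."""
    def __init__(self, names):
        self._names = names

    def lemma_names(self):
        return list(self._names)


# Hand-written illustrative data, not real WordNet output
fake_synsets = [FakeSynset(['dog', 'domestic_dog']), FakeSynset(['frump', 'dog'])]
print(synonyms_from_synsets(fake_synsets, 'dog'))  # ['domestic dog', 'frump']

# With NLTK and its WordNet data installed, the call would instead be:
#   from nltk.corpus import wordnet as wn
#   synonyms_from_synsets(wn.synsets('dog'), 'dog')
```

The two fake synsets stand for two different senses of the same word, which is why blindly merging their lemma names can produce a "synonym" that does not fit the sentence.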
Also, explicit is better than implicit: human-readable variable names help a lot in understanding and improving the code, so instead of syns = [], use synonyms = [].
Otherwise, it's really unclear what syns is storing.
Indentation aside, it's unclear what this function is trying to achieve. You are simply adding 1 to a counter for each item in the list, which is essentially the length function, so you could simply use len(x).
def maxInt(list):
    i = 0
    for x in list:
        i += 1
    return i

x = [1, 2, 3, 4, 5]
maxInt(x) == len(x)  # True
To access an item from a list sequentially, simply loop
Moving on, we see that you're looping through each word in the passage's list of words in a strange way.
To simplify what you're doing, instead of:
Passage = sys.stdin.readline()
PassageList = Passage.split(' ')
wordCounter = 0
for x in PassageList:
    syns = wordnet.synsets(PassageList[wordCounter])
you could simply do:
import sys
from nltk.corpus import wordnet as wn

passage = sys.stdin.readline()
words = passage.split(' ')
# Iterate over the words directly; no manual index counter is needed
for word in words:
    synsets_per_word = wn.synsets(word)
Simply use len()
To check the number of synsets for a given word, instead of
synLength = maxInt(syns)
you could do:
import sys
from nltk.corpus import wordnet as wn

passage = sys.stdin.readline()
words = passage.split(' ')
for word in words:
    synsets_per_word = wn.synsets(word)
    # len() replaces the hand-written maxInt() helper
    num_synsets_per_word = len(synsets_per_word)
Now to the troublesome line
The line:
PassageList[wordCounter] == syns[0]
With the proper variable naming convention, we have:
word == synsets_per_word[0]
Now here's the confusing part: the left-hand side is word, which is of str type, and you are trying to compare it to synsets_per_word[0], which is of nltk.corpus.reader.wordnet.Synset type.
Thus, when comparing the two objects of different types, the AttributeError pops up: as the traceback shows, Synset.__eq__ reads other._name, and a str has no _name attribute.
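The failure can be reproduced without NLTK at all. The sketch below uses a hypothetical stand-in class, MiniSynset, whose __eq__ has the same body as the line shown in the traceback; it is not the real Synset class.

```python
class MiniSynset:
    """Hypothetical stand-in mimicking the __eq__ shown in the traceback."""
    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        # Same body as in the traceback: assumes `other` also has `_name`
        return self._name == other._name


word = "dog"
synset = MiniSynset("dog.n.01")

try:
    # str.__eq__ returns NotImplemented for an unknown type, so Python
    # falls back to synset.__eq__(word), which then reads word._name
    word == synset
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

So the error is raised inside the comparison itself, which is why the traceback points into wordnet.py even though nothing is wrong with the WordNet installation.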
The bigger question is: what are you trying to achieve here? My assumption is that you think the synset is a str object, but as explained above, it's a Synset object and not a string. Even if you get the lemma_names from the Synset, the result is a list of strings, not a single str that can be compared for equality with a str.
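If the intent was to replace each word rather than compare it (an assumption on my part; note that == only compares and throws the result away, while = assigns), a rough sketch could look like the following. The function name replace_with_first_lemma, the lookup parameter, and the FakeSynset stand-in with its data are all hypothetical, so that the example runs without the WordNet corpus. It takes the first lemma of the first synset with no WSD, so real results will often be the original word or a poor fit.

```python
def replace_with_first_lemma(words, lookup):
    """Replace each word with the first lemma name of its first synset, if any.

    `lookup` is a hypothetical parameter: any callable mapping a word to a
    list of synset-like objects, e.g. nltk.corpus.wordnet.synsets.
    """
    replaced = []
    for word in words:
        synsets = lookup(word)
        if synsets:  # guard: some words have no synsets, so [0] would fail
            replaced.append(synsets[0].lemma_names()[0])
        else:
            replaced.append(word)
    return replaced


class FakeSynset:
    """Hypothetical stand-in for a Synset; only lemma_names() is mimicked."""
    def __init__(self, names):
        self._names = names

    def lemma_names(self):
        return list(self._names)


# Hand-written illustrative data, not real WordNet output
fake_wordnet = {'big': [FakeSynset(['large', 'big'])]}

def fake_lookup(word):
    return fake_wordnet.get(word, [])

print(replace_with_first_lemma(['the', 'big', 'house'], fake_lookup))
# ['the', 'large', 'house']
```

Building a new list instead of writing back into the list being iterated keeps the loop simple and leaves the original words available for comparison.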
First, read up on NLP, Python, and what the WordNet API in NLTK can do.
Then redefine the task, since you're not going to get a lot of help from WordNet with ambiguous words.