Python text processing: AttributeError: 'list' object has no attribute 'lower'

Views: 252 | Posted: 2020/7/11 23:27:39 | Tags: python, csv, text-classification

Question

I am new to Python and to Stack Overflow (please be gentle) and am trying to learn how to do sentiment analysis. I am using a combination of code I found in a tutorial and in this question: "Python - AttributeError: 'list' object has no attribute". However, I keep getting the following error:

Traceback (most recent call last):
  File "C:/Python27/training", line 111, in <module>
    processedTestTweet = processTweet(row)
  File "C:/Python27/training", line 19, in processTweet
    tweet = tweet.lower()
AttributeError: 'list' object has no attribute 'lower'

Here is my code:

import csv
#import regex
import re
import pprint
import nltk.classify


#start replaceTwoOrMore
def replaceTwoOrMore(s):
    #look for 2 or more repetitions of character
    pattern = re.compile(r"(.)\1{1,}", re.DOTALL)
    return pattern.sub(r"\1\1", s)

# process the tweets
def processTweet(tweet):
    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[\s]+)|(https?://[^\s]+))','URL',tweet)
    #Convert @username to AT_USER
    tweet = re.sub('@[^\s]+','AT_USER',tweet)
    #Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #trim
    tweet = tweet.strip('\'"')
    return tweet

#start getStopWordList
def getStopWordList(stopWordListFileName):
    #read the stopwords file and build a list
    stopWords = []
    stopWords.append('AT_USER')
    stopWords.append('URL')

    fp = open(stopWordListFileName, 'r')
    line = fp.readline()
    while line:
        word = line.strip()
        stopWords.append(word)
        line = fp.readline()
    fp.close()
    return stopWords

def getFeatureVector(tweet, stopWords):
    featureVector = []
    words = tweet.split()
    for w in words:
        #replace two or more with two occurrences
        w = replaceTwoOrMore(w)
        #strip punctuation
        w = w.strip('\'"?,.')
        #check if it consists of only words
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$", w)
        #ignore if it is a stopWord
        if(w in stopWords or val is None):
            continue
        else:
            featureVector.append(w.lower())
    return featureVector

def extract_features(tweet):
    tweet_words = set(tweet)
    features = {}
    for word in featureList:
        features['contains(%s)' % word] = (word in tweet_words)
    return features


#Read the tweets one by one and process it
inpTweets = csv.reader(open('C:/GsTraining.csv', 'rb'),
                       delimiter=',',
                       quotechar='|')
stopWords = getStopWordList('C:/stop.txt')
count = 0;
featureList = []
tweets = []

for row in inpTweets:
    sentiment = row[0]
    tweet = row[1]
    processedTweet = processTweet(tweet)
    featureVector = getFeatureVector(processedTweet, stopWords)
    featureList.extend(featureVector)
    tweets.append((featureVector, sentiment))

# Remove featureList duplicates
featureList = list(set(featureList))

# Generate the training set
training_set = nltk.classify.util.apply_features(extract_features, tweets)

# Train the Naive Bayes classifier
NBClassifier = nltk.NaiveBayesClassifier.train(training_set)

# Test the classifier
with open('C:/CleanedNewGSMain.txt', 'r') as csvinput:
    with open('GSnewmain.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)

        all=[]
        row = next(reader)

        for row in reader:
            # row is a list here, so this call raises the AttributeError
            # (line 111 in the traceback)
            processedTestTweet = processTweet(row)
            sentiment = NBClassifier.classify(
                extract_features(getFeatureVector(processedTestTweet, stopWords)))
            row.append(sentiment)
            processTweet(row[1])

        writer.writerows(all)

Any help would be greatly appreciated.

Recommended answer

The csv reader yields each row as a list, and lower() is a string method, so calling it on a row fails. The row is presumably a list of strings, which leaves two options: call lower() on each element, or join the list into a single string and call lower() on that.

# the first approach
[item.lower() for item in tweet]

# the second approach
' '.join(tweet).lower()
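
To make the cause concrete, here is a minimal sketch that feeds csv.reader an in-memory sample and applies both approaches to one row. The sample text and the variable names are invented for illustration, and the sketch is written for Python 3, whereas the question runs on Python 2.7.

```python
import csv
import io

# Hypothetical two-column sample standing in for the real file.
sample = io.StringIO("positive,I LOVE this\nnegative,So BAD\n")

rows = list(csv.reader(sample))

# Each row is a list of strings, so row.lower() would raise AttributeError.
first_row = rows[0]
is_list = isinstance(first_row, list)

# First approach: lower-case every field, keeping the list structure.
lowered_fields = [item.lower() for item in first_row]

# Second approach: join the fields into one string, then lower-case it.
lowered_text = ' '.join(first_row).lower()
```

The first approach keeps the columns separate, which matters if only one of them holds the tweet text. The second merges every column, labels included, into a single string.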

More likely, though (it is hard to tell without more information), you only want one item from each row. Something along these lines:

for row in reader:
    processedTestTweet = processTweet(row[0]) # Again, can't know if this is actually correct without seeing the file
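
As a rough sketch of how the test loop might look once a single column is selected, the following passes one string per row to the classifier and collects the labelled rows before writing them out. In the question's code the list named all is never filled, so writer.writerows(all) would write nothing. Everything here is an assumption for illustration: label_rows and toy_classify are hypothetical names, toy_classify merely stands in for the trained NLTK classifier, the tweet text is assumed to sit in column 1, and the code targets Python 3.

```python
import csv
import io


def label_rows(rows, text_column, classify):
    """Return a copy of each row with the predicted label appended."""
    labelled = []
    for row in rows:
        if len(row) <= text_column:
            continue  # skip blank or short rows
        text = row[text_column]  # a str, so string methods work on it
        labelled.append(row + [classify(text.lower())])
    return labelled


def toy_classify(text):
    """Hypothetical stand-in for the trained classifier."""
    return 'positive' if 'love' in text else 'negative'


# Hypothetical input with a header row, read from memory instead of a file.
sample = io.StringIO("id,tweet\n1,I LOVE this\n2,So bad\n")
reader = csv.reader(sample)
next(reader)  # skip the header, as the question does with next(reader)

result = label_rows(reader, 1, toy_classify)

# Write the labelled rows; an in-memory buffer replaces the output file.
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(result)
```

Building a new list for each row, rather than appending to the row in place, keeps the reader's data untouched and gives writerows something to write.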

Also, I suspect you aren't using the csv reader quite the way you think you are: as it stands, you appear to be training a naive Bayes classifier on a single example each time and then having it predict the very example it was trained on. Could you explain what you're trying to do?
