AttributeError: 'list' object has no attribute 'lower' : clustering

Problem description

I'm trying to do clustering with pandas and sklearn.

import pandas
import pprint
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = pandas.read_csv('text.csv', encoding='utf-8')

dataset_list = dataset.values.tolist()


vectors = TfidfVectorizer()
X = vectors.fit_transform(dataset_list)

clusters_number = 20

model = KMeans(n_clusters = clusters_number, init = 'k-means++', max_iter = 300, n_init = 1)

model.fit(X)

centers = model.cluster_centers_
labels = model.labels_

clusters = {}
for comment, label in zip(dataset_list, labels):
    print ('Comment:', comment)
    print ('Label:', label)

    # Group each comment under its cluster label
    try:
        clusters[str(label)].append(comment)
    except KeyError:
        clusters[str(label)] = [comment]
pprint.pprint(clusters)

But I get the following error, even though I never call lower():

  File "clustering.py", line 19, in <module>
    X = vetorizer.fit_transform(dataset_list)
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 1381, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 792, in _count_vocab
    for feature in analyze(doc):
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 266, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 232, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'list' object has no attribute 'lower'

I don't understand: my text (text.csv) is already lowercase, and at no point did I call lower().

Data:

hello wish to cancel order thank you confirmation

hello would like to cancel order made today store house world

dimensions bed not compatible would like to know how to pass cancellation refund send today cordially

hello possible to cancel order

hello wants to cancel order request refund

hello wish to cancel this order can indicate process cordially

hello seen date delivery would like to cancel order thank you

hello wants to cancel matching order good delivery n ° 111111

hello would like to cancel this order

hello order product store cancel act doublon advance thank you cordially

hello wishes to cancel order thank you kindly refund greetings

hello possible cancel order please thank you in advance forward cordially

Recommended answer

The error is in this line:

dataset_list = dataset.values.tolist()

You see, dataset is a pandas DataFrame, so dataset.values converts it to a 2-d array of shape (n_rows, 1). The array is 2-d even when there is only one column. Calling tolist() on it then produces a list of lists, something like this:

print(dataset_list)

[['hello wish to cancel order thank you confirmation'],
 ['hello would like to cancel order made today store house world'],
 ['dimensions bed not compatible would like to know how to pass cancellation refund send today cordially'],
 ...
 ...
 ...]

As you can see, there are two levels of square brackets here.
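The nesting is easy to confirm on a small frame. The following is only a sketch: the in-memory DataFrame and its column name `text` are assumptions standing in for the real text.csv.

```python
import pandas as pd

# Hypothetical one-column frame standing in for text.csv
dataset = pd.DataFrame({
    "text": [
        "hello wish to cancel order thank you confirmation",
        "hello wants to cancel order request refund",
    ]
})

values = dataset.values
print(values.shape)       # (2, 1): two rows, one column, still 2-d

nested = values.tolist()
print(type(nested[0]))    # <class 'list'>: each row is itself a list
print(nested[0])
```

Each element of the result is a one-item list rather than a string, which is what trips up the vectorizer.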

TfidfVectorizer expects a flat list of sentences (strings), not a list of lists. That is the cause of the error: it assumes each item is a string and calls lower() on it, but here each item is a list.
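This also explains why lower() shows up even though the question's code never calls it: TfidfVectorizer lowercases its input by default (its `lowercase` parameter defaults to True). The failure can be reproduced without sklearn. In the sketch below, `preprocess` is a hypothetical stand-in for the vectorizer's internal lowercasing step, not its actual implementation.

```python
def preprocess(doc):
    # Hypothetical stand-in for the vectorizer's lowercasing step
    return doc.lower()

flat = ["hello wish to cancel order"]
nested = [["hello wish to cancel order"]]

print(preprocess(flat[0]))   # works: the item is a str

try:
    preprocess(nested[0])    # fails: the item is a list
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```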

So you just need to do one of the following:

# Option 1: use ravel to convert the 2-d array to 1-d
dataset_list = dataset.values.ravel().tolist()

# Option 2: select the column by name, which turns the DataFrame into a Series.
# Replace `column_name` with your actual column header.
dataset_list = dataset['column_name'].values.tolist()
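For reference, here is a minimal end-to-end sketch of the pipeline with the flattened list. The in-memory DataFrame, the column name `text`, the reduced cluster count, and the fixed `random_state` are all assumptions chosen to keep the example small and repeatable. They are not taken from the original script.

```python
from collections import defaultdict

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for text.csv; the column name "text" is assumed
dataset = pd.DataFrame({
    "text": [
        "hello wish to cancel order thank you confirmation",
        "hello would like to cancel order made today store house world",
        "hello wants to cancel order request refund",
        "hello wish to cancel this order can indicate process cordially",
    ]
})

# Flat list of strings, one per document
dataset_list = dataset["text"].values.tolist()

X = TfidfVectorizer().fit_transform(dataset_list)

# Small cluster count chosen only to fit the toy data
model = KMeans(n_clusters=2, init="k-means++", max_iter=300,
               n_init=1, random_state=0)
model.fit(X)

# Group comments by their assigned cluster label
clusters = defaultdict(list)
for comment, label in zip(dataset_list, model.labels_):
    clusters[str(label)].append(comment)

print(dict(clusters))
```

Using `defaultdict(list)` here is just one way to collect the comments per label; a plain dict with a `KeyError` check does the same job.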
