CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'
Problem description
I have a one-dimensional array with a large string in each element. I am trying to use a CountVectorizer to convert the text data into numerical vectors. However, I am getting an error saying:
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
mealarray contains a large string in each element, and there are 5000 such samples. I am trying to vectorize it as follows:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # ngram_range=(1, 1) is the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)
Full stack trace:
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
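A minimal reproduction can make the failure concrete. The sketch below assumes a recent scikit-learn (1.x) and uses a small, made-up corpus (docs_2d is hypothetical, not the asker's data) stored as a single column of shape (2, 1). Passing it straight to fit_transform should fail at the same lowercasing step shown in the trace.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus: two documents stored as a column, shape (2, 1)
docs_2d = np.array([["Grilled chicken salad"], ["Beef stew with carrots"]])
print(docs_2d.shape)  # (2, 1)

vectorizer = CountVectorizer(stop_words='english')
error_message = None
try:
    vectorizer.fit_transform(docs_2d)
except AttributeError as exc:
    # Each "document" seen by the vectorizer is a row (a 1-D ndarray), so the
    # default lowercasing step calls ndarray.lower(), which does not exist.
    error_message = str(exc)

print(error_message)
```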
Recommended answer
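Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array (that is, mealarray.shape must be of the form (n,)). For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1): fit_transform iterates over its argument one document at a time, and iterating over a 2-D array yields its rows, each of which is itself a numpy.ndarray. The default preprocessing step then calls .lower() on that row, which is exactly the failing line at the bottom of the stack trace.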
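The sketch below uses only NumPy and a hypothetical three-document array named column to show what the vectorizer actually receives in each case: the rows of a 2-D array are themselves arrays without a lower method, while the elements of the raveled 1-D array are string scalars.

```python
import numpy as np

# Hypothetical stand-in for mealarray, stored as a single column: shape (3, 1)
column = np.array([["Grilled chicken salad"],
                   ["Beef stew with carrots"],
                   ["Vegetable curry with rice"]])
flat = column.ravel()  # same strings, shape (3,)

print(column.shape, flat.shape)  # (3, 1) (3,)

# Iterating over a 2-D array yields rows, and each row is itself an ndarray.
first_row = next(iter(column))
print(type(first_row), hasattr(first_row, "lower"))  # <class 'numpy.ndarray'> False

# Iterating over a 1-D string array yields string scalars (numpy.str_,
# a subclass of Python's str), which do have .lower().
first_doc = next(iter(flat))
print(isinstance(first_doc, str), first_doc.lower())  # True grilled chicken salad
```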
You could try something like:
data = vectorizer.fit_transform(mealarray.ravel())