TypeError: Expected sequence or array-like, got estimator
Question
I am working on a project with user reviews of products. I am using TfidfVectorizer to extract features from my dataset, in addition to some other features that I extracted manually.
import pandas as pd
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer releases
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv('reviews.csv', header=0)
FEATURES = ['feature1', 'feature2']  # manually extracted features
reviews = df['review']
reviews = reviews.values.flatten()
vectorizer = TfidfVectorizer(min_df=1, decode_error='ignore', ngram_range=(1, 3), stop_words='english', max_features=45)
X = vectorizer.fit_transform(reviews)
idf = vectorizer.idf_
features = vectorizer.get_feature_names()
FEATURES += features
inverse = vectorizer.inverse_transform(X)
# Add one boolean column per extracted term: True where the review contains it
for i, row in df.iterrows():
    for f in features:
        df.set_value(i, f, False)
    for inv in inverse[i]:
        df.set_value(i, inv, True)
train_df, test_df = train_test_split(df, test_size=0.2, random_state=700)
The above code works fine. But when I change max_features from 45 to anything higher, I get an error on the train_test_split line.
The error is:
Traceback (most recent call last):
  File "analysis.py", line 120, in <module>
    train_df, test_df = train_test_split(df, test_size = 0.2, random_state=700)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1906, in train_test_split
    arrays = indexable(*arrays)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 201, in indexable
    check_consistent_length(*result)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 173, in check_consistent_length
    uniques = np.unique([_num_samples(X) for X in arrays if X is not None])
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 112, in _num_samples
    'estimator %s' % x)
TypeError: Expected sequence or array-like, got estimator
I am not sure what exactly changes when I increase max_features.
Let me know if you need more data or if I have missed something.
Recommended answer
I know this is old, but I had the same issue, and while the answer from @shahins works, I wanted something that would keep the DataFrame object so that I could keep my indexing in the train/test splits.
Rename the DataFrame column fit to something (anything) else:
df = df.rename(columns = {'fit': 'fit_feature'})
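Why this works:
It is not actually the number of features that causes the problem, but one feature in particular. My guess is that the word "fit" is turning up as one of your text features (and it did not show up at the lower max_features threshold).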
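One way to test that guess is to look for a column literally named fit among the generated feature columns. The sketch below uses a small made-up frame rather than the asker's data, and the `suspect_names` set is an assumption for illustration; only "fit" is confirmed by this answer.

```python
import pandas as pd

# Made-up stand-in for the frame built in the question: one boolean column
# per extracted term, next to a manually extracted feature.
df = pd.DataFrame({
    "feature1": [1, 2],
    "great": [True, False],
    "fit": [False, True],  # the token "fit" became a column name
})

# Column names that could be mistaken for an estimator method.
# Illustrative only: "fit" is the one name this answer identifies.
suspect_names = {"fit"}

collisions = [col for col in df.columns if col in suspect_names]
print(collisions)
```

If the list is non-empty, those are the columns to rename before splitting.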
Looking at the sklearn source code, it makes sure you are not passing an sklearn estimator by testing whether any of your objects has a "fit" attribute. The code is checking for the fit method of an sklearn estimator, but it also raises the exception when the DataFrame has a fit column (remember that df.fit and df['fit'] both select the "fit" column).
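The behaviour described above can be reproduced without sklearn. In this minimal sketch, `looks_like_estimator` is a hypothetical stand-in for the attribute test the answer describes, not sklearn's actual code, and the two-row frame is made up for illustration.

```python
import pandas as pd


def looks_like_estimator(obj):
    # Hypothetical stand-in for the check described above: anything that
    # exposes a "fit" attribute is treated as an estimator.
    return hasattr(obj, "fit")


df = pd.DataFrame({
    "review": ["a good fit", "too small"],
    "fit": [True, False],  # column created from the token "fit"
})

# pandas exposes columns as attributes, so the frame appears to have "fit".
flagged_before = looks_like_estimator(df)

# Renaming the column removes the attribute and keeps the frame and its index.
fixed = df.rename(columns={"fit": "fit_feature"})
flagged_after = looks_like_estimator(fixed)

print(flagged_before, flagged_after)
```

Because the rename only changes the column label, the row index is untouched, so the train/test splits keep the original indexing.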