TypeError: Expected sequence or array-like, got estimator


Question

I am working on a project that analyzes user reviews of products. I am using TfidfVectorizer to extract features from my dataset, in addition to some other features that I extracted manually.

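import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# The traceback below shows sklearn.cross_validation; newer releases provide
# train_test_split in sklearn.model_selection instead.
from sklearn.cross_validation import train_test_split

df = pd.read_csv('reviews.csv', header=0)

# Manually extracted features; the TF-IDF terms are appended further down.
FEATURES = ['feature1', 'feature2']
reviews = df['review']
reviews = reviews.values.flatten()

vectorizer = TfidfVectorizer(min_df=1, decode_error='ignore', ngram_range=(1, 3), stop_words='english', max_features=45)

X = vectorizer.fit_transform(reviews)
idf = vectorizer.idf_
features = vectorizer.get_feature_names()
FEATURES += features
# For each review, the terms that have a non-zero TF-IDF weight.
inverse = vectorizer.inverse_transform(X)

# Add one boolean column per term: True if the term occurs in that review.
for i, row in df.iterrows():
    for f in features:
        df.set_value(i, f, False)
    for inv in inverse[i]:
        df.set_value(i, inv, True)

train_df, test_df = train_test_split(df, test_size = 0.2, random_state=700)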

The above code works fine. But when I change max_features from 45 to anything higher, I get an error on the train_test_split line.

The error is:

Traceback (most recent call last):
  File "analysis.py", line 120, in <module>
    train_df, test_df = train_test_split(df, test_size = 0.2, random_state=700)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1906, in train_test_split
    arrays = indexable(*arrays)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 201, in indexable
    check_consistent_length(*result)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 173, in check_consistent_length
    uniques = np.unique([_num_samples(X) for X in arrays if X is not None])
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 112, in _num_samples
    'estimator %s' % x)
TypeError: Expected sequence or array-like, got estimator

I am not sure what exactly changes when I increase max_features.

Let me know if you need more data or if I have missed something.

Recommended answer

I know this is an old question, but I ran into the same issue. While the answer from @shahins works, I wanted a fix that keeps the DataFrame object so that the train/test splits retain their index.

Rename the dataframe column fit to something (anything) else:

df = df.rename(columns = {'fit': 'fit_feature'})
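
If the set of generated column names is not known in advance, the rename can be wrapped in a small helper. The following is only a sketch: the helper name rename_reserved_columns, the reserved tuple, and the '_feature' suffix are assumptions made for illustration, and the sample data is invented. It relies only on pandas' DataFrame.rename.

```python
import pandas as pd


def rename_reserved_columns(df, reserved=("fit",), suffix="_feature"):
    """Return a copy of df in which reserved column names get a suffix.

    Hypothetical helper: 'reserved' holds the names to avoid; only 'fit'
    is known from this thread to cause the error.
    """
    mapping = {col: f"{col}{suffix}" for col in df.columns if col in reserved}
    return df.rename(columns=mapping)


# Invented sample data with a term column that happens to be called 'fit'.
df = pd.DataFrame(
    {
        "review": ["nice fit", "runs large"],
        "fit": [True, False],
        "large": [False, True],
    }
)

safe_df = rename_reserved_columns(df)
print(list(safe_df.columns))
```

Because rename returns a new DataFrame with the same index, the row labels stay available after the split. If a column with the suffixed name already exists, pick a different suffix to avoid duplicate column names.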

Why this works:

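The number of features is not actually the problem; one feature in particular is causing it. I'm guessing that you are getting the word "fit" as one of your text features (and that it did not show up with the lower max_features threshold).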
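
To see how a term can appear only once the limit is raised, here is a simplified stand-in for the vocabulary cap, written in plain Python. It is not TfidfVectorizer's implementation: it just keeps the most frequent whitespace-separated words, and the function name top_terms and the sample documents are assumptions for illustration.

```python
from collections import Counter


def top_terms(docs, max_features):
    """Keep the max_features most frequent words (simplified illustration)."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]


# Invented sample reviews in which 'fit' is a comparatively rare word.
docs = ["great shoes great price", "great price", "shoes fit well"]

small = top_terms(docs, 2)
large = top_terms(docs, 4)
print("fit" in small, "fit" in large)
```

With the smaller limit the rare word is cut off; with the larger one it enters the vocabulary and would become a column name.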

Looking at the sklearn source code, it guards against being passed an sklearn estimator by testing whether any of your objects has a "fit" attribute. The check is meant to catch the fit method of an sklearn estimator, but it also raises the exception when the dataframe has a column named fit (remember that df.fit and df['fit'] both select the "fit" column).
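
The attribute behavior can be reproduced with pandas alone. In this sketch, looks_like_estimator is a hypothetical stand-in for the check described above, not sklearn's actual code, and the sample data is invented.

```python
import pandas as pd


def looks_like_estimator(obj):
    # Simplified stand-in for the check described above (an assumption,
    # not copied from sklearn): anything with a 'fit' attribute matches.
    return hasattr(obj, "fit")


plain = pd.DataFrame(
    {"review": ["good fit", "too small"], "great": [True, False]}
)
# Same data plus a boolean column that happens to be named 'fit'.
with_fit = plain.assign(fit=[True, False])

plain_flag = looks_like_estimator(plain)
fit_flag = looks_like_estimator(with_fit)
# Attribute access and item access return the same column.
same_column = with_fit.fit.equals(with_fit["fit"])
print(plain_flag, fit_flag, same_column)
```

A DataFrame has no fit method of its own, so the attribute exists only when a column with that name does, which is why renaming the column avoids the error.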
