TypeError: Expected sequence or array-like, got estimator

Published: 2020/5/24 3:14:00. Tags: python-2.7, pandas, scikit-learn


Question

I am working on a project that involves user reviews of products. I am using TfidfVectorizer to extract features from my dataset, in addition to some other features that I extracted manually.

import pandas as pd
from sklearn.cross_validation import train_test_split  # module path as shown in the traceback
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv('reviews.csv', header=0)

FEATURES = ['feature1', 'feature2']
reviews = df['review']
reviews = reviews.values.flatten()

vectorizer = TfidfVectorizer(min_df=1, decode_error='ignore', ngram_range=(1, 3), stop_words='english', max_features=45)

X = vectorizer.fit_transform(reviews)
idf = vectorizer.idf_
features = vectorizer.get_feature_names()
FEATURES += features
inverse = vectorizer.inverse_transform(X)

# Add one boolean column per extracted term: True where the term occurs in that review
for i, row in df.iterrows():
    for f in features:
        df.set_value(i, f, False)
    for inv in inverse[i]:
        df.set_value(i, inv, True)

train_df, test_df = train_test_split(df, test_size=0.2, random_state=700)

The above code works fine. But when I change max_features from 45 to anything higher, I get an error on the train_test_split line.

The error is:

Traceback (most recent call last):
  File "analysis.py", line 120, in <module>
    train_df, test_df = train_test_split(df, test_size = 0.2, random_state=700)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1906, in train_test_split
    arrays = indexable(*arrays)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 201, in indexable
    check_consistent_length(*result)
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 173, in check_consistent_length
    uniques = np.unique([_num_samples(X) for X in arrays if X is not None])
  File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 112, in _num_samples
    'estimator %s' % x)
TypeError: Expected sequence or array-like, got estimator

I am not sure what exactly changes when I increase max_features.

Let me know if you need more data or if I have missed something.

Answer

Why it happens:

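It is actually not the number of features that causes the problem, but one feature in particular. My guess is that you are getting the word "fit" as one of your text features (and that it did not show up with the lower max_features threshold).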
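To illustrate why a higher cap can let "fit" in, here is a minimal sketch using only the standard library. It is not TfidfVectorizer itself: the helper top_terms and the sample reviews are hypothetical, and it only mimics the documented idea that max_features keeps the most frequent terms across the corpus.

```python
from collections import Counter


def top_terms(docs, max_features):
    """Hypothetical stand-in for a max_features cap: keep the most frequent tokens."""
    counts = Counter(token for doc in docs for token in doc.lower().split())
    return [term for term, _ in counts.most_common(max_features)]


# Made-up reviews; "fit" is a rare token here
reviews = [
    "great shoes great price",
    "great shoes",
    "shoes fit well",
    "great price shoes",
]

small_vocab = top_terms(reviews, 2)  # low cap: only the most common tokens survive
large_vocab = top_terms(reviews, 5)  # higher cap: rarer tokens are admitted too

has_fit_small = "fit" in small_vocab
has_fit_large = "fit" in large_vocab
print(has_fit_small, has_fit_large)
```

With the real vectorizer, checking whether "fit" appears in the extracted feature names before adding them as columns would confirm or rule out this guess.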

Looking at the sklearn source code, it makes sure you are not passing an sklearn estimator by testing whether any of your objects has a "fit" attribute. The code is checking for the fit method of an sklearn estimator, but it will also raise the exception when your dataframe has a column named "fit" (remember that df.fit and df['fit'] both select the "fit" column).
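The collision can be reproduced with pandas alone. The sketch below is an illustration under assumptions: looks_like_estimator is a hypothetical, simplified stand-in for the check described above (not sklearn's actual function), and the "tok_" prefix is just one possible way to keep token columns from shadowing attribute names.

```python
import pandas as pd


def looks_like_estimator(obj):
    """Hypothetical stand-in for the described check: flags anything with a 'fit' attribute."""
    return hasattr(obj, "fit")


# Made-up data: one boolean column per extracted token
df_ok = pd.DataFrame({"review": ["good value", "poor quality"], "good": [True, False]})

df_bad = df_ok.copy()
df_bad["fit"] = [False, True]  # a token named "fit" becomes a column, so df_bad.fit now resolves

ok_flagged = looks_like_estimator(df_ok)
bad_flagged = looks_like_estimator(df_bad)

# One possible workaround (an assumption, not from the answer): prefix the token columns
df_safe = df_bad.rename(columns={"fit": "tok_fit"})
safe_flagged = looks_like_estimator(df_safe)

print(ok_flagged, bad_flagged, safe_flagged)
```

Renaming or prefixing the generated columns keeps the frame from being mistaken for an estimator while leaving the data itself unchanged.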
