"setting an array element with a sequence" error in scikit-learn GradientBoostingClassifier
Question
Here is my code. Does anyone have any idea what is wrong? The error happens when I call fit:
import pandas as pd
import numpy as np
from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
n_estimators = 10
d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target':[0, 1]}
df = pd.DataFrame(data=d)
X_train, X_test, y_train, y_test = train_test_split(df, df['target'], test_size=0.1)
X_train['f2'] = CountVectorizer().fit_transform(X_train['f2'])
X_test['f2'] = CountVectorizer().fit_transform(X_test['f2'])
grd = GradientBoostingClassifier(n_estimators=n_estimators, max_depth=10)
grd.fit(X_train.values, y_train.values)
Recommended answer
The problem is with CountVectorizer:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target':[0, 1]}
df = pd.DataFrame(data=d)
df['f2'] = CountVectorizer().fit_transform(df['f2'])
df.values
gives:
array([[1,
        <2x3 sparse matrix of type '<class 'numpy.int64'>'
            with 4 stored elements in Compressed Sparse Row format>,
        0],
       [2,
        <2x3 sparse matrix of type '<class 'numpy.int64'>'
            with 4 stored elements in Compressed Sparse Row format>,
        1]], dtype=object)
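An object array whose cells hold sequence-like values cannot be cast to a numeric dtype, and that cast is roughly what an estimator attempts on its input. The sketch below reproduces the same message with plain NumPy. It is an illustration only: the `cells` array and the list stored in it are hypothetical stand-ins for the `f2` column, not the exact conversion path scikit-learn takes.

```python
import numpy as np

# Hypothetical stand-in for one row of df.values: a scalar next to a
# sequence-like cell, stored in an object array.
cells = np.empty(2, dtype=object)
cells[0] = 1
cells[1] = [1, 2, 3]

try:
    # Casting to a numeric dtype fails because one cell is a sequence.
    cells.astype(np.float32)
except ValueError as exc:
    message = str(exc)
    print(message)
```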
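We can see that df.values mixes a sparse matrix with dense values: each cell of f2 holds the whole 2x3 sparse matrix, so the array has dtype=object and cannot be converted to a numeric array, which is where the error in fit comes from. You can convert the counts to a dense matrix with todense():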
dense_count = CountVectorizer().fit_transform(df['f2']).todense()
where dense_count looks like:
matrix([[1, 1, 0],
        [0, 1, 1]], dtype=int64)
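To carry this through to the classifier, the dense counts can be stacked next to the numeric column before calling fit. The following is a minimal sketch, not the original poster's code. It assumes a slightly larger toy frame (four rows instead of two), uses `toarray()` instead of `todense()` so the result is a plain ndarray rather than a `numpy.matrix`, and introduces the names `vectorizer`, `counts`, `X`, `y` and `X_new` for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy data, extended to four rows for illustration (assumption).
d = {'f1': [1, 2, 3, 4],
     'f2': ['foo goo', 'goo zoo', 'foo foo', 'zoo zoo'],
     'target': [0, 1, 0, 1]}
df = pd.DataFrame(data=d)

# Fit the vectorizer once and turn the counts into a dense ndarray
# of shape (n_samples, n_words).
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(df['f2']).toarray()

# Stack the numeric column and the word counts side by side, so every
# cell of X is a number and the array has a numeric dtype.
X = np.hstack([df[['f1']].to_numpy(), counts])
y = df['target'].to_numpy()

grd = GradientBoostingClassifier(n_estimators=10, max_depth=10)
grd.fit(X, y)

# New rows reuse the fitted vectorizer via transform(), not fit_transform(),
# so the columns line up with the training vocabulary.
new_counts = vectorizer.transform(['goo zoo']).toarray()
X_new = np.hstack([np.array([[2]]), new_counts])
pred = grd.predict(X_new)
```

Note that the question's code calls fit_transform separately on the training and test text. That would learn two different vocabularies, so it is usually safer to fit on the training data and only transform the test data, as the sketch does for `X_new`.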