Unable to use FeatureUnion in scikit-learn due to different dimensions
Problem description
I'm trying to use FeatureUnion to extract different features from a data structure, but it fails due to different dimensions: ValueError: blocks[0,:] has incompatible row dimensions
My FeatureUnion is built the following way:
features = FeatureUnion([
    ('f1', Pipeline([
        ('get', GetItemTransformer('f1')),
        ('transform', vectorizer_f1)
    ])),
    ('f2', Pipeline([
        ('get', GetItemTransformer('f2')),
        ('transform', vectorizer_f1)
    ]))
])
GetItemTransformer is used to get different parts of the data out of the same structure. The idea is described in the scikit-learn issue tracker.
The structure itself is stored as {'f1': data_f1, 'f2': data_f2}, where data_f1 and data_f2 are different lists with different lengths.
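The original GetItemTransformer is not shown, so the following is only a guess at its shape: a minimal sketch, assuming it simply looks up one key of the dict during transform. The class body and the sample data are hypothetical; in a real pipeline it would normally also inherit from sklearn.base.BaseEstimator and TransformerMixin.

```python
class GetItemTransformer:
    """Hypothetical sketch: pick one entry out of a dict-like structure."""

    def __init__(self, field):
        self.field = field

    def fit(self, X, y=None):
        # Nothing to learn; selecting a key is stateless.
        return self

    def transform(self, X):
        # Return only the part of the structure stored under `field`.
        return X[self.field]


# Made-up sample data with the layout described above.
data = {'f1': ['a b', 'c d', 'e f'], 'f2': ['x', 'y']}

# Number of samples each branch of the union would receive.
lengths = {key: len(GetItemTransformer(key).fit(data).transform(data))
           for key in data}
print(lengths)
```

If the lists have different lengths, as in this made-up data, each branch of the union sees a different number of samples, which is consistent with the row-dimension error quoted above.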
I assume the error occurs because the Y vector differs from the data fields, but how can I scale the vector to fit in both cases?
Recommended answer
This worked for me:
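import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ArrayCaster(BaseEstimator, TransformerMixin):
    """Cast a 1-D array of n values to an (n, 1) column so it can be stacked."""
    def fit(self, x, y=None):
        return self

    def transform(self, data):
        column = np.transpose(np.matrix(data))
        print(data.shape, column.shape)  # debug output: shape before and after
        return column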
FeatureUnion([
    ('text', Pipeline([
        ('selector', ItemSelector(key='text')),
        ('vect', CountVectorizer(ngram_range=(1,1), binary=True, min_df=3)),
        ('tfidf', TfidfTransformer())
    ])),
    ('other data', Pipeline([
        ('selector', ItemSelector(key='has_foriegn_char')),
        ('caster', ArrayCaster())
    ]))
])
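Why the cast helps can be illustrated without scikit-learn. FeatureUnion stacks the outputs of its branches side by side, so every branch has to yield one row per sample. The sketch below is a simplified pure-Python model of that check, not scikit-learn's actual code; as_2d, hstack_rows and to_column are hypothetical helper names, and it assumes that a flat array of n values is treated as a single row of n columns unless it is reshaped into n rows of one column.

```python
def as_2d(block):
    """Treat a flat list as one row; leave a list of rows unchanged."""
    if block and isinstance(block[0], list):
        return block
    return [list(block)]


def hstack_rows(blocks):
    """Join blocks side by side; fail if their row counts differ."""
    blocks = [as_2d(b) for b in blocks]
    counts = [len(b) for b in blocks]
    if len(set(counts)) != 1:
        raise ValueError("incompatible row dimensions: %s" % counts)
    # Concatenate the i-th row of every block into one wide row.
    return [sum(rows, []) for rows in zip(*blocks)]


def to_column(values):
    """Reshape n flat values into n rows of one column (the role ArrayCaster plays)."""
    return [[v] for v in values]


text_features = [[1, 0], [0, 1], [1, 1]]  # 3 samples, 2 features each
flag = [1, 0, 1]                          # 3 values, but flat

try:
    hstack_rows([text_features, flag])    # flat list counts as 1 row, not 3
    error = None
except ValueError as exc:
    error = str(exc)

stacked = hstack_rows([text_features, to_column(flag)])
print(error)
print(stacked)
```

In this model the flat list fails the row check, while the column version lines up with the three text rows and adds one extra feature per sample.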