Unable to use FeatureUnion in scikit-learn due to different dimensions

Question

I'm trying to use FeatureUnion to extract different features from a data structure, but it fails because of mismatched dimensions: ValueError: blocks[0,:] has incompatible row dimensions

My FeatureUnion is built as follows:

    features = FeatureUnion([
        ('f1', Pipeline([
            ('get', GetItemTransformer('f1')),
            ('transform', vectorizer_f1)
        ])),
        ('f2', Pipeline([
            ('get', GetItemTransformer('f2')),
            ('transform', vectorizer_f1)
        ]))
    ])

GetItemTransformer is used to get different parts of the data out of the same structure. The idea is described in the scikit-learn issue tracker.

The structure itself is stored as {'f1': data_f1, 'f2': data_f2}, where data_f1 and data_f2 are different lists with different lengths.

Since the y vector differs from the data fields, I assume that is why the error occurs, but how can I scale the vector to fit both cases?

Recommended answer

This worked for me:

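    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    class ArrayCaster(BaseEstimator, TransformerMixin):
        """Cast a 1-D sequence of n values to an (n, 1) column."""

        def fit(self, x, y=None):
            # Stateless: there is nothing to learn from the data.
            return self

        def transform(self, data):
            # Shape (n,) becomes (n, 1): one row per sample, one feature column.
            return np.asarray(data).reshape(-1, 1)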
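
The reason the cast helps can be illustrated without scikit-learn. As far as I know, FeatureUnion stacks the outputs of its branches side by side (through scipy.sparse.hstack when any of them is sparse), so every block needs one row per sample. The sketch below uses made-up toy data (text_block and flags are hypothetical names, not from the question) to show that a row-shaped block is rejected while a column-shaped one is accepted:

```python
import numpy as np
from scipy import sparse

# Toy stand-in for a vectorizer output: 3 samples, 5 text features.
text_block = sparse.csr_matrix(np.ones((3, 5)))

# Toy stand-in for a per-sample scalar feature (hypothetical data).
flags = np.array([1, 0, 1])

row = sparse.csr_matrix(flags.reshape(1, -1))   # shape (1, 3): one row, three columns
col = sparse.csr_matrix(flags.reshape(-1, 1))   # shape (3, 1): one row per sample

try:
    sparse.hstack([text_block, row])
    stacked_ok = True
except ValueError as exc:
    # Row counts differ (3 vs 1), so horizontal stacking is refused.
    stacked_ok = False
    error_message = str(exc)

# With one row per sample the blocks line up and can be concatenated.
combined = sparse.hstack([text_block, col])
print(stacked_ok, combined.shape)
```

The same reasoning applies to any branch of the union: if two branches return a different number of rows, the stacking step cannot succeed, whatever the number of columns.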

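    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.pipeline import FeatureUnion, Pipeline

    # ItemSelector is not defined in this answer; it is assumed to pick one
    # field out of the input, like GetItemTransformer in the question.
    FeatureUnion([
        # Text branch: sparse TF-IDF matrix, one row per sample.
        ('text', Pipeline([
            ('selector', ItemSelector(key='text')),
            ('vect', CountVectorizer(ngram_range=(1, 1), binary=True, min_df=3)),
            ('tfidf', TfidfTransformer()),
        ])),
        # Scalar branch: cast the selected values to an (n, 1) column.
        ('other data', Pipeline([
            ('selector', ItemSelector(key='has_foriegn_char')),
            ('caster', ArrayCaster()),
        ])),
    ])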
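
For anyone who wants to try this end to end, here is a self-contained sketch. It is not the original poster's code: the ItemSelector below is a hypothetical minimal version that simply indexes a dict by key, the toy data is invented, and min_df is left at its default because three short documents share no term three times.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import FeatureUnion, Pipeline


class ItemSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: return one entry of a dict of equal-length columns."""

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        return self

    def transform(self, data):
        return data[self.key]


class ArrayCaster(BaseEstimator, TransformerMixin):
    """Cast a 1-D sequence of n values to an (n, 1) column."""

    def fit(self, x, y=None):
        return self

    def transform(self, data):
        return np.asarray(data).reshape(-1, 1)


# Invented toy data: every field holds exactly one value per sample.
data = {
    "text": ["hello world", "bonjour le monde", "hello again"],
    "has_foriegn_char": [0, 1, 0],
}

union = FeatureUnion([
    ("text", Pipeline([
        ("selector", ItemSelector(key="text")),
        ("vect", CountVectorizer(binary=True)),
        ("tfidf", TfidfTransformer()),
    ])),
    ("other data", Pipeline([
        ("selector", ItemSelector(key="has_foriegn_char")),
        ("caster", ArrayCaster()),
    ])),
])

# Both branches yield 3 rows, so the union can stack them column-wise.
X = union.fit_transform(data)
print(X.shape)
```

Note that this only works because both fields contain the same number of samples. If the fields have different lengths, as described in the question, no reshaping inside a transformer will make the rows line up; the data itself has to be aligned first.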
