Unable to use FeatureUnion in scikit-learn due to different dimensions

Published: 2020/10/2 3:20:05 · Tags: python, scikit-learn, classification, text-classification

Question

I'm trying to use FeatureUnion to extract different features from a data structure, but it fails because of different dimensions: ValueError: blocks[0,:] has incompatible row dimensions

My FeatureUnion is built the following way:

    features = FeatureUnion([
        ('f1', Pipeline([
            ('get', GetItemTransformer('f1')),
            ('transform', vectorizer_f1)
        ])),
        ('f2', Pipeline([
            ('get', GetItemTransformer('f2')),
            ('transform', vectorizer_f1)
        ]))
    ])

GetItemTransformer is used to get different parts of the data out of the same structure. The idea is described in the scikit-learn issue tracker.

The structure itself is stored as {'f1': data_f1, 'f2': data_f2}, where data_f1 consists of different lists with different lengths.
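The question does not show GetItemTransformer itself. Below is a minimal sketch of what such a transformer presumably looks like, assuming it simply returns `data[key]`; the class body and the sample data are hypothetical, not the asker's code. Because FeatureUnion places the outputs of its pipelines side by side, each field it selects must describe the same samples: data_f1 and data_f2 need one entry per sample, so they must have the same length as each other and as the target vector.

```python
from sklearn.base import BaseEstimator, TransformerMixin


class GetItemTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: select one field of a dict-like input."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        # Nothing to learn; selecting a field is stateless.
        return self

    def transform(self, X):
        return X[self.key]


# Hypothetical data: every field holds one entry per sample.
data = {
    'f1': ['red apple', 'green pear', 'yellow banana'],
    'f2': ['sweet', 'juicy', 'soft'],
}

f1 = GetItemTransformer('f1').fit_transform(data)
f2 = GetItemTransformer('f2').fit_transform(data)
print(f1)
print(len(f1) == len(f2))  # True: both fields describe the same 3 samples
```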

Since the Y vector differs from the data fields, I assume that is where the error comes from, but how can I scale the vector so that it fits in both cases?
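The message "incompatible row dimensions" concerns the blocks being stacked, not y: FeatureUnion concatenates its pipelines' outputs column-wise (with `scipy.sparse.hstack` when any output is sparse, otherwise `numpy.hstack`), so every block needs exactly one row per sample. A small NumPy-only illustration with made-up values: a 1-D feature that ends up as a single row cannot be stacked next to a (3, 2) block, while the same values reshaped into a (3, 1) column can.

```python
import numpy as np

# Made-up example with 3 samples.
text_features = np.array([[1, 0], [0, 1], [1, 1]])  # shape (3, 2)
flag = np.array([0, 1, 0])                           # shape (3,): one value per sample

# Promoting the 1-D array to 2-D makes it a single row of shape (1, 3);
# np.matrix(flag) would give the same shape.
as_row = np.atleast_2d(flag)
print(as_row.shape)  # (1, 3)

try:
    np.hstack([text_features, as_row])
    stacked_ok = True
except ValueError as exc:
    # Row counts differ (3 vs. 1), so the blocks cannot sit side by side.
    stacked_ok = False
    print("mismatch:", exc)

# Reshaped into an (n_samples, 1) column, the row counts agree.
as_column = flag.reshape(-1, 1)
combined = np.hstack([text_features, as_column])
print(combined.shape)  # (3, 3)
```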

Answer

This worked for me:

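    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.pipeline import FeatureUnion, Pipeline

    # Minimal stand-in for ItemSelector, which is used here without being defined.
    class ItemSelector(BaseEstimator, TransformerMixin):
        def __init__(self, key):
            self.key = key

        def fit(self, x, y=None):
            return self

        def transform(self, data):
            return data[self.key]

    class ArrayCaster(BaseEstimator, TransformerMixin):
        """Reshape a 1-D array of n values into an (n, 1) column."""
        def fit(self, x, y=None):
            return self

        def transform(self, data):
            # Left 1-D, the values would be stacked as a single row, which
            # causes "blocks[0,:] has incompatible row dimensions".
            return np.asarray(data).reshape(-1, 1)

    features = FeatureUnion([
        ('text', Pipeline([
            ('selector', ItemSelector(key='text')),
            ('vect', CountVectorizer(ngram_range=(1, 1), binary=True, min_df=3)),
            ('tfidf', TfidfTransformer()),
        ])),
        ('other data', Pipeline([
            ('selector', ItemSelector(key='has_foriegn_char')),
            ('caster', ArrayCaster()),
        ])),
    ])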
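A self-contained sketch of the answer's approach on made-up data, to check the resulting shape. Assumptions: ItemSelector is a minimal stand-in (the answer does not define it), the key is spelled `has_foreign_char`, the TF-IDF step is left out, and CountVectorizer keeps its defaults, because the answer's `min_df=3` would prune every term from a three-document toy corpus.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline


class ItemSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: return X[key] from a dict-like input."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.key]


class ArrayCaster(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # (n,) -> (n, 1): one row per sample, as FeatureUnion requires
        return np.asarray(X).reshape(-1, 1)


# Made-up data: both fields describe the same 3 samples.
data = {
    'text': ['cheap pills now', 'meeting at noon', 'cheap tickets'],
    'has_foreign_char': np.array([1, 0, 0]),
}

union = FeatureUnion([
    ('text', Pipeline([
        ('selector', ItemSelector(key='text')),
        ('vect', CountVectorizer()),
    ])),
    ('other data', Pipeline([
        ('selector', ItemSelector(key='has_foreign_char')),
        ('caster', ArrayCaster()),
    ])),
])

X = union.fit_transform(data)
print(X.shape)  # (3, 8)
```

The printed shape is (3, 8): seven vocabulary columns from the text pipeline plus the single flag column, all with one row per sample.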
