FeatureUnion vs ColumnTransformer?

Question

What is the difference between FeatureUnion() and ColumnTransformer() in sklearn?

Which should I use if I want to build a supervised model whose features have mixed data types (categorical, numeric, unstructured text) and I need to combine separate pipelines?

Source: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html

Source: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

Recommended answer

According to the sklearn documentation:

FeatureUnion: Concatenates results of multiple transformer objects. This estimator applies a list of transformer objects in parallel to the input data, then concatenates the results. This is useful to combine several feature extraction mechanisms into a single transformer.

ColumnTransformer: Applies transformers to columns of an array or pandas DataFrame. This estimator allows different columns or column subsets of the input to be transformed separately and the features generated by each transformer will be concatenated to form a single feature space. This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.

So, FeatureUnion applies different transformers to the whole of the input data and then combines the results by concatenating them.
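To make the "whole input" point concrete, here is a minimal sketch. The toy array and the choice of StandardScaler plus PCA are my own assumptions for illustration, not something taken from the question; any pair of transformers would behave the same way.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Toy numeric data (made up for illustration): 4 samples, 3 features.
X = np.array([
    [1.0, 10.0, 0.5],
    [2.0, 20.0, 0.1],
    [3.0, 15.0, 0.9],
    [4.0, 30.0, 0.3],
])

# Every transformer in the union receives the full matrix X.
union = FeatureUnion([
    ("scaled", StandardScaler()),   # produces 3 columns
    ("pca", PCA(n_components=1)),   # produces 1 column
])

X_out = union.fit_transform(X)
print(X_out.shape)  # (4, 4): 3 scaled columns + 1 principal component
```

Note that neither branch gets to pick columns: both see all three features, and the outputs are simply stacked side by side.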

ColumnTransformer, on the other hand, applies each transformer only to its own subset of the input columns, and again concatenates the results.
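A small sketch of the per-column behaviour, assuming a made-up DataFrame with one numeric column ("age") and one categorical column ("city"); the column names and values are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type data.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

# Each transformer only sees the columns listed next to it.
ct = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(), ["city"]),
])

X_out = ct.fit_transform(df)
print(X_out.shape)  # (4, 4): 1 scaled column + 3 one-hot columns
```

The scaler never sees "city" and the encoder never sees "age", which is exactly what you want for heterogeneous data.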

For the case you propose, ColumnTransformer should be the first step. Then, once all the columns have been converted to numeric features, you could use FeatureUnion to transform them further, e.g., by combining PCA and SelectKBest.
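One way this two-stage layout could look is sketched below. The DataFrame, its column names, the labels, and the final LogisticRegression are all assumptions made up for the example. `sparse_threshold=0` is set so that ColumnTransformer returns a dense array, since TfidfVectorizer produces sparse output and PCA has not accepted sparse input in every scikit-learn release.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: numeric, categorical and free-text columns.
df = pd.DataFrame({
    "price": [10.0, 12.5, 8.0, 30.0, 28.0, 35.0],
    "category": ["a", "b", "a", "c", "c", "b"],
    "review": [
        "cheap and good", "good value", "cheap",
        "expensive but great", "great quality", "expensive",
    ],
})
y = [0, 0, 0, 1, 1, 1]

# Step 1: turn every column into numeric features.
to_numeric = ColumnTransformer(
    [
        ("num", StandardScaler(), ["price"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
        # A plain string (not a list) passes a 1-D column, as text vectorizers expect.
        ("txt", TfidfVectorizer(), "review"),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Step 2: combine two views of the now fully numeric matrix.
combine = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("kbest", SelectKBest(k=3)),  # default score function is f_classif
])

model = Pipeline([
    ("to_numeric", to_numeric),
    ("combine", combine),
    ("clf", LogisticRegression()),
])
model.fit(df, y)

features = model[:-1].transform(df)  # output of the two preprocessing steps
print(features.shape)  # (6, 5): 2 principal components + 3 selected features
```

Whether PCA plus SelectKBest is actually a good second stage depends on your data; the point here is only the ordering of the two tools.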

Finally, you could certainly use FeatureUnion in place of ColumnTransformer, but each branch would have to include a column/type selector that feeds only the columns of interest to the next transformer down the pipeline, as explained here: https://ramhiser.com/post/2018-04-16-building-scikit-learn-pipeline-with-pandas-dataframe/

However, ColumnTransformer does exactly that and in a simpler way.
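For comparison, here is a rough sketch of both routes on the same made-up DataFrame. The `select` helper is hypothetical (it is not part of scikit-learn, and it is not the selector class from the linked post); it just wraps a column lookup in a FunctionTransformer.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical mixed-type data.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

def select(columns):
    """Hypothetical helper: a transformer that keeps only `columns`."""
    return FunctionTransformer(lambda frame: frame[columns])

# FeatureUnion route: every branch needs its own selector step.
union = FeatureUnion([
    ("num", make_pipeline(select(["age"]), StandardScaler())),
    ("cat", make_pipeline(select(["city"]), OneHotEncoder())),
])

# ColumnTransformer route: the column lists are part of the definition.
ct = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(), ["city"]),
])

def to_dense(m):
    """Convert a possibly sparse result to a dense NumPy array."""
    return m.toarray() if sparse.issparse(m) else np.asarray(m)

via_union = to_dense(union.fit_transform(df))
via_ct = to_dense(ct.fit_transform(df))
print(np.allclose(via_union, via_ct))  # True
```

Both produce the same features; the ColumnTransformer version just has less plumbing, and it avoids the lambda, which would get in the way of pickling the fitted object.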
