Why do StandardScaler and Normalizer need different data input?

Question

I was trying the following code and found that StandardScaler (or MinMaxScaler) and Normalizer from sklearn handle data very differently. This makes pipeline construction more difficult. I was wondering whether this design discrepancy is intentional.

from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler

For Normalizer, the data is read "horizontally".

Normalizer(norm = 'max').fit_transform([[ 1., 1.,  2., 10],
                                        [ 2.,  0.,  0., 100],
                                        [ 0.,  -1., -1., 1000]])
#array([[ 0.1  ,  0.1  ,  0.2  ,  1.   ],
#       [ 0.02 ,  0.   ,  0.   ,  1.   ],
#       [ 0.   , -0.001, -0.001,  1.   ]])
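This row-wise behavior can be confirmed by hand; a minimal sketch using NumPy on the same matrix (with `norm='max'`, each row is divided by its largest absolute value):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Row-wise: divide each row by the largest absolute value in that row.
manual = X / np.abs(X).max(axis=1, keepdims=True)
scaled = Normalizer(norm='max').fit_transform(X)
print(np.allclose(manual, scaled))  # True
```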

For StandardScaler and MinMaxScaler, the data is read "vertically".

StandardScaler().fit_transform([[ 1., 1.,  2., 10],
                                [ 2.,  0.,  0., 100],
                                [ 0.,  -1., -1., 1000]])
#array([[ 0.        ,  1.22474487,  1.33630621, -0.80538727],
#       [ 1.22474487,  0.        , -0.26726124, -0.60404045],
#       [-1.22474487, -1.22474487, -1.06904497,  1.40942772]])

MinMaxScaler().fit_transform([[ 1., 1.,  2., 10],
                              [ 2.,  0.,  0., 100],
                              [ 0.,  -1., -1., 1000]])
#array([[0.5       , 1.        , 1.        , 0.        ],
#       [1.        , 0.5       , 0.33333333, 0.09090909],
#       [0.        , 0.        , 0.        , 1.        ]])
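The column-wise behavior can likewise be reproduced by hand; a minimal sketch, assuming the same matrix (StandardScaler uses the population standard deviation, i.e. `ddof=0`, which is also NumPy's default):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Column-wise: subtract each column's mean, divide by its (population) std.
std_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(std_manual, StandardScaler().fit_transform(X)))  # True

# Column-wise: map each column's min to 0 and its max to 1.
mm_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.allclose(mm_manual, MinMaxScaler().fit_transform(X)))  # True
```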

Answer

This is expected behavior, because StandardScaler and Normalizer serve different purposes. The StandardScaler works 'vertically', because it...

Standardize[s] features by removing the mean and scaling to unit variance

[...] Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.

while the Normalizer works 'horizontally', because it...

Normalize[s] samples individually to unit norm.

Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
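Despite operating along different axes, both transformers take the same `(n_samples, n_features)` input, so they chain directly in a Pipeline; a minimal sketch (the step names `'scale'` and `'norm'` are arbitrary labels chosen here):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, Normalizer

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Both steps accept the same 2D input; only the axis they operate
# along differs (columns for StandardScaler, rows for Normalizer).
pipe = Pipeline([('scale', StandardScaler()),      # per-feature (columns)
                 ('norm', Normalizer(norm='l2'))]) # per-sample (rows)
out = pipe.fit_transform(X)

# After the final step, every (nonzero) row has unit l2 norm.
print(np.allclose(np.linalg.norm(out, axis=1), 1.0))  # True
```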

Please have a look at the scikit-learn docs (linked above) for more insight into which transformer better serves your purpose.
