Mean of data scaled with sklearn StandardScaler is not zero


Question

I have the following code:

import pandas as pd
from sklearn.preprocessing import StandardScaler
import numpy as np

# df is assumed to hold the Iris dataset, loaded earlier (e.g. with pd.read_csv);
# the question does not show that step
df.columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)  # drops the empty line at file-end

# .ix has been removed from modern pandas; use .iloc for positional indexing
X = df.iloc[:, 0:4].values
y = df.iloc[:, 4].values

Next I scale the data and compute the mean values:

X_std = StandardScaler().fit_transform(X)
mean_vec = np.mean(X_std, axis=0)

What I don't get is that my output is this:

[ -4.73695157e-16  -6.63173220e-16   3.31586610e-16  -2.84217094e-16]

What I don't understand is how these values can be anything other than 0. If I scale the data, the mean should be exactly 0, right?

Could anyone explain to me what happens here?

Answer

In practice, those values are so close to 0 that you can treat them as 0.

The scaler tries to set the mean to zero, but due to the limits of floating-point representation it can only get the mean extremely close to 0.
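To see why, here is a minimal numpy-only sketch of the same operation (the data is randomly generated for illustration; `StandardScaler`'s transform is mathematically `(X - column mean) / column std`):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(150, 4))  # stand-in data, 4 columns

# Standardize each column: subtract its mean, divide by its std.
# This is what StandardScaler().fit_transform(X) computes.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The column means of the result are tiny rounding residuals,
# on the order of machine epsilon, not necessarily exact zeros.
mean_vec = X_std.mean(axis=0)
print(mean_vec)
```

The residuals come from rounding: the computed column mean is itself rounded, and each subtraction `x - mean` rounds again, so the differences do not sum to exactly zero.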

Check this question on the precision of floating-point arithmetic.

Also interesting is the concept of machine epsilon, which for float64 is about 2.22e-16.
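For reference, numpy exposes machine epsilon directly, and `np.allclose` is the idiomatic way to check "zero up to floating-point noise". A short sketch using the residual means from the question:

```python
import numpy as np

# Machine epsilon for float64: the smallest gap between 1.0
# and the next representable number (about 2.22e-16).
eps = np.finfo(np.float64).eps
print(eps)

# The residual means from the question are all within a few eps of zero,
# so np.allclose treats them as zero.
mean_vec = np.array([-4.73695157e-16, -6.63173220e-16,
                      3.31586610e-16, -2.84217094e-16])
print(np.allclose(mean_vec, 0.0))  # True
```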

