Mean of data scaled with sklearn StandardScaler is not zero
Question
I have the following code:
import pandas as pd
from sklearn.preprocessing import StandardScaler
import numpy as np
# df is assumed to have been loaded already (the iris dataset)
df.columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)  # drops the empty line at file-end
X = df.iloc[:, 0:4].values  # DataFrame.ix is removed; use .iloc instead
y = df.iloc[:, 4].values
Next I scale the data and compute the column means:
X_std = StandardScaler().fit_transform(X)
mean_vec = np.mean(X_std, axis=0)
What I do not get is that my output is this:
[ -4.73695157e-16 -6.63173220e-16 3.31586610e-16 -2.84217094e-16]
I do not understand how these values can be anything other than 0. If I scale the data, the mean should be 0, right?
Could anyone explain to me what happens here?
Answer
In practice those values are so close to 0 that you can consider them to be 0.
The scaler tries to set the mean to zero, but due to the limitations of floating-point representation it can only get the mean very close to 0.
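As an illustration of those representation limits (a minimal sketch, not from the original answer): many decimal fractions have no exact binary representation, so arithmetic on them leaves tiny residues, and the same rounding error survives when you subtract a mean:

```python
import numpy as np

# Classic example: 0.1 and 0.2 have no exact binary representation
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2 - 0.3)    # tiny nonzero residue on the order of 1e-17

# The same effect appears when centering data: subtracting the mean
# leaves a residual mean near machine epsilon, not exactly zero
x = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=1000)
centered = x - x.mean()
print(centered.mean())    # extremely close to 0, but generally not exactly 0
```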
See this question on the precision of floating-point arithmetic.
Also interesting is the concept of machine epsilon: for float64 it is about 2.22e-16, the same order of magnitude as the means you are seeing.