sklearn StandardScaler: difference between "with_std=False or True" and "with_mean=False or True"


Problem description


I am trying to standardize some data so that I can apply PCA to it. I am using sklearn.preprocessing.StandardScaler. I am having trouble understanding the difference between using True or False for the parameters with_mean and with_std. Here is the description of the class:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

Can someone give a more extended explanation?

Thank you very much!

Solution

I have provided more details in this post https://stackoverflow.com/a/50879522/5025009, but let me just explain this here as well.

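The standardization of the data (each column/feature/variable individually) involves the following equations, where N is the number of samples in the column:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}, \qquad z = \frac{x - \mu}{\sigma}$$
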
Explanation:

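If you set with_mean and with_std to False, then the mean μ is treated as 0 and the std σ as 1, so the transform leaves the data unchanged. This only makes sense if the columns/features already have zero mean and unit std, as with data drawn from the standard normal (Gaussian) distribution.

If you set with_mean and with_std to True (the default), then the μ and σ estimated from your data are used. This is the most common approach.

The two parameters are independent: with_mean=True together with with_std=False only subtracts the mean of each column, and with_mean=False together with with_std=True only divides each column by its σ.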
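
To make the effect of the two flags concrete, here is a minimal sketch that runs StandardScaler with each of the four combinations and compares the result with the corresponding hand-computed expression. The toy array X and the names mu, sigma and results are made up for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration): 3 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

mu = X.mean(axis=0)    # per-column mean
sigma = X.std(axis=0)  # per-column population std (ddof=0), which StandardScaler uses

# Fit and transform with every combination of the two flags.
results = {}
for with_mean in (True, False):
    for with_std in (True, False):
        scaler = StandardScaler(with_mean=with_mean, with_std=with_std)
        results[(with_mean, with_std)] = scaler.fit_transform(X)

# Hand-computed counterpart for each (with_mean, with_std) pair.
expected = {
    (True, True): (X - mu) / sigma,  # center and scale (the default)
    (True, False): X - mu,           # center only
    (False, True): X / sigma,        # scale only
    (False, False): X,               # no change
}

for key, value in expected.items():
    print(key, np.allclose(results[key], value))
```

For PCA the default (both True) is usually what you want. Note that sklearn's PCA centers the data itself, so the practical effect of the scaler there is the division by σ, which puts all features on a comparable scale.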
