Scaling features for machine learning


Question

I have a question about how to scale my dataset properly.

It consists of:

  1. A date, which I currently store as seconds

  2. A value that can be between 1 and 5

  3. About 240 boolean values, each 1 or 0

So one row looks like this:

[1514761200, 3, 1, 1, 0, 0, 1, 0, 1,  ......]

I tried applying the scikit-learn StandardScaler, but it produces some really weird values: some 0s stay 0, while others are scaled to something like -1.736. And if I then apply inverse_transform to the data, some boolean values remain weird numbers.

I think the problem is the huge numbers in the date column, but I'm not sure.

If so, what is a better way to handle dates? Or, more generally, how do I handle one or two columns that just don't fit the rest of the data but are mandatory?

Thanks.

Recommended answer

Scaling is in most cases applied to each feature separately, and that is what StandardScaler does. It is therefore entirely natural that some 0s stay zero while others are transformed. Consider the following code:

import numpy as np
int_mat = np.array([[0, 0], [0, 1], [0, 2]])

Output:

array([[0, 0],
       [0, 1],
       [0, 2]])

Now we apply the scaling:

from sklearn.preprocessing import StandardScaler

ssc = StandardScaler()
int_scaled = ssc.fit_transform(int_mat)
inverse_scaling = ssc.inverse_transform(int_scaled)

int_scaled

array([[ 0.        , -1.22474487],
       [ 0.        ,  0.        ],
       [ 0.        ,  1.22474487]])

As you can see, the first feature (first column) stays the same because it already has zero mean. The second column has mean 1, so it is centred and divided by its standard deviation, which turns its 0 into -1.22.
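To make the per-column behaviour concrete, here is a minimal sketch that reads the statistics StandardScaler learned (its mean_ and scale_ attributes) and reproduces the transform by hand. The variable names col_means, col_scales, manual and matches are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

int_mat = np.array([[0, 0], [0, 1], [0, 2]])
ssc = StandardScaler().fit(int_mat)

# One mean and one scale per column, learned independently.
col_means = ssc.mean_
# For the constant first column scikit-learn uses a scale of 1
# instead of dividing by a zero standard deviation.
col_scales = ssc.scale_

# Reproduce the transform manually: (x - mean) / scale, column by column.
manual = (int_mat - col_means) / col_scales
matches = np.allclose(manual, ssc.transform(int_mat))

print(col_means, col_scales, matches)
```

Because every column has its own mean and scale, a 0 in one column and a 0 in another can end up as different numbers after scaling.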

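The inverse transform recovers the original matrix, up to floating-point rounding error (note the 1.11e-16 in place of 0):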

inverse_scaling

array([[0.00000000e+00, 1.11022302e-16],
       [0.00000000e+00, 1.00000000e+00],
       [0.00000000e+00, 2.00000000e+00]])
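If the goal is to leave the boolean columns untouched, one possible approach (not part of the original answer) is to scale only the date and the 1-to-5 value and pass everything else through. The sketch below uses scikit-learn's ColumnTransformer for this; the sample rows, the column indices [0, 1], and the names X, ct and X_scaled are assumptions made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [timestamp in seconds, value 1-5, boolean flags...]
X = np.array([
    [1514761200, 3, 1, 1, 0],
    [1514847600, 5, 0, 1, 0],
    [1514934000, 1, 1, 0, 1],
], dtype=float)

# Scale only columns 0 and 1; the remaining columns are passed through as-is.
ct = ColumnTransformer(
    [("scale", StandardScaler(), [0, 1])],
    remainder="passthrough",
)
X_scaled = ct.fit_transform(X)

print(X_scaled)
```

ColumnTransformer places the transformed columns first and the passed-through columns after them, so with the scaled columns at positions 0 and 1 the original column order is preserved. Whether scaling the raw timestamp is the best encoding for a date depends on the model and the data.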
