sklearn StandardScaler returns all zeros


Problem description

I have a sklearn StandardScaler saved from a previous model and am trying to apply it to new data:

scaler = myOldStandardScaler
print("ORIG:", X)
print("CLASS:", X.__class__)
X = scaler.fit_transform(X)
print("SCALED:", X)

I have three observations, each with 2000 features. If I run each observation separately, I get an output of all zeros:

ORIG: [[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: <class 'numpy.matrixlib.defmatrix.matrix'>
SCALED: [[ 0.  0.  0. ...,  0.  0.  0.]]

But if I append all three observations into one array, I get the results I want:

ORIG: [[  0.00000000e+00   8.69737728e-08   7.53361877e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  9.49627142e-04   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: <class 'numpy.matrixlib.defmatrix.matrix'>
SCALED: [[-1.07174217  1.41421356  1.37153077 ...,  0.          0.          0.        ]
[ 1.33494964 -0.70710678 -0.98439142 ...,  0.          0.          0.        ]
[-0.26320747 -0.70710678 -0.38713935 ...,  0.          0.          0.        ]]

I have looked at two related questions, neither of which has an accepted answer.

I have tried:

  • reshaping from (1,n) to (n,1) (this gives incorrect results)
  • converting the array to np.float32 and np.float64 (still all zeros)
  • creating an array of an array (again, all zeros)
  • creating a np.matrix (again, all zeros)

What am I missing? The input to fit_transform has the same type in both cases, just a different size.

How can I get StandardScaler to work on a single observation?

Recommended answer

When you apply the fit_transform method of a StandardScaler to an array of shape (1, n), you get all zeros: each column holds a single value, so the column mean equals that value and subtracting it leaves zero. (The standard deviation of a single value is also zero, in which case scikit-learn divides by 1 instead.) If you reshape the array to (n, 1), the scaler treats the n values as n samples of one feature, and the output is no longer all zeros. You can do it this way:

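import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()  # fit_transform refits, so a fresh scaler behaves the same
X = np.array([1, -4, 5, 6, -8, 5])  # replace with your own X as a 1D np.array
X_transformed = scaler.fit_transform(X[:, np.newaxis])  # input shape (6, 1)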

In this case, however, you are standardizing one object across its own features, which is not what you are looking for.
If you want to scale one feature across 3 objects, you should pass fit_transform an array of shape (3, 1) containing that feature's value for each object:

X = np.array([0.00000000e+00, 9.49627142e-04, 3.19029839e-04])
X_transformed = scaler.fit_transform(X[:, np.newaxis])
# X_transformed is array([[-1.07174217], [1.33494964], [-0.26320747]]),
# the first column of the scaled output you're looking for
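
The all-zero result from the question can be reproduced and inspected directly. The following is a minimal sketch with four made-up feature values standing in for the 2000 real ones; it looks at the fitted `mean_`, `var_` and `scale_` attributes to show where the zeros come from.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One observation with four illustrative feature values, shape (1, 4).
single = np.array([[3.19e-04, 0.0, 1.9e-06, 0.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(single)

# With a single row, each column's mean is simply the value in that column.
print("mean_: ", scaler.mean_)
# Each column's variance is 0, and scikit-learn uses a scale of 1 for
# zero-variance columns so that it never divides by zero.
print("var_:  ", scaler.var_)
print("scale_:", scaler.scale_)
# (value - mean) / 1 is therefore 0 in every column.
print("scaled:", scaled)
```

Because the scaler is refitted on that one row, the statistics learned from the earlier data play no part in the result.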

If you want to use an already fitted StandardScaler, you shouldn't call fit_transform, because it refits the scaler on the new data and discards the previously learned mean and standard deviation. StandardScaler also has a transform method, which works on a single observation as long as it is passed as a 2D array of shape (1, n_features):

# Replace with your own observation; its length must equal the number of
# features the scaler was fitted on.
X = np.array([1, -4, 5, 6, -8, 5])
X_transformed = scaler.transform(X.reshape(1, -1))  # input shape (1, n_features)
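
To put the pieces together, here is a hedged end-to-end sketch: fit once on all observations, save and restore the scaler, then transform each observation on its own. The training values are invented, and the `pickle` round trip is only an assumed stand-in for however the scaler was actually saved.

```python
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 3 observations, 4 features each.
X_train = np.array([
    [0.0, 1.0, 7.5, 2.0],
    [9.5, 0.0, 0.0, 2.0],
    [3.2, 0.0, 1.9, 2.0],
])

# Fit once, then simulate saving and reloading the fitted scaler.
fitted = StandardScaler().fit(X_train)
restored = pickle.loads(pickle.dumps(fitted))

# Transform the whole batch at once ...
batch = restored.transform(X_train)

# ... and one observation at a time, each reshaped to (1, n_features).
rows = np.vstack([restored.transform(row.reshape(1, -1)) for row in X_train])

print(np.allclose(batch, rows))
```

Since transform reuses the stored mean and scale, scaling the rows one by one gives the same numbers as scaling them together.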
