python pandas standardize column for regression

Question

I have the following df:

Date       Event_Counts   Category_A  Category_B
20170401      982457          0           1
20170402      982754          1           0
20170402      875786          0           1

I am preparing the data for a regression analysis and want to standardize the Event_Counts column so that it is on a scale similar to the categories.

I use the following code:

from sklearn import preprocessing
df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'])

I get this warning:

DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function.
  warnings.warn(msg, _DataConversionWarning)
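The warning is informational: scale() works on floating-point data, so the int64 column is converted before scaling. A minimal sketch, assuming you only want to avoid the implicit conversion, is to cast the column to float yourself first; the scaled values are the same either way. The sample df below is rebuilt from the question's data.

```python
import pandas as pd
from sklearn import preprocessing

# Sample data from the question
df = pd.DataFrame({
    'Date': [20170401, 20170402, 20170402],
    'Event_Counts': [982457, 982754, 875786],
    'Category_A': [0, 1, 0],
    'Category_B': [1, 0, 1],
})

# Cast to float64 up front so scale() has no int -> float conversion to perform
df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'].astype('float64'))
print(df)
```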

It seems to have worked; there is a new column. However, it contains negative numbers such as -1.3.

I thought the scale function subtracts the mean from each value and divides by the standard deviation, and then adds the minimum of the result to every row.
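For comparison, preprocessing.scale standardizes each column: it subtracts the mean and divides by the population standard deviation (ddof=0). It does not add the minimum back, so values below the mean come out negative. This sketch reproduces the computation by hand with pandas on the question's Event_Counts values and checks it against scale():

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

s = pd.Series([982457, 982754, 875786], name='Event_Counts', dtype='float64')

# Manual z-score: (x - mean) / std, using the population std (ddof=0) as sklearn does
manual = (s - s.mean()) / s.std(ddof=0)
sk = preprocessing.scale(s)

print(manual.round(3).tolist())   # the smallest count gets a negative score
print(np.allclose(manual, sk))    # True
```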

Does it not work that way for pandas? Or should I use the normalize() function or StandardScaler()? I want the standardized column to be on a scale of 0 to 1.
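On the two alternatives mentioned: StandardScaler performs the same standardization as scale() (zero mean, unit variance per column), just as a reusable object with fit/transform. normalize() works per row by default, rescaling each sample to unit L2 norm, so it is not a per-column 0-to-1 scaling either. A small sketch on a two-column array built from the sample data (the column choice is only for illustration):

```python
import numpy as np
from sklearn import preprocessing

# Columns: Event_Counts, Category_B (values from the question)
X = np.array([[982457.0, 1.0],
              [982754.0, 0.0],
              [875786.0, 1.0]])

# StandardScaler gives the same result as preprocessing.scale
std = preprocessing.StandardScaler().fit_transform(X)
print(np.allclose(std, preprocessing.scale(X)))  # True

# normalize rescales each ROW to unit L2 norm (defaults: norm='l2', axis=1)
rows = preprocessing.normalize(X)
print(np.linalg.norm(rows, axis=1))  # each row norm is (approximately) 1.0
```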

Thanks

Answer

I think you are looking for the sklearn.preprocessing.MinMaxScaler. That will allow you to scale to a given range.
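With the default feature_range=(0, 1), MinMaxScaler maps each column through (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A sketch of the same formula in plain pandas, assuming the column is not constant (if max equals min, this version divides by zero):

```python
import pandas as pd

s = pd.Series([982457, 982754, 875786], name='Event_Counts')

# (x - min) / (max - min): smallest value -> 0, largest value -> 1
scaled = (s - s.min()) / (s.max() - s.min())
print(scaled.tolist())
```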

So in your case it would be:

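from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
# MinMaxScaler needs 2D input: double brackets select a one-column DataFrame
df['scaled_event_counts'] = scaler.fit_transform(df[['Event_Counts']]).ravel()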
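A self-contained version of the single-column case, using the sample df from the question. Passing df['Event_Counts'] (a 1D Series) to fit_transform raises an error in current scikit-learn, which is why the column is selected with double brackets. The fitted scaler also keeps the min and max it saw, so it can rescale new rows consistently; the extra row below is a made-up value for illustration.

```python
import pandas as pd
from sklearn import preprocessing

# Sample data from the question
df = pd.DataFrame({
    'Date': [20170401, 20170402, 20170402],
    'Event_Counts': [982457, 982754, 875786],
    'Category_A': [0, 1, 0],
    'Category_B': [1, 0, 1],
})

scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
# df[['Event_Counts']] is 2D, as the scaler requires;
# ravel() flattens the (n, 1) result so it fits in a single column
df['scaled_event_counts'] = scaler.fit_transform(df[['Event_Counts']]).ravel()
print(df)

# Hypothetical new row: 929270 lies halfway between the fitted min and max,
# so the already-fitted scaler maps it to about 0.5
new_rows = pd.DataFrame({'Event_Counts': [929270]})
print(scaler.transform(new_rows))
```

Values outside the fitted range map outside [0, 1] unless the scaler is created with clip=True (available in recent scikit-learn versions).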

To scale the entire df:

scaled_df = scaler.fit_transform(df)
print(scaled_df)
[[ 0.          0.99722347  0.          1.        ]
 [ 1.          1.          1.          0.        ]
 [ 1.          0.          0.          1.        ]]
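Scaling the entire df also rescales Date (its values become 0 and 1 in the output above), while the category columns come back unchanged because they are already 0/1. If Date should stay as it is, a sketch that scales only selected columns and keeps the result as a DataFrame with the original column names (the column list is an assumption; adjust it to your data):

```python
import pandas as pd
from sklearn import preprocessing

# Sample data from the question
df = pd.DataFrame({
    'Date': [20170401, 20170402, 20170402],
    'Event_Counts': [982457, 982754, 875786],
    'Category_A': [0, 1, 0],
    'Category_B': [1, 0, 1],
})

# Columns to scale; Date is deliberately left out (assumed choice)
cols = ['Event_Counts', 'Category_A', 'Category_B']
scaler = preprocessing.MinMaxScaler()

scaled_df = df.copy()
scaled_df[cols] = scaler.fit_transform(df[cols])  # keeps column names and index
print(scaled_df)
```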
