python pandas standardize column for regression
Question
I have the following df:
Date Event_Counts Category_A Category_B
20170401 982457 0 1
20170402 982754 1 0
20170402 875786 0 1
I am preparing the data for a regression analysis and want to standardize the Event_Counts column so that it is on a similar scale to the category columns.
I am using the following code:
from sklearn import preprocessing
df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'])
However, I get this warning:
DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function.
warnings.warn(msg, _DataConversionWarning)
It seems to have worked; there is a new column. However, it contains negative numbers like -1.3.
I thought the scale function subtracts the mean from each value and divides by the standard deviation for every row, then adds the minimum of the result to every row.
Does it not work that way for pandas? Or should I use the normalize() function or the StandardScaler() function? I want the standardized column to be on a scale of 0 to 1.
Thanks
Recommended answer
I think you are looking for sklearn.preprocessing.MinMaxScaler, which lets you scale to a given range.
So in your case it would be:
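from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
# fit_transform expects 2-D input, so select the column with double brackets
df['scaled_event_counts'] = scaler.fit_transform(df[['Event_Counts']]).ravel()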
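The negative values in the question are expected: preprocessing.scale is a z-score transform (zero mean, unit variance), and nothing is added back afterwards, so any value below the mean comes out negative. The following is a minimal sketch in plain pandas that contrasts the two transforms. It assumes the three-row sample df from the question; the variable names `z_scores` and `min_max` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": [20170401, 20170402, 20170402],
    "Event_Counts": [982457, 982754, 875786],
    "Category_A": [0, 1, 0],
    "Category_B": [1, 0, 1],
})

col = df["Event_Counts"]

# Z-score standardization: values below the mean become negative.
# ddof=0 gives the population standard deviation, which is what
# preprocessing.scale uses.
z_scores = (col - col.mean()) / col.std(ddof=0)

# Min-max scaling: the minimum maps to 0 and the maximum to 1.
min_max = (col - col.min()) / (col.max() - col.min())

print(z_scores.tolist())
print(min_max.tolist())
```

The min-max version should reproduce what MinMaxScaler does with the default feature_range of (0, 1).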
To scale the entire df:
scaled_df = scaler.fit_transform(df)
print(scaled_df)
[[ 0. 0.99722347 0. 1. ]
[ 1. 1. 1. 0. ]
[ 1. 0. 0. 1. ]]
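Note that fit_transform returns a NumPy array, so the column names and index are lost in the output above. If a DataFrame is needed for the regression step, one option is to wrap the result again. This is a sketch that assumes the same sample df as in the question:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data copied from the question
df = pd.DataFrame({
    "Date": [20170401, 20170402, 20170402],
    "Event_Counts": [982457, 982754, 875786],
    "Category_A": [0, 1, 0],
    "Category_B": [1, 0, 1],
})

scaler = MinMaxScaler(feature_range=(0, 1))

# Wrap the returned array so column labels and the index are preserved
scaled_df = pd.DataFrame(
    scaler.fit_transform(df),
    columns=df.columns,
    index=df.index,
)
print(scaled_df)
```

Scaling every column also rescales Date, which may not be meaningful as a regressor; passing only the columns of interest, such as df[['Event_Counts']], avoids that.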