Apply StandardScaler to parts of a data set
Question
I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others?
For example, say my data is:
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
   Age  Name  Weight
0   18     3      68
1   92     4      59
2   98     6      49
col_names = ['Name', 'Age', 'Weight']
features = data[col_names]
I fit and transform the data:
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features = pd.DataFrame(features, columns = col_names)
       Name       Age    Weight
0 -1.069045 -1.411004  1.202703
1 -0.267261  0.623041  0.042954
2  1.336306  0.787964 -1.245657
But of course the names are not really integers but strings, and I don't want to standardize them. How can I apply the fit and transform methods only to the Age and Weight columns?
Recommended answer
Update:
Currently the best way to handle this is to use ColumnTransformer.
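A minimal sketch of the ColumnTransformer approach, assuming scikit-learn 0.20 or later (where sklearn.compose is available) and the same example data as in the question. The step name 'scale' and the way the result is wrapped back into a DataFrame are illustrative choices, not part of the original answer:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})

# Scale only Age and Weight; remainder='passthrough' keeps Name unchanged
# instead of dropping it (the default is remainder='drop').
ct = ColumnTransformer(
    [('scale', StandardScaler(), ['Age', 'Weight'])],
    remainder='passthrough',
)
transformed = ct.fit_transform(data)

# The result is a NumPy array: transformed columns come first,
# followed by the passthrough columns in their original order.
scaled_features = pd.DataFrame(
    transformed, columns=['Age', 'Weight', 'Name'], index=data.index
)
print(scaled_features)
```

Note that the column order changes and Name comes back as a float column, because the output is a single numeric array.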
First create a copy of your dataframe:
scaled_features = data.copy()
Don't include the Name column in the transformation:
col_names = ['Age', 'Weight']
features = scaled_features[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
Now, don't create a new dataframe but assign the result to those two columns:
scaled_features[col_names] = features
print(scaled_features)
        Age  Name    Weight
0 -1.411004     3  1.202703
1  0.623041     4  0.042954
2  0.787964     6 -1.245657
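As a cross-check on the numbers above, the same result can be reproduced with pandas alone. This is a sketch, not part of the original answer; it relies on StandardScaler's default behaviour of subtracting the column mean and dividing by the population standard deviation (ddof=0). The variable name manual is hypothetical:

```python
import pandas as pd

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
col_names = ['Age', 'Weight']

# z = (x - mean) / std, using the population standard deviation (ddof=0).
# pandas' .std() defaults to ddof=1, so ddof=0 must be passed explicitly
# to match StandardScaler.
manual = data.copy()
subset = data[col_names]
manual[col_names] = (subset - subset.mean()) / subset.std(ddof=0)
print(manual)
```

Using the pandas default (ddof=1) instead would give slightly smaller absolute values, which is a common source of mismatches when comparing against StandardScaler output.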