Apply StandardScaler to parts of a data set


Question

I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others?

For example, say my data is:

import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})

   Age  Name  Weight
0   18     3      68
1   92     4      59
2   98     6      49


col_names = ['Name', 'Age', 'Weight']
features = data[col_names]

I fit and transform the data:

scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features = pd.DataFrame(features, columns = col_names)

       Name       Age    Weight
0 -1.069045 -1.411004  1.202703
1 -0.267261  0.623041  0.042954
2  1.336306  0.787964 -1.245657

But of course the names are not really integers but strings, and I don't want to standardize them. How can I apply the fit and transform methods only to the Age and Weight columns?

Recommended answer

Update:

Currently the best way to handle this is to use ColumnTransformer.
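The answer names ColumnTransformer without showing it, so here is a minimal sketch of how it might be applied to the question's data. It assumes scikit-learn 0.20 or later (where `sklearn.compose.ColumnTransformer` is available) and the default NumPy output; the transformer label `'scale'` is an arbitrary name chosen for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})

# Scale only Age and Weight; remainder='passthrough' keeps Name untouched.
# 'scale' is just a label for this step (an illustrative name).
ct = ColumnTransformer(
    transformers=[('scale', StandardScaler(), ['Age', 'Weight'])],
    remainder='passthrough',
)

# With default settings the result is a NumPy array: the transformed
# columns come first, followed by the passed-through columns.
result = ct.fit_transform(data)

# Rebuild a DataFrame, naming the columns in the order described above.
scaled_features = pd.DataFrame(
    result, columns=['Age', 'Weight', 'Name'], index=data.index
)
print(scaled_features)
```

Note that the column order of the result follows the transformer list, not the original DataFrame, and that the passed-through Name column comes back as floats because everything is stacked into one array.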

Alternatively, without ColumnTransformer, first create a copy of your dataframe:

scaled_features = data.copy()

Don't include the Name column in the transformation:

col_names = ['Age', 'Weight']
features = scaled_features[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)

Now, don't create a new dataframe but assign the result to those two columns:

scaled_features[col_names] = features
print(scaled_features)


        Age  Name    Weight
0 -1.411004     3  1.202703
1  0.623041     4  0.042954
2  0.787964     6 -1.245657
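As a quick sanity check on the copy-and-assign approach, the sketch below condenses the steps using `fit_transform` and verifies the result. It assumes the same three-row data as the question; the check relies on StandardScaler using the population standard deviation (`ddof=0`), which differs from the pandas default of `ddof=1`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
col_names = ['Age', 'Weight']

scaled_features = data.copy()
# fit_transform combines the separate fit and transform calls.
scaled_features[col_names] = StandardScaler().fit_transform(scaled_features[col_names])

# Each scaled column should have mean 0 and population std (ddof=0) of 1.
means = scaled_features[col_names].mean()
stds = scaled_features[col_names].std(ddof=0)
assert np.allclose(means, 0.0)
assert np.allclose(stds, 1.0)

# The Name column is left exactly as it was.
assert scaled_features['Name'].equals(data['Name'])
```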
