How can I normalize the data in a range of columns in my pandas dataframe?

Question

Suppose I have a pandas data frame surveyData:

I want to normalize the data in each column by performing:

surveyData_norm = (surveyData - surveyData.mean()) / (surveyData.max() - surveyData.min())

This would work fine if my data table contained only the columns I want to normalize. However, some string columns precede them, like this:

Name  State  Gender  Age  Income  Height
Sam   CA     M        13   10000    70
Bob   AZ     M        21   25000    55
Tom   FL     M        30   100000   45

I only want to normalize the Age, Income, and Height columns, but my method above does not work because of the string data in the Name, State, and Gender columns.

Recommended answer

You can perform operations on a subset of rows or columns in pandas in a number of ways. One useful way is indexing:

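# Assuming the same column names as in your example
cols_to_norm = ['Age', 'Income', 'Height']
# Min-max scaling to [0, 1]; replace x.min() in the numerator with x.mean() to match the formula in the question
surveyData[cols_to_norm] = surveyData[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))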

This applies the normalization only to the columns you select and assigns the result back to those columns. Alternatively, you could assign the result to new, normalized columns and keep the originals if you want.
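As a rough sketch of that second option, the snippet below rebuilds the sample table from the question and writes the normalized values to new columns instead of overwriting the originals. The `_norm` suffix and the intermediate names `subset` and `normalized` are assumptions made for illustration, and the formula is the one from the question (mean-centered, divided by the range) rather than min-max scaling.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question
surveyData = pd.DataFrame({
    "Name": ["Sam", "Bob", "Tom"],
    "State": ["CA", "AZ", "FL"],
    "Gender": ["M", "M", "M"],
    "Age": [13, 21, 30],
    "Income": [10000, 25000, 100000],
    "Height": [70, 55, 45],
})

# Only the numeric columns we want to normalize
cols_to_norm = ["Age", "Income", "Height"]
subset = surveyData[cols_to_norm]

# Formula from the question: subtract the column mean, divide by the column range
normalized = (subset - subset.mean()) / (subset.max() - subset.min())

# The "_norm" suffix is a hypothetical naming choice; the original columns stay untouched
surveyData_norm = surveyData.join(normalized.add_suffix("_norm"))
print(surveyData_norm)
```

Because the arithmetic is applied to the whole subset at once, no `apply` or `lambda` is needed here; pandas aligns the per-column mean, max, and min with the matching columns automatically.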
