Trying to create a grouped variable in Python

Question

I have a column of age values that I need to convert to age ranges of 18-29, 30-39, 40-49, 50-59, 60-69, and 70+.

For an example of some of the data in df 'file', I have:

and would like to get to:

I tried the following:

file['agerange'] = file[['age']].apply(lambda x: "18-29" if (x[0] > 16
                                       or x[0] < 30) else "other")

I would prefer not to just do a groupby, since the bucket sizes aren't uniform, but I'd be open to that as a solution if it works.

Thanks in advance!

Recommended answer

It looks like you are using the pandas library. It includes a function for this, pandas.cut: http://pandas.pydata.org/pandas-docs/version/0.16.0/generated/pandas.cut.html
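As a rough illustration of what pandas.cut does on its own, the sketch below bins a few made-up numbers (the values and bin edges are assumptions chosen for the example, not data from the question). Without labels, each value is mapped to the interval it falls in, and the result is a categorical Series.

```python
import pandas as pd

# Hypothetical values and bin edges, chosen only for illustration.
values = pd.Series([5, 15, 25])
binned = pd.cut(values, bins=[0, 10, 20, 30])

# Each element is the interval containing the value; by default the
# intervals are open on the left and closed on the right.
print(binned.tolist())
print(binned.dtype)
```

Passing labels replaces the interval objects with names of your choosing, which is what makes the function a fit for unevenly sized buckets.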

Here is my attempt:

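import pandas as pd

ages = pd.DataFrame([81, 42, 18, 55, 23, 35], columns=['age'])

# Bin edges. right=False makes each bin [lower, upper), so an age of
# exactly 30 lands in '30-39' rather than '18-29'. Ages of 120 or more
# fall outside the last bin and become NaN.
bins = [18, 30, 40, 50, 60, 70, 120]
labels = ['18-29', '30-39', '40-49', '50-59', '60-69', '70+']
ages['agerange'] = pd.cut(ages['age'], bins, labels=labels, right=False)

print(ages)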

   age agerange
0   81      70+
1   42    40-49
2   18    18-29
3   55    50-59
4   23    18-29
5   35    30-39
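One detail worth checking with pandas.cut is how values on or outside the bin edges are treated. The sketch below is a variant under two assumptions that are not in the original answer: the bins are closed on the left (right=False), and the top bin is left open-ended by using float('inf') as the last edge instead of 120. The boundary ages are made up for the test.

```python
import pandas as pd

# Hypothetical boundary cases: below the first edge, on edges, and very large.
ages = pd.Series([17, 29, 30, 70, 120, 125])

# Assumption: an infinite upper edge so that '70+' has no upper limit.
bins = [18, 30, 40, 50, 60, 70, float('inf')]
labels = ['18-29', '30-39', '40-49', '50-59', '60-69', '70+']

# right=False gives bins of the form [lower, upper).
agerange = pd.cut(ages, bins, labels=labels, right=False)

print(pd.DataFrame({'age': ages, 'agerange': agerange}))
```

In this variant the age 17 falls below the first edge and comes back as NaN, so any under-18 rows would need their own bin or separate handling.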
