How to categorize data based on column values in pandas?

Question

Let's say I have this dataframe:

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
        'payout': [.1, .15, .2, .3, 1.2, 1.3, 1.45, 2, 2.04, 3.011, 3.45, 1],
        'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
        'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=['regiment', 'payout', 'name', 'preTestScore', 'postTestScore'])

Now, I want to build these categories based on the column "payout":

Cat1 : 0 <= x <= 1
Cat2 : 1 <  x <= 2
Cat3 : 2 <  x <= 3
Cat4 : 3 <  x <= 4

and compute the sum of the postTestScore column for each category.

I do it this way, using boolean indexing:

df.loc[(df['payout'] > 0) & (df['payout'] <= 1), 'postTestScore'].sum()
df.loc[(df['payout'] > 1) & (df['payout'] <= 2), 'postTestScore'].sum()
etc...

This works, but does anyone know a more succinct (Pythonic) solution?

Recommended answer

Try pd.cut with groupby:

df.groupby(pd.cut(df.payout, [0, 1, 2, 3, 4])).postTestScore.sum()

payout
(0, 1]    308
(1, 2]    246
(2, 3]     62
(3, 4]    132
Name: postTestScore, dtype: int64
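
As a hedged variation on the answer, the sketch below attaches the Cat1..Cat4 names from the question as bin labels and passes include_lowest=True, so a payout of exactly 0 would fall into Cat1 as the category definition (0 <= x <= 1) asks. The variable names bins, labels, categories and result are illustrative and not part of the original answer, and only the two columns needed for the sum are rebuilt here.

```python
import pandas as pd

# Only the two columns needed for the grouping and the sum.
raw_data = {
    'payout': [.1, .15, .2, .3, 1.2, 1.3, 1.45, 2, 2.04, 3.011, 3.45, 1],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
df = pd.DataFrame(raw_data)

bins = [0, 1, 2, 3, 4]
labels = ['Cat1', 'Cat2', 'Cat3', 'Cat4']  # names taken from the question

# right=True (the default) gives intervals closed on the right, e.g. (1, 2].
# include_lowest=True additionally makes the first interval include 0.
categories = pd.cut(df['payout'], bins=bins, labels=labels, include_lowest=True)

# observed=False keeps a category in the result even if no row falls into it.
result = df.groupby(categories, observed=False)['postTestScore'].sum()
print(result)
```

On this data the four sums are the same as in the output above (308, 246, 62, 132), now indexed by Cat1 to Cat4 instead of by interval. Assigning the cut result to a new column (for example df['category'] = categories) would keep the category next to each row for later use.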
