Pandas: use one column for groupby and get stats for multiple other columns
Question
I have a data frame with 3 columns:
ID col1 col2
A1 1 12
A1 3 10
A1 4 16
........
A9 9 18
A9 7 11
A9 8 15
I want to create a new data frame with columns:
ID col1_min col1_max col2_min col2_max
A1 1 4 10 16
...........
A9 7 9 11 18
I can do this with groupby, one statistic at a time:
col1_min = df.groupby(['ID'])['col1'].min()
col1_max = df.groupby(['ID'])['col1'].max()
col2_min = df.groupby(['ID'])['col2'].min()
col2_max = df.groupby(['ID'])['col2'].max()
df2 = pd.DataFrame({'col1_min':col1_min, 'col1_max':col1_max, 'col2_min':col2_min, 'col2_max':col2_max})
Is there a better, more elegant way to do this (ideally a one-liner)?
Many thanks in advance.
Solution
df.groupby('ID').agg(['min', 'max'])
   col1     col2
    min max  min max
ID
A1    1   4   10  16
A9    7   9   11  18
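To reproduce the output above, here is a minimal sketch. It assumes a frame built only from the six rows shown in the question (the elided rows are left out), and the name `stats` is just illustrative. It also shows that the result carries two column levels, which is why the next step flattens them.

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are omitted.
df = pd.DataFrame({
    "ID":   ["A1", "A1", "A1", "A9", "A9", "A9"],
    "col1": [1, 3, 4, 9, 7, 8],
    "col2": [12, 10, 16, 18, 11, 15],
})

# Apply both statistics to every non-grouping column in one call.
stats = df.groupby("ID").agg(["min", "max"])

# The columns form a MultiIndex of (source column, statistic) tuples.
print(stats.columns.tolist())

# A single value is addressed by row label plus a column tuple.
print(stats.loc["A1", ("col1", "max")])
```

Each column label is a tuple such as `('col1', 'min')`, so any string operation on the labels has to handle a tuple, not a single name.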
Flatten the two-level columns into single names with
d = df.groupby('ID').agg(['min', 'max'])
d.columns = d.columns.map('_'.join)
d
col1_min col1_max col2_min col2_max
ID
A1 1 4 10 16
A9 7 9 11 18
If your column headers are numeric, '_'.join fails because it only accepts strings. Format each (column, statistic) tuple instead:
d = df.groupby('ID').agg(['min', 'max'])
d.columns = d.columns.map('{0[0]}_{0[1]}'.format)
d
col1_min col1_max col2_min col2_max
ID
A1 1 4 10 16
A9 7 9 11 18
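The difference only shows up when the headers really are numbers, which is not the case in the question's data. The sketch below uses a hypothetical frame `df_num` whose value columns are labelled with the integers 1 and 2, and checks that the join-based version raises while the format-based version works.

```python
import pandas as pd

# Hypothetical frame: the value columns are labelled with integers, not strings.
df_num = pd.DataFrame({
    "ID": ["A1", "A1", "A9", "A9"],
    1:    [1, 4, 9, 7],
    2:    [12, 10, 18, 11],
})

d = df_num.groupby("ID").agg(["min", "max"])

# str.join accepts only strings, so a tuple like (1, 'min') raises TypeError.
try:
    d.columns.map("_".join)
    join_failed = False
except TypeError:
    join_failed = True

# str.format converts each tuple element to text, whatever its type.
d.columns = d.columns.map("{0[0]}_{0[1]}".format)

print(join_failed)
print(d.columns.tolist())
```

The format-based mapping also works for plain string headers, so it is a reasonable default when the label types are not known in advance.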
Finally, call reset_index to turn the ID index back into a regular column:
d = df.groupby('ID').agg(['min', 'max'])
d.columns = d.columns.map('{0[0]}_{0[1]}'.format)
d.reset_index()
ID col1_min col1_max col2_min col2_max
0 A1 1 4 10 16
1 A9 7 9 11 18
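As an alternative that is not part of the answer above, pandas 0.25 and later support named aggregation, which produces the flat column names directly and skips the flattening step. A minimal sketch, assuming the same six rows shown in the question:

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are omitted.
df = pd.DataFrame({
    "ID":   ["A1", "A1", "A1", "A9", "A9", "A9"],
    "col1": [1, 3, 4, 9, 7, 8],
    "col2": [12, 10, 16, 18, 11, 15],
})

# Named aggregation: output_name=(source column, aggregation function).
df2 = (
    df.groupby("ID")
      .agg(
          col1_min=("col1", "min"),
          col1_max=("col1", "max"),
          col2_min=("col2", "min"),
          col2_max=("col2", "max"),
      )
      .reset_index()  # move ID from the index back to a regular column
)

print(df2)
```

This is more verbose than agg(['min', 'max']) when there are many columns, but it lets you choose a different statistic and output name for each column.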