Pandas dataframe group: sum one column, take first element from others

Problem description

I have a pandas DataFrame:

import pandas as pd
x = pd.DataFrame.from_dict({'row': [1, 1, 2, 2, 3, 3, 3], 'add': [1, 2, 3, 4, 5, 6, 7], 'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'], 'take2': ['11', '22', '33', '44', '55', '66', '77'], 'range': [100, 200, 300, 400, 500, 600, 700]})


   add  range  row take1 take2
0    1    100    1     a    11
1    2    200    1     b    22
2    3    300    2     c    33
3    4    400    2     d    44
4    5    500    3     e    55
5    6    600    3     f    66
6    7    700    3     g    77

I want to group it by the row column, sum the entries in the add column, take the first entry from take1 and take2, and select the min and max of range within each group:

   add    row take1 take2  min_range   max_range
0    3      1     a    11    100        200
1    7      2     c    33    300        400
2    18     3     e    55    500        700

Recommended answer

Use DataFrameGroupBy.agg with a dict. Some cleanup is needed afterwards, because passing a list of functions for range produces a MultiIndex in the columns:

# Map each column name to the aggregation function(s) to apply to it
d = {'add': 'sum', 'take1': 'first', 'take2': 'first', 'range': ['min', 'max']}

# Group by the row column and apply the aggregations specified in d
df = x.groupby('row', as_index=False).agg(d)

# Blank out the 'first' and 'sum' labels in the second column level
df = df.rename(columns={'first': '', 'sum': ''})
# Join both levels with '_' and strip the trailing '_' left by blank labels
df.columns = ['{0[0]}_{0[1]}'.format(col).strip('_') for col in df.columns]
print(df)
   row take1  range_min  range_max take2  add
0    1     a        100        200    11    3
1    2     c        300        400    33    7
2    3     e        500        700    55   18
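The dict-plus-cleanup approach can also be written with named aggregation, which produces flat column names directly. The following is a minimal sketch, assuming pandas 0.25 or later; it is an alternative to the answer's method, and the output names min_range and max_range are taken from the desired result in the question.

```python
import pandas as pd

x = pd.DataFrame({
    'row': [1, 1, 2, 2, 3, 3, 3],
    'add': [1, 2, 3, 4, 5, 6, 7],
    'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'take2': ['11', '22', '33', '44', '55', '66', '77'],
    'range': [100, 200, 300, 400, 500, 600, 700],
})

# Each keyword is an output column name; each value is a
# (source column, aggregation function) pair.
named = x.groupby('row', as_index=False).agg(
    add=('add', 'sum'),
    take1=('take1', 'first'),
    take2=('take2', 'first'),
    min_range=('range', 'min'),
    max_range=('range', 'max'),
)
print(named)
```

Because every output column is named explicitly, no MultiIndex is created and no renaming step is needed.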

Details: aggregate the columns using the functions specified in the dictionary:

df = x.groupby('row', as_index=False).agg(d)


row range      take2 take1 add
        min  max first first sum
0   1   100  200    11     a   3
1   2   300  400    33     c   7
2   3   500  700    55     e  18
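To see why the cleanup is needed, it may help to look at the column labels themselves rather than the printed table. The sketch below rebuilds the question's data and inspects the result of the dict-based agg; the variable name agg is only illustrative.

```python
import pandas as pd

x = pd.DataFrame({
    'row': [1, 1, 2, 2, 3, 3, 3],
    'add': [1, 2, 3, 4, 5, 6, 7],
    'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'take2': ['11', '22', '33', '44', '55', '66', '77'],
    'range': [100, 200, 300, 400, 500, 600, 700],
})
d = {'add': 'sum', 'take1': 'first', 'take2': 'first', 'range': ['min', 'max']}

agg = x.groupby('row', as_index=False).agg(d)

# The columns form a two-level MultiIndex: each label is a
# (column name, function name) tuple such as ('range', 'min').
print(isinstance(agg.columns, pd.MultiIndex))
print(agg.columns.tolist())
```

Each column has to be addressed by its full tuple, for example agg[('range', 'min')], which is why the answer flattens the labels afterwards.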

Replacing the column labels sum and first with '' leads to:


 row range      take2 take1 add
        min  max                
0   1   100  200    11     a   3
1   2   300  400    33     c   7
2   3   500  700    55     e  18
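Without a level argument, rename applies the mapping to labels on every level of the MultiIndex, so a top-level column that happened to be called first or sum would be blanked as well. A slightly more defensive variant, sketched here as an assumption rather than as part of the original answer, restricts the replacement to the second level with level=1.

```python
import pandas as pd

x = pd.DataFrame({
    'row': [1, 1, 2, 2, 3, 3, 3],
    'add': [1, 2, 3, 4, 5, 6, 7],
    'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'take2': ['11', '22', '33', '44', '55', '66', '77'],
    'range': [100, 200, 300, 400, 500, 600, 700],
})
d = {'add': 'sum', 'take1': 'first', 'take2': 'first', 'range': ['min', 'max']}
agg = x.groupby('row', as_index=False).agg(d)

# Blank out the function names only on the second column level (level=1);
# labels on the first level are left untouched.
renamed = agg.rename(columns={'first': '', 'sum': ''}, level=1)
print(renamed.columns.tolist())
```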

A list comprehension over the columns, using a string formatter, joins the two levels of each label into a single name and strips the trailing underscore left by the blank labels. Assigning the resulting list to df.columns gives the desired output.
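The flattening step can be tried in isolation on a hand-built MultiIndex. This is a small sketch: the tuples below are written out by hand to mimic the labels after the sum and first names have been blanked, and the column order is an assumption that may differ from what a given pandas version prints.

```python
import pandas as pd

# Hand-written stand-in for the two-level column labels after renaming.
cols = pd.MultiIndex.from_tuples([
    ('row', ''), ('add', ''), ('take1', ''), ('take2', ''),
    ('range', 'min'), ('range', 'max'),
])

# Iterating a MultiIndex yields tuples; '{0[0]}_{0[1]}' joins both parts
# with '_', and strip('_') removes the underscore left by an empty part.
flat = ['{0[0]}_{0[1]}'.format(col).strip('_') for col in cols]
print(flat)
# ['row', 'add', 'take1', 'take2', 'range_min', 'range_max']
```

Note that strip('_') would also remove leading or trailing underscores that belong to a real column name, so this form suits column names that do not start or end with an underscore.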
