Pandas dataframe group: sum one column, take first element from others
Question
I have a pandas DataFrame:
import pandas as pd
x = pd.DataFrame.from_dict({'row':[1, 1, 2, 2, 3, 3, 3], 'add': [1, 2, 3, 4, 5, 6, 7], 'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'], 'take2': ['11', '22', '33', '44', '55', '66', '77'], 'range': [100, 200, 300, 400, 500, 600, 700]})
   add  range  row take1 take2
0    1    100    1     a    11
1    2    200    1     b    22
2    3    300    2     c    33
3    4    400    2     d    44
4    5    500    3     e    55
5    6    600    3     f    66
6    7    700    3     g    77
I want to group it by the row column, then sum the entries in the add column, take the first entry from take1 and take2, and select the min and max from range:
   add  row take1 take2  min_range  max_range
0    3    1     a    11        100        200
1    7    2     c    33        300        400
2   18    3     e    55        500        700
Recommended answer
Use DataFrameGroupBy.agg with a dict. Some cleanup is needed afterwards, because the result has a MultiIndex in its columns:
# map each column name to the aggregation function(s) to apply to it
d = {'add':'sum', 'take1':'first', 'take2':'first', 'range':['min','max']}
# group by the row column and apply the aggregations specified in d;
# the resulting columns are a two-level MultiIndex such as ('range', 'min')
df = x.groupby('row', as_index=False).agg(d)
# blank out the second-level labels that should not appear in the final names
df = df.rename(columns={'first':'', 'sum':''})
# join both levels with '_' and strip the trailing '_' left by empty labels
df.columns = ['{0[0]}_{0[1]}'.format(col).strip('_') for col in df.columns]
print(df)
   row take1  range_min  range_max take2  add
0    1     a        100        200    11    3
1    2     c        300        400    33    7
2    3     e        500        700    55   18
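As an aside, on pandas 0.25 or later the same result can probably be reached without any column cleanup by using named aggregation, where each keyword names an output column and maps to a (source column, function) pair. This is a different technique from the dict approach in this answer, shown here only as a sketch; the output names min_range and max_range are taken from the question.

```python
import pandas as pd

x = pd.DataFrame({
    'row': [1, 1, 2, 2, 3, 3, 3],
    'add': [1, 2, 3, 4, 5, 6, 7],
    'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'take2': ['11', '22', '33', '44', '55', '66', '77'],
    'range': [100, 200, 300, 400, 500, 600, 700],
})

# Named aggregation (assumes pandas >= 0.25):
# output_column=(input_column, aggregation_function)
result = x.groupby('row', as_index=False).agg(
    add=('add', 'sum'),
    take1=('take1', 'first'),
    take2=('take2', 'first'),
    min_range=('range', 'min'),
    max_range=('range', 'max'),
)
print(result)
```

Because every output column is named explicitly, the result has flat columns and the rename and list comprehension steps are not needed.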
Details: Aggregate the columns using the functions specified in the dictionary:
df = x.groupby('row', as_index=False).agg(d)
  row range      take2 take1 add
        min  max first first sum
0   1   100  200    11     a   3
1   2   300  400    33     c   7
2   3   500  700    55     e  18
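To see why the two header rows appear, it can help to inspect the column index directly. The sketch below rebuilds the same frame and dictionary and prints the column labels; each label should be a tuple of (column name, function name), for example ('range', 'min').

```python
import pandas as pd

x = pd.DataFrame({
    'row': [1, 1, 2, 2, 3, 3, 3],
    'add': [1, 2, 3, 4, 5, 6, 7],
    'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'take2': ['11', '22', '33', '44', '55', '66', '77'],
    'range': [100, 200, 300, 400, 500, 600, 700],
})
d = {'add': 'sum', 'take1': 'first', 'take2': 'first', 'range': ['min', 'max']}

df = x.groupby('row', as_index=False).agg(d)

# The columns form a MultiIndex with two levels
print(df.columns.nlevels)
# Iterating over the MultiIndex yields one tuple per column
print(list(df.columns))
```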
Replacing the column names sum and first with '' leads to:
  row range      take2 take1 add
        min  max
0   1   100  200    11     a   3
1   2   300  400    33     c   7
2   3   500  700    55     e  18
A list comprehension over the columns, using a string formatter, builds the desired column names. Assigning the resulting list to df.columns gives the desired output.
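The flattening step can be looked at in isolation. The sketch below applies the same format-and-strip expression to a plain list of tuples, written by hand to stand in for the MultiIndex labels after the rename; no pandas is needed for this part.

```python
# Hand-written stand-in for the column tuples after 'sum' and 'first'
# have been replaced with '' (order chosen for illustration only)
columns = [
    ('row', ''), ('add', ''), ('take1', ''), ('take2', ''),
    ('range', 'min'), ('range', 'max'),
]

# Join the two levels with '_', then strip the '_' left over
# when the second level is empty
flat = ['{0[0]}_{0[1]}'.format(col).strip('_') for col in columns]
print(flat)
# ['row', 'add', 'take1', 'take2', 'range_min', 'range_max']
```

One caveat: strip('_') also removes underscores at the very start or end of a real column name, so a column such as '_id' would come out as 'id'.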