pandas - agg() function
Problem description
The ordering of my age, height and weight columns is changing with each run of the code. I need to keep the order of my agg columns static because I ultimately refer to this output file according to the column locations. What can I do to make sure age, height and weight are output in the same order every time?
import numpy as np
import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_csv(input_file, na_values=[''])
df.index_col = ['name', 'address']
df_out = df.groupby(df.index_col).agg({'age': np.mean, 'height': np.sum, 'weight': np.sum})
df_out.to_csv(output_file, sep=',')
I think you can fix the order by selecting the columns explicitly after aggregating:
df_out = (df.groupby(df.index_col)
            .agg({'age': np.mean, 'height': np.sum, 'weight': np.sum})
            [['age', 'height', 'weight']])
You can also pass the aggregation names as strings, so that pandas uses its own functions:
df_out = (df.groupby(df.index_col)
            .agg({'age': 'mean', 'height': 'sum', 'weight': 'sum'})
            [['age', 'height', 'weight']])
Sample:
df = pd.DataFrame({'name': ['q', 'q', 'a', 'a'],
                   'address': ['a', 'a', 's', 's'],
                   'age': [7, 8, 9, 10],
                   'height': [1, 3, 5, 7],
                   'weight': [5, 3, 6, 8]})
print(df)
  address  age  height name  weight
0       a    7       1    q       5
1       a    8       3    q       3
2       s    9       5    a       6
3       s   10       7    a       8
df.index_col = ['name', 'address']
df_out = (df.groupby(df.index_col)
            .agg({'age': 'mean', 'height': 'sum', 'weight': 'sum'})
            [['age', 'height', 'weight']])
print(df_out)
              age  height  weight
name address
a    s        9.5      12      14
q    a        7.5       4       8
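As an alternative to subsetting afterwards, the sample above can be sketched with named aggregation, assuming pandas 0.25 or later on Python 3.6+. Each keyword becomes an output column, and the columns follow the keyword order. The variable name group_cols is only an illustration; it replaces df.index_col from the original code.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'name': ['q', 'q', 'a', 'a'],
                   'address': ['a', 'a', 's', 's'],
                   'age': [7, 8, 9, 10],
                   'height': [1, 3, 5, 7],
                   'weight': [5, 3, 6, 8]})

# Hypothetical name: a plain list instead of the df.index_col attribute.
group_cols = ['name', 'address']

# Named aggregation: keyword = (input column, aggregation name).
# The output columns appear in the order the keywords are written.
df_out = df.groupby(group_cols).agg(
    age=('age', 'mean'),
    height=('height', 'sum'),
    weight=('weight', 'sum'),
)
print(df_out)
```

Keeping the group keys in a plain variable also avoids the warning pandas raises when a list is assigned to a new DataFrame attribute.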
EDIT (by suggestion): add reset_index() to turn the group keys back into ordinary columns. Here as_index=False does not work if you need the index values (name and address) too:
df_out = (df.groupby(df.index_col)
            .agg({'age': 'mean', 'height': 'sum', 'weight': 'sum'})
            [['age', 'height', 'weight']]
            .reset_index())
print(df_out)
  name address  age  height  weight
0    a       s  9.5      12      14
1    q       a  7.5       4       8
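Since the question is ultimately about column positions in the output file, the order can also be pinned at write time with the columns argument of to_csv. The following is a minimal sketch under that assumption; the in-memory buffer stands in for output_file, and the names group_cols and value_cols are illustrative, not from the original post.

```python
import io

import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'name': ['q', 'q', 'a', 'a'],
                   'address': ['a', 'a', 's', 's'],
                   'age': [7, 8, 9, 10],
                   'height': [1, 3, 5, 7],
                   'weight': [5, 3, 6, 8]})

# Hypothetical helper names for the key and value columns.
group_cols = ['name', 'address']
value_cols = ['age', 'height', 'weight']

df_out = (df.groupby(group_cols)
            .agg({'age': 'mean', 'height': 'sum', 'weight': 'sum'})
            .reset_index())

# An in-memory buffer is used here in place of a real output file.
buffer = io.StringIO()

# columns= fixes which columns are written and in what order;
# index=False leaves out the 0..n-1 row labels created by reset_index().
df_out.to_csv(buffer, columns=group_cols + value_cols, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With the order stated explicitly in to_csv, the file layout no longer depends on how the aggregation step happens to order its result.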