Python/Pandas - Aggregating dataframe with first/last function without grouping
Question
I am trying to aggregate an entire dataframe using pandas, without grouping by anything.
I need different functions for different columns, so I am passing a dictionary to agg. However, passing 'first' or 'last' as aggregation functions raises ValueError: no results, while others such as 'min', 'max' and 'mean' work without problems.
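Here is a simplified version of the code:
import pandas as pd
df = pd.DataFrame({'Col1': [1, 2, 3, 4], 'Col2': [5, 6, 7, 8], 'Col3': [9, 10, 11, 12]})
# Col1 -> first/last, Col2 -> first/last/mean, every other column -> mean
func = {col: ['first', 'last'] if col in ['Col1']
        else ['first', 'last', 'mean'] if col in ['Col2']
        else 'mean' for col in df.columns}
result = df.agg(func)  # this is the call that raises ValueError: no results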
Using
result = df.groupby(lambda _ : True).agg(func)
does the job but is quite slow, I assume because of the groupby. The dataframe is already a subset of a larger dataframe and cannot be grouped any further.
I have hundreds of columns, so I cannot aggregate them individually.
Is there another way to obtain the first and last row, as well as different aggregations, that is faster or more efficient than grouping?
For a sample dataframe like this
   Col1  Col2  Col3
0     1     5     9
1     2     6    10
2     3     7    11
3     4     8    12
the output should be
      Col1       Col2             Col3
     first last first last  mean  mean
True     1    4     5    8   6.5  10.5
As the original groupby functions would do, no null values/columns should be removed.
Recommended answer
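Update (this uses the fvalue and lvalue helper functions defined further down):
import pandas as pd
df = pd.DataFrame({'Col1': [1, 2, 3, 4], 'Col2': [5, 6, 7, 8], 'Col3': [9, 10, 11, 12]})
group_1 = ['Col1']
group_2 = ['Col2']  # column names are case-sensitive; 'col2' would match nothing
func = {col: [fvalue, lvalue] if col in group_1
        else [fvalue, lvalue, 'mean'] if col in group_2
        else 'mean' for col in df.columns}
# agg returns one row per function; reshape that into a single row
df.agg(func).unstack().to_frame().dropna().T
Output:
    Col1          Col2               Col3
  fvalue lvalue fvalue lvalue mean   mean
0    1.0    4.0    5.0    8.0  6.5   10.5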
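To make the reshaping chain easier to follow, here is a minimal sketch that runs it one step at a time. It assumes group_2 is spelled 'Col2' so that Col2 actually receives all three functions; the intermediate names (wide, stacked, one_row) are introduced here only for illustration and are not part of the original answer.

```python
import pandas as pd

def fvalue(x):
    # First element of the column by position.
    return x.iloc[0]

def lvalue(x):
    # Last element of the column by position.
    return x.iloc[-1]

df = pd.DataFrame({'Col1': [1, 2, 3, 4],
                   'Col2': [5, 6, 7, 8],
                   'Col3': [9, 10, 11, 12]})

func = {'Col1': [fvalue, lvalue],
        'Col2': [fvalue, lvalue, 'mean'],
        'Col3': ['mean']}

# Step 1: one row per function name, one column per original column.
# Cells for (function, column) pairs that were not requested are NaN.
wide = df.agg(func)

# Step 2: move to a Series indexed by (column, function).
stacked = wide.unstack()

# Step 3: drop the NaN placeholders and lay the result out as one row.
one_row = stacked.to_frame().dropna().T

print(wide)
print(one_row)
```

Note that dropna() here removes every NaN cell, so an aggregate that is legitimately NaN (for example the first value of a column that starts with a null) would be dropped along with the placeholders.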
Let's see if using custom functions instead of groupby helps a little:
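def fvalue(x):
    # first element of the column, by position
    return x.iloc[0]
def lvalue(x):
    # last element of the column, by position
    return x.iloc[-1]
# df, group_1 and group_2 are the ones defined in the update above
func = {col: [fvalue, lvalue] if col in group_1
        else [fvalue, lvalue, 'mean'] if col in group_2
        else 'mean' for col in df.columns}
df.agg(func)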
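As a possible alternative, the first and last rows can be read directly with iloc and the means computed in a single call, which avoids both groupby and per-column custom functions. This is only a sketch under stated assumptions, not the answer's method: aggregate_whole_frame is a hypothetical helper name, the columns that need a mean are assumed to be numeric, and the frame is assumed to be non-empty.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4],
                   'Col2': [5, 6, 7, 8],
                   'Col3': [9, 10, 11, 12]})

def aggregate_whole_frame(frame, first_last_cols, first_last_mean_cols):
    """Hypothetical helper: one-row summary with (column, function) headers."""
    first = frame.iloc[0]    # first row by position; NaN values are kept
    last = frame.iloc[-1]    # last row by position; NaN values are kept
    # Mean for every column that is not first/last-only (assumed numeric).
    mean_cols = [c for c in frame.columns if c not in first_last_cols]
    mean = frame[mean_cols].mean()

    labels, values = [], []
    for col in frame.columns:
        if col in first_last_cols or col in first_last_mean_cols:
            labels += [(col, 'first'), (col, 'last')]
            values += [first[col], last[col]]
        if col not in first_last_cols:
            labels.append((col, 'mean'))
            values.append(mean[col])

    # Build the two-level column header explicitly.
    columns = pd.MultiIndex.from_tuples(labels)
    return pd.DataFrame([values], columns=columns)

result = aggregate_whole_frame(df, ['Col1'], ['Col2'])
print(result)
```

Because iloc is purely positional, a null in the first or last row is returned as is, whereas groupby's first and last skip null entries by default. Whether this is actually faster on a given dataframe is something to measure, for example with timeit.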