Pandas difference between apply() and aggregate() functions

Question

What is the difference between apply() and aggregate() if I just pass a function such as

func = lambda x: x**2

The return values seem to be pretty much the same, and the documentation only says:

apply() --> applied : Series or DataFrame

aggregate() --> aggregated : DataFrame

Recommended answer

There are two versions of agg (short for aggregate) and apply: the first is defined on groupby objects, and the second is defined on DataFrames.

If you consider groupby.agg and groupby.apply, the main difference is that apply is flexible. From the pandas documentation:

Some operations on the grouped data might not fit into either the aggregate or transform categories. Or, you may simply want GroupBy to infer how to combine the results. For these, use the apply function, which can be substituted for both aggregate and transform in many standard use cases.

Note: apply can act as a reducer, transformer, or filter function, depending on exactly what is passed to apply. So depending on the path taken, and exactly what you are grouping. Thus the grouped column(s) may be included in the output as well as set the indices.

See "Python Pandas : How to return grouped lists in a column as a dict", for example, for an illustration of how the return type is changed automatically.
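As a minimal sketch of the three roles mentioned in the note above, the snippet below calls groupby.apply with three different functions on a single selected column. The DataFrame, its column names (key, val), and the variable names are hypothetical and chosen only for illustration; the exact index layout of the results can differ between pandas versions, so only the shapes are discussed.

```python
import pandas as pd

# Hypothetical toy data, not taken from the original question.
df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "val": [1, 3, 2, 4, 9]})
grouped = df.groupby("key")["val"]

# Reducer: each group returns one scalar, so there is one row per group.
reduced = grouped.apply(lambda s: s.max() - s.min())

# Transformer: each group returns one value per input row,
# so the result has as many rows as the original column.
centered = grouped.apply(lambda s: s - s.mean())

# Filter-like: each group returns only a subset of its rows.
above_mean = grouped.apply(lambda s: s[s > s.mean()])

print(reduced)
print(len(centered), len(above_mean))
```

The same method call yields a result with two rows, five rows, or two rows depending only on what the passed function returns, which is the "inferred" combining behaviour the documentation describes.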

groupby.agg, on the other hand, is very good for applying cython-optimized functions (i.e. it can calculate 'sum', 'mean', 'std' etc. very fast). It also allows calculating multiple (different) functions on different columns. For example,

df.groupby('some_column').agg({'first_column': ['mean', 'std'],
                               'second_column': ['sum', 'sem']})

calculates the mean and the standard deviation of the first column, and the sum and the standard error of the mean of the second column. See "dplyr summarize equivalent in pandas" for more examples.
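A self-contained version of that call might look like the following. The data values are made up for illustration; only the column names follow the snippet above.

```python
import pandas as pd

# Hypothetical data; the column names mirror the snippet in the answer.
df = pd.DataFrame({
    "some_column": ["a", "a", "b", "b"],
    "first_column": [1.0, 3.0, 5.0, 9.0],
    "second_column": [10, 20, 30, 50],
})

# A dict maps each column to the list of aggregations to compute on it.
result = df.groupby("some_column").agg({
    "first_column": ["mean", "std"],
    "second_column": ["sum", "sem"],
})

# The result has one row per group and two-level column labels
# of the form (column, function).
print(result)
print(result.loc["a", ("first_column", "mean")])
```

Individual values are addressed with a (column, function) tuple because the result's columns form a MultiIndex.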

These differences are also summarized in "What is the difference between pandas agg and apply function?", but that question focuses on the differences between groupby.agg and groupby.apply.

DataFrame.agg was added in pandas 0.20. Before that, applying multiple different functions to different columns was only possible with groupby objects. Now you can summarize a DataFrame by calculating multiple different functions on its columns. Example from "Is there a pandas equivalent of dplyr::summarise?":

iris.agg({'sepal_width': 'min', 'petal_width': 'max'})

petal_width    2.5
sepal_width    2.0
dtype: float64

iris.agg({'sepal_width': ['min', 'median'], 'sepal_length': ['min', 'mean']})

        sepal_length  sepal_width
mean        5.843333          NaN
median           NaN          3.0
min         4.300000          2.0

This is not possible with DataFrame.apply, which goes either column by column or row by row and executes the same function on each column/row. For a single function like lambda x: x**2, the two produce the same results, but their intended usage is very different.
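A small sketch of both points, using a hypothetical two-column DataFrame (the names df, square, and summary are illustrative and not from the original post):

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
square = lambda x: x**2

# With a single element-wise function, apply and agg give the same frame.
applied = df.apply(square)
aggregated = df.agg(square)
same = applied.equals(aggregated)
print(same)

# agg additionally accepts a dict that assigns a different
# reduction to each column, returning one value per column.
summary = df.agg({"x": "min", "y": "max"})
print(summary)
```

In this sketch the equality check only confirms the observation from the question; the dict form is where agg goes beyond what a single function passed to apply can express.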
