What is the difference between pandas agg and apply function?
Problem description
I can't figure out the difference between the pandas .aggregate and .apply functions.
Take the following as an example: I load a dataset, do a groupby, define a simple function, and use either .agg or .apply.
As you can see, the print statements inside my function produce the same output with .agg and with .apply. The returned result, on the other hand, is different. Why is that?
import pandas as pd

iris = pd.read_csv('iris.csv')
by_species = iris.groupby('Species')

def f(x):
    # Show what the groupby machinery passes in, then return a constant.
    print(type(x))
    print(x.head(3))
    return 1
Using apply:
by_species.apply(f)
#<class 'pandas.core.frame.DataFrame'>
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#0 5.1 3.5 1.4 0.2 setosa
#1 4.9 3.0 1.4 0.2 setosa
#2 4.7 3.2 1.3 0.2 setosa
#<class 'pandas.core.frame.DataFrame'>
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#0 5.1 3.5 1.4 0.2 setosa
#1 4.9 3.0 1.4 0.2 setosa
#2 4.7 3.2 1.3 0.2 setosa
#<class 'pandas.core.frame.DataFrame'>
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#50 7.0 3.2 4.7 1.4 versicolor
#51 6.4 3.2 4.5 1.5 versicolor
#52 6.9 3.1 4.9 1.5 versicolor
#<class 'pandas.core.frame.DataFrame'>
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#100 6.3 3.3 6.0 2.5 virginica
#101 5.8 2.7 5.1 1.9 virginica
#102 7.1 3.0 5.9 2.1 virginica
#Out[33]:
#Species
#setosa 1
#versicolor 1
#virginica 1
#dtype: int64
Using agg:
by_species.agg(f)
#<class 'pandas.core.frame.DataFrame'>
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#0 5.1 3.5 1.4 0.2 setosa
#1 4.9 3.0 1.4 0.2 setosa
#2 4.7 3.2 1.3 0.2 setosa
#<class 'pandas.core.frame.DataFrame'>
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#50 7.0 3.2 4.7 1.4 versicolor
#51 6.4 3.2 4.5 1.5 versicolor
#52 6.9 3.1 4.9 1.5 versicolor
#<class 'pandas.core.frame.DataFrame'>
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#100 6.3 3.3 6.0 2.5 virginica
#101 5.8 2.7 5.1 1.9 virginica
#102 7.1 3.0 5.9 2.1 virginica
#Out[34]:
# Sepal.Length Sepal.Width Petal.Length Petal.Width
#Species
#setosa 1 1 1 1
#versicolor 1 1 1 1
#virginica 1 1 1 1
Recommended answer
apply applies the function to each group as a whole (here, each Species). Your function returns 1, so you end up with one value for each of the 3 groups.
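As a rough illustration of the per-group behaviour, here is a minimal sketch. It uses a small hand-made frame rather than the asker's iris.csv, and the helper name whole_group is hypothetical. Because the function sees the whole group, it can combine several columns into a single result.

```python
import pandas as pd

# Illustrative stand-in for iris.csv (values made up for this sketch).
df = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor", "versicolor"],
    "Sepal.Length": [5.1, 4.9, 7.0, 6.4],
    "Petal.Length": [1.4, 1.4, 4.7, 4.5],
})

def whole_group(group):
    # `group` is a DataFrame holding all selected columns for one species,
    # so the function may mix columns freely.
    return group["Sepal.Length"].max() - group["Petal.Length"].min()

# Select the value columns explicitly so the grouping column is not passed in.
result = df.groupby("Species")[["Sepal.Length", "Petal.Length"]].apply(whole_group)
print(result)  # a Series with one value per species
```

Since whole_group returns a scalar, the result is a Series indexed by Species, matching the shape of the apply output in the question.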
agg aggregates each column (feature) separately within each group, so you end up with one value per column per group.
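The per-column behaviour can be sketched the same way. This again assumes a small made-up frame and a hypothetical helper named spread, and it reflects how recent pandas versions call a function passed to agg: once per column per group, with a Series as the argument.

```python
import pandas as pd

# Illustrative stand-in for iris.csv (values made up for this sketch).
df = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor", "versicolor"],
    "Sepal.Length": [5.1, 4.9, 7.0, 6.4],
    "Petal.Length": [1.4, 1.4, 4.7, 4.5],
})

def spread(col):
    # `col` holds one column's values for one species.
    return col.max() - col.min()

per_column = df.groupby("Species").agg(spread)
print(per_column)  # a DataFrame: one row per species, one column per feature
```

The result keeps the original value columns, which is why the agg output in the question is a table of 1s rather than a single 1 per species.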
Do read the groupby docs; they're quite helpful. There are also plenty of tutorials floating around the web.