Pandas - DataFrame aggregate behaving oddly
Related to the questions "Dataframe aggregate method passing list problem" and "Pandas fails to aggregate with a list of aggregation functions".
Consider this dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]
According to the documentation for aggregate, you should be able to specify which columns to aggregate using a dict like this:
df.agg({'a' : 'mean'})
Which returns
a 13.5
But if you try to aggregate with a user-defined function like this one:
def nok_mean(x):
    return np.mean(x)

df.agg({'a' : nok_mean})
it returns a value for each row rather than the mean of the column:
a
0 0.0
1 3.0
2 6.0
3 9.0
4 12.0
5 15.0
6 18.0
7 21.0
8 24.0
9 27.0
Why does the user-defined function not return the same as aggregating with np.mean or 'mean'?
This is using pandas version 0.23.4, numpy version 1.15.4, python version 3.7.1.
The issue has to do with applying np.mean to a series. Let's look at a few examples:
def nok_mean(x):
    return x.mean()

df.agg({'a': nok_mean})
a 13.5
dtype: float64
This works as expected because you are using the pandas version of mean (the x.mean() method), which can be applied to a series or a dataframe:
df['a'].agg(nok_mean)
df.apply(nok_mean)
Let's see what happens when np.mean is applied to a series:
def nok_mean1(x):
    return np.mean(x)

df['a'].agg(nok_mean1)
df.agg({'a':nok_mean1})
df['a'].apply(nok_mean1)
df['a'].apply(np.mean)
All of these return:
0 0.0
1 3.0
2 6.0
3 9.0
4 12.0
5 15.0
6 18.0
7 21.0
8 24.0
9 27.0
Name: a, dtype: float64
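One possible reading of this output, offered as a sketch rather than a statement about pandas internals: if the function is called once per element, np.mean receives a single number each time, and the mean of a single number is that number as a float. The plain-Python loop below imitates that without relying on apply or agg. The names elementwise and whole_column are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]

# Call np.mean on each scalar separately, as an element-wise apply would.
elementwise = [np.mean(v) for v in df['a'].tolist()]

# Call np.mean once on the whole column (as an ndarray).
whole_column = np.mean(df['a'].values)

print(elementwise)   # one float per row
print(whole_column)  # a single number for the column
```

The first result has one entry per row and simply repeats the column values as floats, which matches the shape of the output above.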
When you apply np.mean to a dataframe, it works as expected:
df.agg(nok_mean1)
df.apply(nok_mean1)
a 13.5
b -8.0
dtype: float64
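A small experiment can show why the dataframe case differs: DataFrame.apply hands the function a whole column at a time, so np.mean sees a Series rather than a scalar. The sketch below records the type of each argument. spy_mean and seen are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

seen = []  # type names of the arguments the function receives

def spy_mean(x):
    # Record what kind of object was passed in, then compute the mean.
    seen.append(type(x).__name__)
    return np.mean(x)

result = df.apply(spy_mean)

print(set(seen))
print(result)
```

Every recorded type is a Series, so each call computes the mean of a full column.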
In order to get np.mean to work as expected inside a function, pass it an ndarray for x:
def nok_mean2(x):
    return np.mean(x.values)

df.agg({'a':nok_mean2})
a 13.5
dtype: float64
I am guessing all of this has to do with apply, which is why df['a'].apply(nok_mean2) raises an AttributeError.
I am guessing here in the source code
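The guess above can be made concrete with a small imitation. This is an assumption that fits the outputs shown here, not code taken from pandas, and other pandas versions may behave differently. The hypothetical helper agg_like first tries the function on each scalar element and only falls back to calling it on the whole Series when that attempt raises AttributeError.

```python
import numpy as np
import pandas as pd

def agg_like(series, func):
    """Hypothetical stand-in for the guessed behaviour; not pandas source code."""
    try:
        # First attempt: call func on each scalar element.
        values = [func(v) for v in series.tolist()]
    except AttributeError:
        # Plain ints have no .mean() or .values, so use the whole Series instead.
        return func(series)
    return pd.Series(values, index=series.index, name=series.name)

s = pd.Series([3 * x for x in range(10)], name='a')

r1 = agg_like(s, lambda x: np.mean(x))         # succeeds per element
r2 = agg_like(s, lambda x: x.mean())           # falls back to the Series
r3 = agg_like(s, lambda x: np.mean(x.values))  # falls back to the Series

print(r1)
print(r2, r3)
```

Under this reading, np.mean(x) never triggers the fallback because it accepts scalars, while x.mean() and np.mean(x.values) fail on scalars and so end up being evaluated on the whole column. A plain apply has no such fallback, which would explain the AttributeError from df['a'].apply(nok_mean2).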