Pandas - DataFrame aggregate behaving oddly

Problem description

Related to "Dataframe aggregate method passing list problem" and "Pandas fails to aggregate with a list of aggregation functions".

Consider this DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

According to the documentation for aggregate you should be able to specify which columns to aggregate using a dict like this:

df.agg({'a' : 'mean'})

Which returns

a    13.5
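For reference, here is a self-contained version of that setup. The dict form is not limited to one column. Each key selects a column, and each string value names the reduction applied to that whole column, so asking for 'mean' on both columns gives one number per column. This is a sketch built from the question's own data; only the second dict entry is new.

```python
import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

# Each dict key selects a column; the string 'mean' dispatches to the
# pandas Series.mean method, which reduces the whole column to one number.
result = df.agg({'a': 'mean', 'b': 'mean'})
print(result)
```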

But if you try to aggregate with a user-defined function like this one:

def nok_mean(x):
    return np.mean(x)

df.agg({'a' : nok_mean})

it returns one value per row (each element of a, converted to float) instead of the mean of the column:

      a
0   0.0
1   3.0
2   6.0
3   9.0
4  12.0
5  15.0
6  18.0
7  21.0
8  24.0
9  27.0

Why does the user-defined function not return the same result as aggregating with np.mean or 'mean'?

This is using pandas 0.23.4, NumPy 1.15.4 and Python 3.7.1.

Solution

The issue has to do with applying np.mean to a Series. Let's look at a few examples:

def nok_mean(x):
    return x.mean()

df.agg({'a': nok_mean})

a    13.5
dtype: float64

This works as expected because you are using the pandas version of mean, which can be applied to a Series or a DataFrame:

df['a'].agg(nok_mean)
df.apply(nok_mean)
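The two calls above can be run as a short sketch, assuming the question's df. Both produce column means: a single number for df['a'], and one number per column for the DataFrame.

```python
import pandas as pd

df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

def nok_mean(x):
    # x.mean() is the pandas method: it reduces a Series to one number,
    # and a DataFrame to one number per column
    return x.mean()

col_result = df['a'].agg(nok_mean)   # mean of column 'a'
frame_result = df.apply(nok_mean)    # DataFrame.apply passes each column in turn
print(col_result)
print(frame_result)
```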

Let's see what happens when np.mean is applied to a series:

def nok_mean1(x):
    return np.mean(x)

df['a'].agg(nok_mean1)
df.agg({'a':nok_mean1})
df['a'].apply(nok_mean1)
df['a'].apply(np.mean)

All of these return:

0     0.0
1     3.0
2     6.0
3     9.0
4    12.0
5    15.0
6    18.0
7    21.0
8    24.0
9    27.0
Name: a, dtype: float64
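To see where those per-row values come from, the sketch below wraps np.mean in a hypothetical spy_mean helper that records what it receives. It is an illustration only, not part of the original answer. Series.apply calls the function once per element, and np.mean of a single number simply returns that number as a float, so nothing gets reduced.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

calls = []  # type name of every argument spy_mean receives

def spy_mean(x):
    # hypothetical helper: same as nok_mean1, but logs its input type first
    calls.append(type(x).__name__)
    return np.mean(x)

out = df['a'].apply(spy_mean)

print(len(calls))    # one call per row
print(set(calls))    # individual scalars, never a Series
print(out.tolist())  # the original values, converted to float
```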

When you apply np.mean to a DataFrame, it works as expected:

df.agg(nok_mean1)
df.apply(nok_mean1)

a    13.5
b    -8.0
dtype: float64
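The difference is in what the function receives. Here is the same kind of hypothetical spy, this time passed to DataFrame.apply. It shows that each call gets a whole column as a Series, and np.mean reduces that Series to a single number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

seen = []  # type name of each argument passed in by DataFrame.apply

def spy_mean(x):
    # hypothetical helper: logs its input type, then behaves like nok_mean1
    seen.append(type(x).__name__)
    return np.mean(x)  # on a Series this averages the whole column

frame_out = df.apply(spy_mean)
print(set(seen))
print(frame_out)
```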

To get np.mean to work as expected inside a user-defined function, call it on the underlying ndarray (x.values) instead of on the Series:

def nok_mean2(x):
    return np.mean(x.values)

df.agg({'a':nok_mean2})

a    13.5
dtype: float64
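Another option is to skip agg's per-element attempt and call each function on its column directly. The goal is to guarantee that a user-defined function always sees the whole column. agg_columns below is a hypothetical helper written for this sketch, not a pandas API. It assumes every value in the dict is a callable (no string names such as 'mean').

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

def nok_mean1(x):
    return np.mean(x)

def agg_columns(frame, spec):
    """Hypothetical helper: call each function once on its whole column (a Series)."""
    return pd.Series({col: func(frame[col]) for col, func in spec.items()})

res = agg_columns(df, {'a': nok_mean1, 'b': nok_mean1})
print(res)
```

Because each function is handed a Series, nok_mean1 needs no .values workaround here. The trade-off is that this helper does none of agg's dispatching on strings or lists.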

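All of this comes down to how agg uses apply. Given a user-defined function, Series.agg first tries Series.apply, which calls the function on each element separately. np.mean of a single number just returns that number as a float, so the elementwise attempt succeeds and nothing is reduced. nok_mean2 (like nok_mean, which calls x.mean()) fails on a single element, because a plain int has no .values or .mean attribute. That is why df['a'].apply(nok_mean2) raises an AttributeError, and agg then falls back to calling the function on the whole Series.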

The relevant logic is, I believe, here in the source code.
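That fallback order can be modelled in a few lines. agg_like_0_23 below is a hypothetical, simplified sketch of roughly what Series.aggregate does with a callable in pandas 0.23. It first tries an element-by-element apply, and calls the function on the whole Series only if that raises. It is not the real pandas code, and it ignores the string, list and dict cases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(10))
df['a'] = [3 * x for x in range(10)]
df['b'] = [1 - 2 * x for x in range(10)]

def nok_mean1(x):
    return np.mean(x)         # works on a scalar, so the elementwise attempt succeeds

def nok_mean2(x):
    return np.mean(x.values)  # a plain int has no .values -> AttributeError

def agg_like_0_23(series, func):
    """Hypothetical, simplified model of Series.aggregate for a callable (pandas 0.23)."""
    try:
        # first attempt: call func on each element, like Series.apply
        return series.apply(func)
    except (ValueError, AttributeError, TypeError):
        # fallback: call func once on the whole Series
        return func(series)

r1 = agg_like_0_23(df['a'], nok_mean1)  # per-row values, not reduced
r2 = agg_like_0_23(df['a'], nok_mean2)  # the column mean

# The AttributeError that triggers the fallback:
try:
    df['a'].apply(nok_mean2)
except AttributeError as err:
    print('AttributeError:', err)
```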
