Pandas Groupby Agg Function Does Not Reduce

Problem description

I am using an aggregation function that I have used in my work for a long time now. The idea is that if the Series passed to the function has length 1 (i.e. the group has only one observation), then that observation is returned. If the Series has length greater than one, the observations are returned as a list.

This may seem odd to some, but this is not an XY problem; I have good reasons for wanting to do this that are not relevant to this question.

This is the function that I have been using:

def MakeList(x):
    """ This function is used to aggregate data that needs to be kept distinc within multi day 
        observations for later use and transformation. It makes a list of the data and if the list is of length 1
        then there is only one line/day observation in that group so the single element of the list is returned. 
        If the list is longer than one then there are multiple line/day observations and the list itself is 
        returned."""
    L = x.tolist()
    if len(L) > 1:
        return L
    else:
        return L[0]

Now, for some reason, with the data set I am currently working on I get a ValueError stating that the function does not reduce. Here is some test data and the remaining steps I am using:

import pandas as pd
DF = pd.DataFrame({'date': ['2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02',
                            '2013-04-02'],
                    'line_code':   ['401101',
                                    '401101',
                                    '401102',
                                    '401103',
                                    '401104',
                                    '401105',
                                    '401105',
                                    '401106',
                                    '401106',
                                    '401107'],
                    's.m.v.': [ 7.760,
                                25.564,
                                25.564,
                                9.550,
                                4.870,
                                7.760,
                                25.564,
                                5.282,
                                25.564,
                                5.282]})
DFGrouped = DF.groupby(['date', 'line_code'], as_index = False)
DF_Agg = DFGrouped.agg({'s.m.v.' : MakeList})

In trying to debug this, I added print statements for L and x.index inside the function, and the output was as follows:

[7.7599999999999998, 25.564]
Int64Index([0, 1], dtype='int64')
[7.7599999999999998, 25.564]
Int64Index([0, 1], dtype='int64')

For some reason it appears that agg is passing the same Series to the function twice. As far as I know this is not normal, and it is presumably the reason why my function is not reducing.

For example, if I write a function like this:

def test_func(x):
    print(x.index)
    return x.iloc[0]

This runs without problem. The call and its printed output are:

DF_Agg = DFGrouped.agg({'s.m.v.' : test_func})

Int64Index([0, 1], dtype='int64')
Int64Index([2], dtype='int64')
Int64Index([3], dtype='int64')
Int64Index([4], dtype='int64')
Int64Index([5, 6], dtype='int64')
Int64Index([7, 8], dtype='int64')
Int64Index([9], dtype='int64')

This indicates that each group is passed to the function only once, as a Series.
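One way to check how often each group reaches the aggregation function, without reading through print output, is to count calls per group. The sketch below is an illustration only and not part of the original post: `counting_first` and `calls` are hypothetical names, the frame is a cut-down version of the test data, and the exact counts may differ between pandas versions.

```python
from collections import Counter

import pandas as pd

# Cut-down version of the test data from the question.
df = pd.DataFrame({
    'line_code': ['401101', '401101', '401102', '401105', '401105'],
    's.m.v.': [7.760, 25.564, 25.564, 7.760, 25.564],
})

# Hypothetical helper state: maps the row labels of a group
# to the number of times that group was passed to the function.
calls = Counter()


def counting_first(x):
    """Record which rows were passed in, then reduce to the first value."""
    calls[tuple(int(i) for i in x.index)] += 1
    return x.iloc[0]


result = df.groupby('line_code', as_index=False).agg({'s.m.v.': counting_first})

# Show each group's row labels and how many times it was seen.
for rows, n in sorted(calls.items()):
    print(rows, n)
```

A count above 1 for any group would confirm that the function is being invoked more than once on the same data.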

Can anyone help me understand why this is failing? I have used this function successfully with many of the data sets I work with.

Thanks

Recommended answer

I can't really explain why, but in my experience lists stored inside a pandas.DataFrame don't work all that well.

I usually use a tuple instead. This will work:

def MakeList(x):
    T = tuple(x)
    if len(T) > 1:
        return T
    else:
        return T[0]

DF_Agg = DFGrouped.agg({'s.m.v.' : MakeList})

         date line_code           s.m.v.
0  2013-04-02    401101   (7.76, 25.564)
1  2013-04-02    401102           25.564
2  2013-04-02    401103             9.55
3  2013-04-02    401104             4.87
4  2013-04-02    401105   (7.76, 25.564)
5  2013-04-02    401106  (5.282, 25.564)
6  2013-04-02    401107            5.282
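For readers on Python 3, the tuple-based approach can be sketched end to end as follows. This is a minimal illustration rather than the answerer's exact code: `collapse_group` is a hypothetical name and the frame is a shortened copy of the question's test data. How pandas treats list-valued results may vary between versions, so only the tuple variant is shown.

```python
import pandas as pd

# Shortened copy of the test data from the question.
df = pd.DataFrame({
    'date': ['2013-04-02'] * 5,
    'line_code': ['401101', '401101', '401102', '401106', '401106'],
    's.m.v.': [7.760, 25.564, 25.564, 5.282, 25.564],
})


def collapse_group(x):
    """Return the single value of a one-row group, else a tuple of all values."""
    values = tuple(x)
    return values if len(values) > 1 else values[0]


agg_df = df.groupby(['date', 'line_code'], as_index=False).agg(
    {'s.m.v.': collapse_group}
)
print(agg_df)
```

Because the column ends up holding a mix of scalars and tuples, it has object dtype, so later numeric operations on it need to handle both cases.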
