Why is there an extra index when using apply in Pandas?

Question

When I use apply with a user-defined function in Pandas, it looks like Python is creating an additional array. How can I get rid of it? Here is my code:

import numpy as np
import pandas as pd

def fnc(group):
    # Keep only the negative values of column C for this group
    x = group.C.values
    out = x[x < 0]
    return pd.DataFrame(out)

data = pd.DataFrame({'A': np.random.randint(1, 3, 10),
                     'B': 3,
                     'C': np.random.normal(0, 1, 10)})

data.groupby(by=['A', 'B']).apply(fnc).reset_index()

A strange level_2 index is created. Is there a way to avoid creating it when running my function?

    A   B   level_2   0
0   1   3   0        -1.054134802
1   1   3   1        -0.691996447
2   2   3   0        -1.068693768
3   2   3   1        -0.080342046
4   2   3   2        -0.181869799

Recommended answer

With apply, there is no way to prevent level_2 from appearing. Your function returns a DataFrame with several rows for each group, and pandas broadcasts those rows across the group keys; to keep the output coherent, it adds the index of each returned DataFrame as an additional index level. Explicitly dropping that level (level=-1) at the end of your processing is therefore the expected approach.
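
Dropping the last level could look roughly like the sketch below. It is an illustration rather than the answerer's exact code: the seeded generator, the named output column "C", and the [["C"]] column selection (used so that only column C is passed to the function) are assumptions added here.

```python
import numpy as np
import pandas as pd

def fnc(group):
    # Return only the negative values of C; the new DataFrame gets its own
    # 0..k-1 index, which pandas appends as the extra index level.
    x = group["C"].values
    return pd.DataFrame({"C": x[x < 0]})

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
data = pd.DataFrame({"A": rng.integers(1, 3, 10),
                     "B": 3,
                     "C": rng.normal(0, 1, 10)})

result = (
    data.groupby(["A", "B"])[["C"]]
        .apply(fnc)
        .reset_index(level=-1, drop=True)  # discard the per-group inner index
        .reset_index()                     # turn the group keys A and B into columns
)
```

The first reset_index call removes only the innermost level, so the second one produces the columns A, B and C without a level_2 column.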

If you want to avoid resetting that extra index and can accept a different post-processing step, another option is to call transform instead of apply and have fnc return a vector as long as the whole group, with np.nan in the positions you want to exclude. The result then has no extra level, but you need to call dropna() afterwards.
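
A minimal sketch of the transform variant, under the same assumptions as the question's data. The helper name keep_negative and the use of assign to put the result back next to A and B are choices made here for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def keep_negative(values):
    # values is column C of one group (a Series); return a Series of the
    # same length, with NaN wherever the value is not negative.
    return values.where(values < 0)

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
data = pd.DataFrame({"A": rng.integers(1, 3, 10),
                     "B": 3,
                     "C": rng.normal(0, 1, 10)})

# transform keeps the original row index, so no extra level is created
masked = data.groupby(["A", "B"])["C"].transform(keep_negative)
result = data.assign(C=masked).dropna(subset=["C"])
```

Here the rows keep their original index labels, so call reset_index(drop=True) afterwards if a fresh 0..n-1 index is wanted.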
