Pandas sort_index gives strange result after applying function to grouped DataFrame

Question

Basic setup:

I have a DataFrame with a MultiIndex on both the rows and the columns. The second level of the column index has floats for values.

I want to perform a groupby operation (grouping by the first level of the row index). The operation will add a few columns (also with floats as their labels) to each group and then return the group.

When I get the result back from my groupby operation, I can't seem to get the columns to sort properly.

Working example. First, set things up:

import pandas as pd
import numpy as np

np.random.seed(0)

col_level_1 = ['red', 'blue']
col_level_2 = [1., 2., 3., 4.]

row_level_1 = ['a', 'b']
row_level_2 = ['one', 'two']

col_idx = pd.MultiIndex.from_product([col_level_1, col_level_2], names=['color', 'numeral'])
row_idx = pd.MultiIndex.from_product([row_level_1, row_level_2], names=['letter', 'number'])

df = pd.DataFrame(np.random.randn(len(row_idx), len(col_idx)), index=row_idx, columns=col_idx)

This produces the DataFrame df.

Then define my group operation and apply it:

def mygrpfun(group):
    for f in [1.5, 2.5, 3.5]:
        group[('red', f)] = 'hello'
        group[('blue', f)] = 'world'
    return group

result = df.groupby(level='letter').apply(mygrpfun).sort_index(axis=1)

Displaying result shows that the columns are still not in sorted order.

What's going on here? Why doesn't the 2nd level of the column index display in ascending order?

For context:

In [28]: pd.__version__
Out[28]: '0.14.0'

In [29]: np.__version__
Out[29]: '1.8.1'

Any help is much appreciated.

Recommended answer

The returned result looks as expected. You added columns, and there is no guarantee that any particular order is imposed on them.

You can reorder the columns yourself:

result = result[sorted(result.columns)]
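
To see the reordering in isolation, here is a minimal sketch on a small made-up frame. The names frame, cols and reordered are illustrative and do not come from the question; the column layout just mimics the situation above, with the original labels first and the added ones appended at the end.

```python
import numpy as np
import pandas as pd

# Hypothetical columns: the original labels come first, the "added" ones last,
# so the second level is not in ascending order within each color.
cols = pd.MultiIndex.from_tuples(
    [('red', 1.0), ('red', 2.0), ('blue', 1.0), ('blue', 2.0),
     ('red', 1.5), ('blue', 1.5)],
    names=['color', 'numeral'],
)
frame = pd.DataFrame(np.arange(12).reshape(2, 6), columns=cols)

# sorted() orders the (color, numeral) tuples lexicographically;
# indexing with that list returns the columns in the new order.
reordered = frame[sorted(frame.columns)]

print(list(reordered.columns))
```

Because the column labels are plain tuples, Python's built-in sorted compares the first level and then the second, so the float labels end up ascending within each color.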
