Need help turning a pandas DataFrame into a MultiIndex by grouping just one column

Question

I have a pandas dataframe df that looks like this:

>>>df
group A B C
1     1 2 3
1     2 3 6
1     4 9 9
2     8 1 2
2     5 6 4
3     6 5 7

I would like it to be multi-indexed so that it looks like this:

group 
      A B C
1     1 2 3
      2 3 6
      4 9 9
2     8 1 2
      5 6 4
3     6 5 7

I'd like accessing each group number to give me a DataFrame of just the values for that group. What I mean is, if I type df[0], I get:

>>>df[0]
A B C
1 2 3
2 3 6
4 9 9

and can then apply the usual functions to it, such as taking the mean via df[0].mean().

I'm sure this is possible, but the pandas help pages and forum threads I've looked through only seem to offer solutions for people who have already created MultiIndexed DataFrames from tuples.

Recommended answer

set_index will do this for you:

# Move 'group' into the index, then append a within-group counter (0, 1, 2, ...) as a second level
df = df.set_index('group').set_index(
    df.groupby('group').cumcount(), append=True
)

df
         A  B  C
group           
1     0  1  2  3
      1  2  3  6
      2  4  9  9
2     0  8  1  2
      1  5  6  4
3     0  6  5  7
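
For a self-contained check, here is the same set_index approach run end to end. This is a sketch that rebuilds the sample frame by hand from the printout in the question; the names within and indexed are illustrative, not from the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question's printout (assumed data).
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 3],
    "A": [1, 2, 4, 8, 5, 6],
    "B": [2, 3, 9, 1, 6, 5],
    "C": [3, 6, 9, 2, 4, 7],
})

# cumcount() numbers the rows inside each group 0, 1, 2, ...
# It is computed on the original df, while 'group' is still a column.
within = df.groupby("group").cumcount()

# 'group' becomes the outer index level; the counter is appended as the inner level.
indexed = df.set_index("group").set_index(within, append=True)

print(indexed)
```

The counter Series is matched to the rows by position, which works here because it has exactly one entry per row of df.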

Alternatively, create a MultiIndex object and assign it to df.index. This is more memory-efficient, because df is modified in place instead of through the intermediate DataFrames that each set_index call returns.

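import pandas as pd

i = df['group']                              # outer level: the group labels
j = df.groupby(df.pop('group')).cumcount()   # inner level: position within each group; pop() also drops the column from df

df.index = pd.MultiIndex.from_arrays([i, j])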

Now:

df.xs(1)

   A  B  C
0  1  2  3
1  2  3  6
2  4  9  9

Just like that™.
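
Back to the question's goal of calling the usual functions on one group: the cross-section returned by xs is an ordinary DataFrame, so methods such as mean() apply directly. The sketch below assumes the sample data from the question and uses the from_arrays variant; df.loc[1] is included as an equivalent label-based selection on the outer level, and the names first, same and means are illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question's printout (assumed data).
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 3],
    "A": [1, 2, 4, 8, 5, 6],
    "B": [2, 3, 9, 1, 6, 5],
    "C": [3, 6, 9, 2, 4, 7],
})

# Outer level: group label; inner level: position within the group.
i = df["group"]
j = df.groupby(df.pop("group")).cumcount()
df.index = pd.MultiIndex.from_arrays([i, j])

first = df.xs(1)      # rows of group 1, with the outer level dropped
same = df.loc[1]      # selecting on the outer level also drops it
means = first.mean()  # column means for group 1 only

print(means)
```

Either selection gives the question's df[0]-style frame; xs additionally accepts a level argument when the key is not on the outermost level.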

If you don't fancy the xs at the end, you can instead split your DataFrame into groups and dump each one into a dictionary.

Iterating over a groupby object yields (key, group) pairs, mimicking the itertools.groupby idiom, so a dict comprehension builds the dictionary directly:

# assumes df is the original frame, which still has the 'group' column
df_dict = {k: g for k, g in df.drop(columns='group').groupby(df['group'])}
df_dict[1]

   A  B  C
0  1  2  3
1  2  3  6
2  4  9  9

Note that this is no longer a single DataFrame, but a dictionary of them.
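
If the end goal is only to pull out one group, or to compute a statistic for every group, groupby can do both without building an index or a dictionary. This is a swapped-in alternative rather than part of the answer above; it is a sketch on the question's sample data, and the names values, group1 and per_group_means are illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question's printout (assumed data).
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 3],
    "A": [1, 2, 4, 8, 5, 6],
    "B": [2, 3, 9, 1, 6, 5],
    "C": [3, 6, 9, 2, 4, 7],
})

# Group by 'group' and keep only the value columns.
values = df.groupby("group")[["A", "B", "C"]]

# One group as its own DataFrame (the original row labels are kept).
group1 = values.get_group(1)

# Column means for every group at once, indexed by group label.
per_group_means = values.mean()

print(group1)
print(per_group_means)
```

get_group keeps the original row labels (0, 1, 2 for group 1) instead of renumbering within the group, which is the main difference from the MultiIndex approaches.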