Selecting sublevels of Multiindex columns in pandas

Question

I generate a multiindex dataframe like this example

import pandas as pd
import numpy as np

iterables = [ ['co1', 'co2', 'co3', 'co4'], ['age','weight'] ]
multi = pd.MultiIndex.from_product(iterables, names= ["Spread", "attribute"])

df = pd.DataFrame(np.random.rand(80).reshape(10,8),index = range(0,10), columns = multi)
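To make the column structure explicit, here is a minimal sketch that reuses the construction above (with np.random.rand(10, 8), which gives the same 10x8 shape as rand(80).reshape(10,8)) and prints the first few column labels. from_product pairs every Spread value with every attribute value, so each column label is a (Spread, attribute) tuple.

```python
import numpy as np
import pandas as pd

# Same construction as in the question: Cartesian product of the two lists
iterables = [['co1', 'co2', 'co3', 'co4'], ['age', 'weight']]
multi = pd.MultiIndex.from_product(iterables, names=["Spread", "attribute"])
df = pd.DataFrame(np.random.rand(10, 8), index=range(10), columns=multi)

# Each column label is a (Spread, attribute) tuple
print(df.columns[:4].tolist())
# -> [('co1', 'age'), ('co1', 'weight'), ('co2', 'age'), ('co2', 'weight')]

# The level names can later be used to refer to a level by name
print(list(df.columns.names))
# -> ['Spread', 'attribute']
```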

Each top-level column (co1 to co4) has a sub-column on the attribute level called 'weight'.

I need to generate a list or (preferably) Series that contains, for a given row, all the 'weight' sub-columns in that row. In the example picture, I'd want a Series that gave me 0.02, 0.46, 0.33, 0.47.

Can anyone suggest a nice way to do this? The solutions I've thought of are all gross, and I suspect I have an incomplete understanding of the indexing capabilities of pandas.

Recommended answer

IIUC (if I understand correctly), you can use loc and pass a tuple consisting of a slice and a column label to select the columns of interest at that level:

In [59]:
iterables = [ ['co1', 'co2', 'co3', 'co4'], ['age','weight'] ]
multi = pd.MultiIndex.from_product(iterables, names= ["Spread", "attribute"])
df = pd.DataFrame(np.random.rand(80).reshape(10,8),index = range(0,10), columns = multi)
df

Out[59]:
Spread          co1                 co2                 co3            \
attribute       age    weight       age    weight       age    weight   
0          0.600947  0.509537  0.605538  0.496002  0.215206  0.075079   
1          0.152956  0.922832  0.167788  0.024761  0.622378  0.983030   
2          0.712478  0.603798  0.407014  0.625474  0.445592  0.903240   
3          0.420569  0.576604  0.220097  0.401624  0.929464  0.512026   
4          0.273088  0.032303  0.607577  0.836231  0.751845  0.181522   
5          0.859699  0.274760  0.456812  0.666109  0.349961  0.237894   
6          0.632754  0.603252  0.157416  0.221576  0.068355  0.121864   
7          0.090595  0.035526  0.698262  0.525770  0.792618  0.220601   
8          0.670236  0.805195  0.310680  0.100464  0.875299  0.853238   
9          0.020501  0.405245  0.447614  0.999340  0.659616  0.709312   

Spread          co4            
attribute       age    weight  
0          0.297421  0.415730  
1          0.235259  0.156014  
2          0.365762  0.198299  
3          0.695431  0.478457  
4          0.331657  0.338436  
5          0.943810  0.097999  
6          0.638720  0.033747  
7          0.646969  0.475316  
8          0.623225  0.024976  
9          0.023494  0.959514  

In [61]:
df.loc[1,(slice(None),'weight')]

Out[61]:
Spread  attribute
co1     weight       0.922832
co2     weight       0.024761
co3     weight       0.983030
co4     weight       0.156014
Name: 1, dtype: float64
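The same selector also works for every row at once if ':' replaces the row label. Below is a minimal sketch, assuming a pandas version that has DataFrame.droplevel (0.24 or later); the names weights and row1 are just illustrative. After selecting, the 'attribute' level only contains 'weight', so dropping it leaves columns co1 to co4, and a single row becomes a Series indexed by Spread alone, which is closer to the Series the question asks for.

```python
import numpy as np
import pandas as pd

iterables = [['co1', 'co2', 'co3', 'co4'], ['age', 'weight']]
multi = pd.MultiIndex.from_product(iterables, names=["Spread", "attribute"])
df = pd.DataFrame(np.random.rand(10, 8), index=range(10), columns=multi)

# ':' instead of a row label keeps every row; the result is a DataFrame
weights = df.loc[:, (slice(None), 'weight')]

# The 'attribute' level now holds only 'weight', so drop it from the columns
weights = weights.droplevel('attribute', axis=1)

# A single row is then a Series indexed by Spread only (co1..co4)
row1 = weights.loc[1]
print(row1)
```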

To explain the syntax:

df.loc[1,(slice(None),'weight')]

So the first parameter is just your index label (the row, here 1). The second parameter is a tuple with one entry per column level: its first member, slice(None), selects every label on the Spread level, in effect all of 'co1' to 'co4', and its second member then selects, on the next level, the columns that match the label 'weight'.
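For comparison, the sketch below shows two other ways to write the same selection. This is not part of the original answer; it assumes a reasonably recent pandas, and the variable names a, b and c are illustrative. pd.IndexSlice lets you write the tuple with ':' instead of slice(None), and DataFrame.xs picks a label on a named level and by default drops that level, so the result is indexed by Spread only.

```python
import numpy as np
import pandas as pd

iterables = [['co1', 'co2', 'co3', 'co4'], ['age', 'weight']]
multi = pd.MultiIndex.from_product(iterables, names=["Spread", "attribute"])
df = pd.DataFrame(np.random.rand(10, 8), index=range(10), columns=multi)

# The answer's form: slice(None) matches every Spread, 'weight' fixes the attribute
a = df.loc[1, (slice(None), 'weight')]

# Same selector written with pd.IndexSlice and ':' syntax
idx = pd.IndexSlice
b = df.loc[1, idx[:, 'weight']]

# xs takes the 'weight' label on the 'attribute' level and drops that level
c = df.xs('weight', axis=1, level='attribute').loc[1]

print(a)  # two-level index: (Spread, attribute)
print(c)  # one-level index: Spread
```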