Advanced cross-section with multi-index in pandas

Problem description

I have the following DataFrame:

import numpy as np
import pandas as pd
from numpy.random import randn

lb = [('A','a',1), ('A','a',2), ('A','a',3), ('A','b',1), ('A','b',2), ('A','b',3), ('B','a',1), ('B','a',2), ('B','a',3), ('B', 'b',1), ('B','b',2) ,('B','b',3)]
col = pd.MultiIndex.from_tuples(lb, names=['first','second','third'])
df = pd.DataFrame(randn(5,12), columns=col)

first          A                                                           B  \
second         a                             b                             a   
third          1         2         3         1         2         3         1   
0       1.597958  2.054695  0.449745 -0.990393  0.780978 -0.590558 -0.691706   
1      -0.093841 -1.203769  1.779555 -0.299931 -0.411360  0.122852 -0.250156   
2       0.025183  0.514480 -0.420666  1.574669  0.962010  1.278237 -0.976286   
3      -1.028288 -0.506581  0.880370  1.513487 -0.066479 -0.100231  0.785042   
4      -1.635642  0.464074 -0.335941 -0.034194  0.412519 -0.672058  0.113886   

first                                                     
second                             b                      
third          2         3         1         2         3  
0       1.954769  0.705860 -1.712058  1.015807  1.245232  
1      -2.037299 -0.120649 -0.114652 -0.686707 -0.993540  
2       0.918084 -0.892378 -0.741131 -2.547121  0.797637  
3       0.000077  2.123063  0.903571  1.972190 -1.179325  
4      -1.145241 -1.773182  0.407046 -0.301640 -0.173261  

I want to obtain all columns whose third level is 2 or 3, that is, something like

df.xs([2,3], level='third', axis=1, drop_level=False)

But this doesn't work. How do I proceed?

Recommended answer

MultiIndex slicers are a new feature in pandas 0.14.0; see the whatsnew notes for that release. They effectively replace the need for .xs in a case like this.

In [8]: idx = pd.IndexSlice

In [9]: df.loc[:,idx[:,:,[2,3]]]
Out[9]: 
first          A                                       B                              
second         a                   b                   a                   b          
third          2         3         2         3         2         3         2         3
0       1.770120 -0.362269 -0.804352  1.549652  0.069858 -0.274113  0.570410 -0.460956
1      -0.982169  2.044497  0.571353  0.310634 -1.865966 -0.862613  0.124413  0.645419
2      -1.412519  0.168448  0.081467 -0.220464  1.033748  1.561429  0.094363  0.254768
3      -0.653458 -0.978661  0.158708 -0.818675 -1.122577  0.026941  2.678548  0.864817
4      -0.555179 -0.155564  1.148956  1.438523 -1.254660  0.609254 -0.970612  1.519028
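
For reference, the selection above can be reproduced as a self-contained script. This is a minimal sketch that assumes a reasonably recent pandas; the integer data and the names `labels`, `by_slicer`, `mask` and `by_mask` are illustrative and not part of the original answer. It also shows a boolean mask on the column level as an alternative that should select the same columns.

```python
import numpy as np
import pandas as pd

# Rebuild the three-level column index from the question.
labels = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
col = pd.MultiIndex.from_tuples(labels, names=["first", "second", "third"])

# Integer data (instead of randn) so the result is reproducible.
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

# Slicer: everything on 'first' and 'second', only 2 and 3 on 'third'.
idx = pd.IndexSlice
by_slicer = df.loc[:, idx[:, :, [2, 3]]]

# Alternative: a boolean mask built from the 'third' level values.
mask = df.columns.get_level_values("third").isin([2, 3])
by_mask = df.loc[:, mask]

print(by_slicer)
```

The slicer form relies on the columns being sorted, which holds for the index built here; the mask form does not depend on sort order.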

Subtracting with this kind of selection is non-trivial.

In [107]: df = pd.DataFrame(np.arange(5*12).reshape(-1,12), columns=col)

In [108]: df
Out[108]: 
first    A                       B                    
second   a           b           a           b        
third    1   2   3   1   2   3   1   2   3   1   2   3
0        0   1   2   3   4   5   6   7   8   9  10  11
1       12  13  14  15  16  17  18  19  20  21  22  23
2       24  25  26  27  28  29  30  31  32  33  34  35
3       36  37  38  39  40  41  42  43  44  45  46  47
4       48  49  50  51  52  53  54  55  56  57  58  59

pandas wants to align the right-hand side (after all, you are subtracting columns with different labels), so you need to broadcast manually. Here is an issue about this: https://github.com/pydata/pandas/issues/7475

In [109]: df.loc[:,idx[:,:,[2,3]]] - np.tile(df.loc[:,idx[:,:,1]].values,2)
Out[109]: 
first   A           B         
second  a     b     a     b   
third   2  3  2  3  2  3  2  3
0       1 -1 -2 -4  7  5  4  2
1       1 -1 -2 -4  7  5  4  2
2       1 -1 -2 -4  7  5  4  2
3       1 -1 -2 -4  7  5  4  2
4       1 -1 -2 -4  7  5  4  2
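
Note how the values line up in the output above: `np.tile` repeats the whole block of `third == 1` columns, so the subtraction pairs columns purely by position. If the intent is to subtract each (first, second) group's `third == 1` column from that same group's 2 and 3 columns (an assumption; the original does not spell this out), `np.repeat` along axis 1 may be the better fit. The sketch below compares the two; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

labels = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
col = pd.MultiIndex.from_tuples(labels, names=["first", "second", "third"])
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)
idx = pd.IndexSlice

# Plain NumPy values of the third == 1 columns, shape (5, 4).
base = df.loc[:, idx[:, :, 1]].to_numpy()
target = df.loc[:, idx[:, :, [2, 3]]]  # shape (5, 8)

# np.tile repeats the whole block: columns ordered (Aa, Ab, Ba, Bb, Aa, Ab, Ba, Bb).
diff_tile = target - np.tile(base, 2)

# np.repeat duplicates each column in place: (Aa, Aa, Ab, Ab, Ba, Ba, Bb, Bb),
# which matches the column order of `target` group by group.
diff_group = target - np.repeat(base, 2, axis=1)

print(diff_tile.iloc[0].tolist())   # same first row as Out[109] above
print(diff_group.iloc[0].tolist())
```

Because the right-hand side is a bare NumPy array, pandas performs no label alignment here; the shapes simply have to match.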
