pandas 中的多索引选择 [英] multiindex selecting in pandas

查看:33
本文介绍了 pandas 中的多索引选择的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在理解 Pandas 中的多索引选择时遇到问题.

I have problems understanding multiindex selecting in pandas.

                    0  1  2  3
first second third            
C     one    mean   3  4  2  7
             std    4  1  7  7
      two    mean   3  1  4  7
             std    5  6  7  0
      three  mean   7  0  2  5
             std    7  3  7  1
H     one    mean   2  4  3  3
             std    5  5  3  5
      two    mean   5  7  0  6
             std    0  1  0  2
      three  mean   5  2  5  1
             std    9  0  4  6
V     one    mean   3  7  3  9
             std    8  7  9  3
      two    mean   1  9  9  0
             std    1  1  5  1
      three  mean   3  1  0  6
             std    6  2  7  4

我需要创建新行:

- 'CH' : ['CH',:,'mean'] => ['C',:,'mean'] - ['H',:,'mean']
- 'CH' : ['CH',:,'std'] => (['C',:,'std']**2 + ['H',:,'std']**2)**.5

尝试选择行时,我收到不同类型的错误:UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'

When trying to select rows I get different types of errors: UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'

这个操作应该怎么做?

import pandas as pd
import numpy as np
iterables = [['C', 'H', 'V'],
          ['one','two','three'],
          ['mean','std']]
midx = pd.MultiIndex.from_product(iterables, names=['first', 'second','third'])
chv = pd.DataFrame(np.random.randint(0,high=10,size=(18,4)), index=midx)
print (chv)
idx = pd.IndexSlice
chv.loc[:,idx['C',:,'mean']]

推荐答案

您可以通过 首先切片器,然后rename 第一级并使用算术运算,最后concat 在一起:

You can filter by slicers first, then rename first level and use arithmetic operations, last concat together:

#avoid UnsortedIndexError
df = df.sort_index()

idx = pd.IndexSlice
c1 = chv.loc[idx['C',:,'mean'], :].rename({'C':'CH'}, level=0)
h1 = chv.loc[idx['H',:,'mean'], :].rename({'H':'CH'}, level=0)
ch1 = c1 - h1

c2 = chv.loc[idx['C',:,'std'], :].rename({'C':'CH'}, level=0)**2
h2 = chv.loc[idx['H',:,'std'], :].rename({'H':'CH'}, level=0)**2
ch2 = (c2 + h2)**.5

df = pd.concat([chv, ch1, ch2]).sort_index()

<小时>

print (df)
                           0         1         2         3
first second third                                        
C     one    mean   7.000000  5.000000  8.000000  3.000000
             std    0.000000  4.000000  4.000000  4.000000
      three  mean   4.000000  2.000000  1.000000  6.000000
             std    8.000000  7.000000  3.000000  3.000000
      two    mean   1.000000  8.000000  2.000000  5.000000
             std    2.000000  2.000000  4.000000  2.000000
CH    one    mean   1.000000  2.000000  1.000000  2.000000
             std    4.000000  7.211103  4.000000  7.211103
      three  mean   1.000000  0.000000 -4.000000  2.000000
             std    8.062258  7.071068  4.242641  3.000000
      two    mean  -1.000000  6.000000 -2.000000  3.000000
             std    9.219544  7.280110  4.123106  2.000000
H     one    mean   6.000000  3.000000  7.000000  1.000000
             std    4.000000  6.000000  0.000000  6.000000
      three  mean   3.000000  2.000000  5.000000  4.000000
             std    1.000000  1.000000  3.000000  0.000000
      two    mean   2.000000  2.000000  4.000000  2.000000
             std    9.000000  7.000000  1.000000  0.000000
V     one    mean   9.000000  5.000000  0.000000  5.000000
             std    7.000000  9.000000  1.000000  1.000000
      three  mean   3.000000  0.000000  3.000000  4.000000
             std    1.000000  4.000000  9.000000  2.000000
      two    mean   3.000000  6.000000  3.000000  2.000000
             std    1.000000  3.000000  1.000000  4.000000

这篇关于 pandas 中的多索引选择的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆