Multi-indexing and masks with logic in pandas

Question

I have 4 index levels: mun, loc, geo and block. I need to create masks on them so that I can perform operations whose results look like the data2 column below (one example each for the block, geo, loc and mun levels):

                       data1  data2
mun  loc  geo  block
0    0    0    0       12     12
1    0    0    0       20     20
1    1    0    0       10     10
1    1    1    0       10     10   
1    1    1    1       3      3/4
1    1    1    2       4      4/4
1    1    2    0       30     30   
1    1    2    1       1      1/3
1    1    2    2       3      3/3
1    1    0    0       4      4
1    2    1    1       10     10/12
1    2    1    2       12     12/12
2    0    0    0       60     60
2    1    1    1       123    123/123
2    1    1    2       7      7/123
2    1    2    1       6      6/6
2    1    2    2       1      1/6

                       data1  data2
mun  loc  geo  block
0    0    0    0       12     12
1    0    0    0       20     20
1    1    0    0       10     10
1    1    1    0       10     10/30   
1    1    1    1       4      4
1    1    2    0       30     30/30   
1    2    1    0       2      2/3
1    2    2    0       3      3/3
1    2    3    0       1      1/3
2    0    0    0       60     60
2    1    1    0       12     12/88 
2    1    1    1       1       1
2    1    2    0       88     88/88
2    1    2    1       9      9

                       data1  data2
mun  loc  geo  block
0    0    0    0       14     14
1    0    0    0       12     12
1    1    0    0       20     20/20
1    1    1    0       10     10   
1    1    1    1       31     31
1    2    0    0       15     15/20 
1    2    1    1       11     11
2    0    0    0       80     80
2    1    0    0       100    100/100
2    1    1    2       7      7
2    2    0    0       11     11/100

                       data1  data2
mun  loc  geo  block
0    0    0    0       55     55
1    0    0    0       70     70/70
1    1    0    0       12     12
1    1    1    0       13     13   
2    0    0    0       60     60/70
2    1    1    1       12     12
2    1    2    1       6      6
3    0    0    0       12     12/70

That is, take the max value inside the hierarchy and divide each element by it. I got help in another question regarding the first problem, but I'm having a lot of trouble getting a grasp of MultiIndex. Any help will be appreciated.

Recommended answer

It was not easy, but the main idea is to use get_level_values to select the index values for each condition:
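As a minimal sketch of that idea (the five-row frame below is a made-up subset of the question's data, used only for illustration), get_level_values returns the labels of one index level, and comparisons on them give boolean arrays that can be combined with & and passed to .loc:

```python
import pandas as pd

# Hypothetical small frame with the same four index levels as the question.
index = pd.MultiIndex.from_tuples(
    [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1), (1, 1, 1, 2)],
    names=['mun', 'loc', 'geo', 'block'],
)
df = pd.DataFrame({'data1': [12, 20, 10, 3, 4]}, index=index)

# Each comparison yields a boolean NumPy array, one entry per row.
geo = df.index.get_level_values('geo')
block = df.index.get_level_values('block')
mask = (geo != 0) & (block != 0)

# The boolean array selects rows positionally through .loc.
selected = df.loc[mask, 'data1']
print(mask)
print(selected)
```

Only the last two rows have a non-zero geo and a non-zero block, so only they are selected.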

Level block:

import pandas as pd

print (df)
                   data1    data2
mun loc geo block                
0   0   0   0         12       12
1   0   0   0         20       20
    1   0   0         10       10
        1   0         10       10
            1          3      3/4
            2          4      4/4
        2   0         30       30
            1          1      1/3
            2          3      3/3
        0   0          4        4
    2   1   1         10    10/12
            2         12    12/12
2   0   0   0         60       60
    1   1   1        123  123/123
            2          7    7/123
        2   1          6      6/6
            2          1      1/6

mask3 =  (df.index.get_level_values('mun') != 0) & \
         (df.index.get_level_values('loc') != 0 ) & \
         (df.index.get_level_values('geo') != 0) & \
         (df.index.get_level_values('block') != 0 )

print (mask3)
[False False False False  True  True False  True  True False  True  True
 False  True  True  True  True]

# .ix was removed in pandas 1.0; .loc accepts the boolean mask directly
df2 = df.loc[mask3, 'data1'].groupby(level=['mun','loc','geo']).max()
#print (df2)

df2 = df2.reindex(df.reset_index(level=3, drop=True).index).mask(~mask3).fillna(1)
#print (df2)

print (df['data1'].div(df2.values,axis=0))
mun  loc  geo  block
0    0    0    0        12.000000
1    0    0    0        20.000000
     1    0    0        10.000000
          1    0        10.000000
               1         0.750000
               2         1.000000
          2    0        30.000000
               1         0.333333
               2         1.000000
          0    0         4.000000
     2    1    1         0.833333
               2         1.000000
2    0    0    0        60.000000
     1    1    1         1.000000
               2         0.056911
          2    1         1.000000
               2         0.166667
dtype: float64
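The reindex / mask / fillna steps above can also be written with groupby(...).transform('max'), which broadcasts each group's maximum back to the original rows. This is an alternative sketch, not the answer's own method. It rebuilds the block-level data from the question, and the column name ratio is an assumption introduced for illustration:

```python
import numpy as np
import pandas as pd

# Block-level example data from the question.
tuples = [
    (0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1),
    (1, 1, 1, 2), (1, 1, 2, 0), (1, 1, 2, 1), (1, 1, 2, 2), (1, 1, 0, 0),
    (1, 2, 1, 1), (1, 2, 1, 2), (2, 0, 0, 0), (2, 1, 1, 1), (2, 1, 1, 2),
    (2, 1, 2, 1), (2, 1, 2, 2),
]
index = pd.MultiIndex.from_tuples(tuples, names=['mun', 'loc', 'geo', 'block'])
df = pd.DataFrame(
    {'data1': [12, 20, 10, 10, 3, 4, 30, 1, 3, 4, 10, 12, 60, 123, 7, 6, 1]},
    index=index,
)

# True only where every level is non-zero (same condition as mask3 above).
mask = np.all(
    [df.index.get_level_values(name) != 0 for name in df.index.names], axis=0
)

# Hide the unmasked rows, then broadcast each (mun, loc, geo) maximum to its rows.
group_max = (
    df['data1'].where(mask)
    .groupby(level=['mun', 'loc', 'geo'])
    .transform('max')
)

# Work on plain arrays so the duplicated index entry (1, 1, 0, 0) needs no alignment.
data = df['data1'].to_numpy()
df['ratio'] = np.where(mask, data / group_max.to_numpy(), data)
print(df)
```

Rows outside the mask keep their original data1 value, matching the output above.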


Level geo:

print (df)
                   data1  data2
mun loc geo block              
0   0   0   0         12     12
1   0   0   0         20     20
    1   0   0         10     10
        1   0         10  10/30
            1          4      4
        2   0         30  30/30
    2   1   0          2    2/3
        2   0          3    3/3
        3   0          1    1/3
2   0   0   0         60     60
    1   1   0         12  12/88
            1          1      1
        2   0         88  88/88
            1          9      9

df1 = df.reset_index(drop=True, level='block')

mask3 =  (df.index.get_level_values('mun') != 0) & \
         (df.index.get_level_values('loc') != 0 ) & \
         (df.index.get_level_values('geo') != 0) & \
         (df.index.get_level_values('block') == 0 )

print (mask3)
[False False False  True False  True  True  True  True False  True False
  True False]

# .ix was removed in pandas 1.0; .loc accepts the boolean mask directly
df2 = df1.loc[mask3, 'data1'].groupby(level=['mun','loc']).max()

df2 = df2.reindex(df.reset_index(level=['geo','block'], drop=True).index).mask(~mask3).fillna(1)
print (df2)
df['new'] = df['data1'].div(df2.values,axis=0)

print (df)
                   data1  data2        new
mun loc geo block                         
0   0   0   0         12     12  12.000000
1   0   0   0         20     20  20.000000
    1   0   0         10     10  10.000000
        1   0         10  10/30   0.333333
            1          4      4   4.000000
        2   0         30  30/30   1.000000
    2   1   0          2    2/3   0.666667
        2   0          3    3/3   1.000000
        3   0          1    1/3   0.333333
2   0   0   0         60     60  60.000000
    1   1   0         12  12/88   0.136364
            1          1      1   1.000000
        2   0         88  88/88   1.000000
            1          9      9   9.000000


Level loc:

print (df)
                   data1    data2
mun loc geo block                
0   0   0   0         14       14
1   0   0   0         12       12
    1   0   0         20    20/20
        1   0         10       10
            1         31       31
    2   0   0         15    15/20
        1   1         11       11
2   0   0   0         80       80
    1   0   0        100  100/100
        1   2          7        7
    2   0   0         11   11/100

df1 = df.reset_index(drop=True, level=['block', 'geo'])


mask3 =  (df.index.get_level_values('mun') != 0) & \
         (df.index.get_level_values('loc') != 0 ) & \
         (df.index.get_level_values('geo') == 0) & \
         (df.index.get_level_values('block') == 0 )

print (mask3)
[False False  True False False  True False False  True False  True]

# .ix was removed in pandas 1.0; .loc accepts the boolean mask directly
df2 = df1.loc[mask3, 'data1'].groupby(level=['mun']).max()
#print (df2)

df2 = df2.reindex(df.reset_index(level=['geo','block', 'loc'], drop=True).index).mask(~mask3).fillna(1)
#print (df2)

print (df['data1'].div(df2.values,axis=0))
mun  loc  geo  block
0    0    0    0        14.00
1    0    0    0        12.00
     1    0    0         1.00
          1    0        10.00
               1        31.00
     2    0    0         0.75
          1    1        11.00
2    0    0    0        80.00
     1    0    0         1.00
          1    2         7.00
     2    0    0         0.11
dtype: float64


Level mun:

print (df)
                   data1  data2
mun loc geo block              
0   0   0   0         55     55
1   0   0   0         70  70/70
    1   0   0         12     12
        1   0         13     13
2   0   0   0         60  60/70
    1   1   1         12     12
        2   1          6      6
3   0   0   0         12  12/70

mask3 =  (df.index.get_level_values('mun') != 0) & \
         (df.index.get_level_values('loc') == 0 ) & \
         (df.index.get_level_values('geo') == 0) & \
         (df.index.get_level_values('block') == 0 )

print (mask3)
[False  True False False  True False False  True]

# .ix was removed in pandas 1.0; .loc accepts the boolean mask directly
df2 = df.loc[mask3, 'data1'].max()
#print (df2)

df2 = pd.Series(df2, index=df.index).mask(~mask3).fillna(1)
#print (df2)

print (df['data1'].div(df2.values,axis=0))
mun  loc  geo  block
0    0    0    0        55.000000
1    0    0    0         1.000000
     1    0    0        12.000000
          1    0        13.000000
2    0    0    0         0.857143
     1    1    1        12.000000
          2    1         6.000000
3    0    0    0         0.171429
dtype: float64
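The four cases follow one pattern: levels down to the chosen one must be non-zero, deeper levels must be zero, and the maximum is taken within the parent levels. A rough sketch of a helper that captures this pattern follows. The function name divide_by_level_max and its parameters are hypothetical, not part of pandas or of the original answer, and the helper assumes that 0 always marks an "unset" level, as in the question's data:

```python
import numpy as np
import pandas as pd

def divide_by_level_max(df, level, column='data1'):
    """Divide masked rows of `column` by the max of their parent group (hypothetical helper)."""
    names = list(df.index.names)
    pos = names.index(level)

    # Levels up to and including `level` must be non-zero, deeper ones zero.
    mask = np.ones(len(df), dtype=bool)
    for i, name in enumerate(names):
        values = df.index.get_level_values(name)
        mask &= (values != 0) if i <= pos else (values == 0)

    masked = df[column].where(mask)
    parents = names[:pos]
    if parents:
        # Max within each parent group, broadcast back to every row.
        group_max = masked.groupby(level=parents).transform('max').to_numpy()
    else:
        # Top level: a single global max over the masked rows.
        group_max = masked.max()

    data = df[column].to_numpy()
    return pd.Series(np.where(mask, data / group_max, data), index=df.index)

# Level mun example data from above.
index = pd.MultiIndex.from_tuples(
    [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
     (2, 0, 0, 0), (2, 1, 1, 1), (2, 1, 2, 1), (3, 0, 0, 0)],
    names=['mun', 'loc', 'geo', 'block'],
)
df = pd.DataFrame({'data1': [55, 70, 12, 13, 60, 12, 6, 12]}, index=index)

result = divide_by_level_max(df, 'mun')
print(result)
```

Calling the helper with 'loc', 'geo' or 'block' applies the corresponding mask and grouping instead of repeating the four code blocks.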
