Pandas: Proper way to set values based on condition for subset of multiindex dataframe


Question

I'm not sure of how to do this without chained assignments (which probably wouldn't work anyways because I'd be setting a copy).

I want to take a subset of a multiindex pandas dataframe, test for values less than zero, and set them to zero.

For example:

import pandas as pd

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,100,200]})

df[df['A']<0] = 0.0

gives

In [37]:

df

Out[37]:
    A   B
    a   b   a   b
0   -1  0   -20 -200
1   -1  1   -10 -100
2   0   2   0   0
3   10  3   10  100
4   12  -1  20  200

This shows that the assignment did not set any values based on the condition. Alternatively, if I use a chained assignment:

df.loc[:,'A'][df['A']<0] = 0.0

This gives the same result (and raises a SettingWithCopyWarning).

I could loop through each column based on the condition that the first level is the one that I want:

for one,two in df.columns.values:
    if one == 'A':
        df.loc[df[(one,two)]<0, (one,two)] = 0.0

which gives the desired result:

In [64]:

df

Out[64]:
    A   B
    a   b   a   b
0   0   0   -20 -200
1   0   1   -10 -100
2   0   2   0   0
3   10  3   10  100
4   12  0   20  200

But somehow I feel there is a better way to do this than looping through the columns. What is the best way to do this in pandas?

Recommended answer

This is an application of MultiIndex slicers (and one of the main motivations for using them); see the pandas documentation on MultiIndex slicers.

In [20]: df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,100,200]})

In [21]: df
Out[21]: 
    A      B     
    a  b   a    b
0  -1  0 -20 -200
1  -1  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12 -1  20  200

In [22]: idx = pd.IndexSlice

In [23]: mask = df.loc[:,idx['A',:]]<0

In [24]: mask
Out[24]: 
       A       
       a      b
0   True  False
1   True  False
2  False  False
3  False  False
4  False   True

In [25]: df[mask] = 0

In [26]: df
Out[26]: 
    A      B     
    a  b   a    b
0   0  0 -20 -200
1   0  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12  0  20  200
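
The same slicer pattern is not tied to the first column level. As a minimal sketch (the name `mask_a` is only illustrative and not from the original answer), the slicer `idx[:, 'a']` should select every column whose second-level label is 'a', so the negative values in both ('A','a') and ('B','a') are zeroed while the 'b' columns are left alone:

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})

idx = pd.IndexSlice

# Select every column whose second-level label is 'a', whatever the first
# level is, and mark the negative entries. (mask_a is an illustrative name.)
mask_a = df.loc[:, idx[:, 'a']] < 0

# Boolean-DataFrame assignment: columns that are absent from the mask
# (here the 'b' columns) are left unchanged.
df[mask_a] = 0

# Collect the columns as plain lists for easy inspection.
result = {col: df[col].tolist() for col in df.columns}
```

Because the mask keeps the full two-level column labels, it lines up with the columns of `df` when it is used for assignment.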

Since you are working with the first level of the columns index, the following works as well. The slicer approach above is more general: it would also apply if, say, you wanted to do this for the second-level label 'a'.

In [30]: df[df[['A']]<0] = 0

In [31]: df
Out[31]: 
    A      B     
    a  b   a    b
0   0  0 -20 -200
1   0  1 -10 -100
2   0  2   0    0
3  10  3  10  100
4  12  0  20  200
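
A plausible reason why `df[['A']]` works here while the `df['A']` mask in the question did not is the column labels each selection returns. The short sketch below (the variable names are illustrative) compares them: selecting with a scalar key drops the 'A' level, so the mask's columns no longer match the two-level columns of `df`, whereas selecting with a list keeps both levels.

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})

dropped = df['A']    # scalar key: the 'A' level is dropped from the columns
kept = df[['A']]     # list key: both column levels are kept

dropped_labels = list(dropped.columns)   # single-level labels
kept_labels = list(kept.columns)         # two-level (tuple) labels

print(dropped_labels)
print(kept_labels)
```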
