Pandas: Proper way to set values based on condition for subset of multiindex dataframe
Question
I'm not sure how to do this without chained assignment (which probably wouldn't work anyway, because I'd be setting values on a copy).
I want to take a subset of a MultiIndex pandas DataFrame, test for values less than zero, and set them to zero.
For example:
import pandas as pd

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,100,200]})
df[df['A']<0] = 0.0
This gives:
In [37]: df
Out[37]:
    A       B
    a   b   a    b
0  -1   0 -20 -200
1  -1   1 -10 -100
2   0   2   0    0
3  10   3  10  100
4  12  -1  20  200
This shows that the values were not set based on the condition. Alternatively, if I use a chained assignment:
df.loc[:,'A'][df['A']<0] = 0.0
This gives the same result (along with a SettingWithCopyWarning).
I could loop through the columns and update only those whose first level is the one I want:
for one,two in df.columns.values:
    if one == 'A':
        df.loc[df[(one,two)]<0, (one,two)] = 0.0
This gives the desired result:
In [64]: df
Out[64]:
    A       B
    a   b   a    b
0   0   0 -20 -200
1   0   1 -10 -100
2   0   2   0    0
3  10   3  10  100
4  12   0  20  200
But somehow I feel there is a better way to do this than looping through the columns. What is the best way to do this in pandas?
Recommended answer
This is an application of MultiIndex slicers (and one of the main motivations for them); see the pandas documentation on using slicers with a MultiIndex.
In [20]: df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                            ('A','b'): [0,1,2,3,-1],
                            ('B','a'): [-20,-10,0,10,20],
                            ('B','b'): [-200,-100,0,100,200]})
In [21]: df
Out[21]:
    A       B
    a   b   a    b
0  -1   0 -20 -200
1  -1   1 -10 -100
2   0   2   0    0
3  10   3  10  100
4  12  -1  20  200
In [22]: idx = pd.IndexSlice
In [23]: mask = df.loc[:,idx['A',:]]<0
In [24]: mask
Out[24]:
       A
       a      b
0   True  False
1   True  False
2  False  False
3  False  False
4  False   True
In [25]: df[mask] = 0
In [26]: df
Out[26]:
    A       B
    a   b   a    b
0   0   0 -20 -200
1   0   1 -10 -100
2   0   2   0    0
3  10   3  10  100
4  12   0  20  200
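As a quick sanity check, the slicer-based assignment can be compared against the column loop from the question. This is only a sketch: the helper `make_df` and the names `looped`, `sliced`, and `same` are made up for illustration, and the loop writes the integer `0` rather than `0.0` so that both frames hold the same kind of values.

```python
import pandas as pd

def make_df():
    # Hypothetical helper: rebuilds the example frame from the question.
    return pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                         ('A', 'b'): [0, 1, 2, 3, -1],
                         ('B', 'a'): [-20, -10, 0, 10, 20],
                         ('B', 'b'): [-200, -100, 0, 100, 200]})

# Approach from the question: loop over the columns under first-level label 'A'.
looped = make_df()
for one, two in looped.columns:
    if one == 'A':
        looped.loc[looped[(one, two)] < 0, (one, two)] = 0

# Approach from the answer: build a boolean mask with a MultiIndex slicer.
sliced = make_df()
idx = pd.IndexSlice
mask = sliced.loc[:, idx['A', :]] < 0
sliced[mask] = 0

# Compare the two results value by value (dtypes are not compared here).
same = bool((looped == sliced).all().all())
print(same)
```

Columns that do not appear in the mask (everything under 'B' here) are left as they were, which is why a mask built from only part of the frame is enough.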
Since you are working with the first level of the column index, the following will work as well. The example above is more general: say you wanted to do this for the second-level label 'a' instead.
In [30]: df[df[['A']]<0] = 0
In [31]: df
Out[31]:
    A       B
    a   b   a    b
0   0   0 -20 -200
1   0   1 -10 -100
2   0   2   0    0
3  10   3  10  100
4  12   0  20  200
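To illustrate the "more general" case mentioned above, here is a minimal sketch that applies the same slicer technique to the second-level label 'a', so that the 'a' column under both 'A' and 'B' is affected. The selection `idx[:, 'a']` is an illustrative choice and does not appear in the original answer; the example frame's columns are already sorted, which MultiIndex slicing generally expects.

```python
import pandas as pd

df = pd.DataFrame({('A', 'a'): [-1, -1, 0, 10, 12],
                   ('A', 'b'): [0, 1, 2, 3, -1],
                   ('B', 'a'): [-20, -10, 0, 10, 20],
                   ('B', 'b'): [-200, -100, 0, 100, 200]})

idx = pd.IndexSlice

# Select every column whose second level is 'a', whatever the first level is.
mask = df.loc[:, idx[:, 'a']] < 0

# Zero out the negative values in the selected columns only.
df[mask] = 0

print(df)
```

The 'b' columns, including the -1 in ('A', 'b'), are not part of the mask and stay unchanged.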