What is the most idiomatic way to index an object with a boolean array in pandas?
Question
I am asking specifically about pandas 0.11, as I am busy replacing my uses of .ix with either .loc or .iloc. I like that choosing between .loc and .iloc communicates whether I intend to index by label or by integer position. I see that either one will also accept a boolean array, but I would like to keep their usage pure so that my intent stays clear.
Recommended answer
In 0.11.0 all three methods work, and the way suggested in the docs is simply to use df[mask]. However, this selection is not done by position but purely by label, so in my opinion loc best describes what is actually going on.
Update: I asked about this on GitHub, and the conclusion was that in pandas 0.11.1 df.iloc[msk] will raise a NotImplementedError (if the mask is integer-indexed) or a ValueError (if it is not integer-indexed).
In [1]: df = pd.DataFrame(range(5), list('ABCDE'), columns=['a'])

In [2]: mask = (df.a%2 == 0)

In [3]: mask
Out[3]:
A     True
B    False
C     True
D    False
E     True
Name: a, dtype: bool

In [4]: df[mask]
Out[4]:
   a
A  0
C  2
E  4

In [5]: df.loc[mask]
Out[5]:
   a
A  0
C  2
E  4

In [6]: df.iloc[mask]  # works in 0.11.0; as a result of this question, raises ValueError in 0.11.1
Out[6]:
   a
A  0
C  2
E  4
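If selection by position is really what is intended, one option is to strip the labels from the mask before handing it to iloc. The following is only a sketch, not something the answer above prescribes: it assumes a pandas version in which iloc accepts a plain boolean NumPy array, and it uses mask.values to drop the index. The names by_position and by_label are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': range(5)}, index=list('ABCDE'))
mask = (df.a % 2 == 0)

# mask.values is a plain NumPy boolean array with no index attached,
# so this selection is purely positional (assumption: iloc accepts it).
by_position = df.iloc[mask.values]

# The mask was built from df itself, so its order matches df and the
# label-based selection picks the same rows.
by_label = df.loc[mask]
```

Written this way, the call site states the intent: loc with a boolean Series means "match by label", while iloc with a bare array means "match by position".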
It is perhaps worth noting that if you give mask an integer index, it throws an error:
mask.index = range(5)
df.iloc[mask] # or any of the others
IndexingError: Unalignable boolean Series key provided
This demonstrates that iloc is not actually implemented for boolean masks: it uses labels, which is why 0.11.1 will raise a NotImplementedError when we try this.
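The label-based behaviour can also be seen from the other direction. In the sketch below the mask keeps the same labels but in reversed order, so its positions no longer line up with df. It assumes a pandas version in which loc aligns a boolean Series to the frame by label; the names shuffled and selected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': range(5)}, index=list('ABCDE'))
mask = (df.a % 2 == 0)

# Same labels as df, but in reversed order (E, D, C, B, A).
shuffled = mask.iloc[::-1]

# loc matches the mask to rows by label, so the rows selected are
# the same as with the original mask, in df's own order.
selected = df.loc[shuffled]
```

If the mask were applied by position instead, a reordered mask would in general pick different rows; here the reversed pattern happens to be symmetric, so the clearer signal is that the result keeps df's order rather than the mask's.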