Pandas, loc vs non loc for boolean indexing
Question
All the research I have done points to using loc as the way to filter a DataFrame by column values. Today I was reading this, and from the examples I tested I discovered that loc isn't really needed when filtering by a column's values:
Example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20, 0.5).reshape(8, 5), columns=['a', 'b', 'c', 'd', 'e'])

df.loc[df['a'] >= 15]
      a     b     c     d     e
6  15.0  15.5  16.0  16.5  17.0
7  17.5  18.0  18.5  19.0  19.5

df[df['a'] >= 15]
      a     b     c     d     e
6  15.0  15.5  16.0  16.5  17.0
7  17.5  18.0  18.5  19.0  19.5
Note: I do know that loc and iloc return rows by index label and by position, respectively. I'm not comparing based on that functionality.
But when filtering with "where"-style clauses, what is the difference, if any, between using loc and not using it? And why do all the examples I come across on this subject use loc?
Recommended answer
As per the docs, loc
accepts a boolean array for selecting rows, and in your case
>>> df['a'] >= 15
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
Name: a, dtype: bool
is treated as a boolean array.
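As a minimal sketch of the point above, the snippet below builds the DataFrame from the question and checks that the two spellings select the same rows. The variable names (mask, with_loc, without_loc, from_array) are illustrative and do not come from the original thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20, 0.5).reshape(8, 5), columns=['a', 'b', 'c', 'd', 'e'])

# Boolean Series with one True/False entry per row of df.
mask = df['a'] >= 15

with_loc = df.loc[mask]     # explicit indexer given a boolean mask
without_loc = df[mask]      # the shorthand from the question

# Both spellings return the same rows for this filter.
same = with_loc.equals(without_loc)
print(same)

# A plain NumPy boolean array of matching length is accepted by loc as well.
from_array = df.loc[mask.to_numpy()]
```

For a simple row filter like this one, the choice between the two forms looks to be a matter of style rather than of result.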
The fact that you can omit loc here and issue df[df['a'] >= 15] is a special case convenience according to Wes McKinney, the author of pandas.
Quoting directly from his book, Python for Data Analysis, p. 144, df[val] is used to...
Select single column or sequence of columns from the DataFrame; special case conveniences: boolean array (filter rows), slice (slice rows), or boolean DataFrame (set values based on some criterion)
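The cases listed in that quote can be sketched as follows, reusing the DataFrame from the question. This is only an illustration: the variable names and the threshold of 18 are assumptions chosen for the example, not something taken from the book or the thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20, 0.5).reshape(8, 5), columns=['a', 'b', 'c', 'd', 'e'])

single_col = df['a']            # single column -> Series
col_seq = df[['a', 'c']]        # sequence of columns -> DataFrame

# Special-case conveniences:
filtered = df[df['a'] >= 15]    # boolean array -> filters rows
sliced = df[2:4]                # slice -> slices rows (here rows 2 and 3)

capped = df.copy()              # work on a copy so df itself is left unchanged
capped[capped > 18] = 0.0       # boolean DataFrame -> sets values where True
```

Only the first two lines select columns; the remaining three act on rows or on individual cells, which is why the quote describes them as special cases.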