Why is .loc slicing in pandas inclusive of stop, contrary to typical Python slicing?
Question
I am slicing a pandas DataFrame and I seem to be getting unexpected slices using .loc, at least as compared to NumPy and ordinary Python slicing. See the example below.
>>> import pandas as pd
>>> a = pd.DataFrame([[0, 1, 2], [3, 4, 5], [4, 5, 6], [9, 10, 11], [34, 2, 1]])
>>> a
    0   1   2
0   0   1   2
1   3   4   5
2   4   5   6
3   9  10  11
4  34   2   1
>>> a.loc[1:3, :]
   0   1   2
1  3   4   5
2  4   5   6
3  9  10  11
>>> a.values[1:3, :]
array([[3, 4, 5],
       [4, 5, 6]])
Interestingly, this only happens with .loc, not .iloc:
>>> a.iloc[1:3, :]
   0  1  2
1  3  4  5
2  4  5  6
Thus, .loc appears to be inclusive of the terminating index, but NumPy and .iloc are not.
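A minimal side-by-side check of the three slicing styles, using the same DataFrame `a` from the question. With the default RangeIndex the labels happen to equal the positions, which is why the difference looks surprising: `.loc[1:3]` selects the labels 1 through 3 inclusive, while `.iloc[1:3]` and NumPy select positions 1 and 2 (a half-open interval). The variable names below are illustrative only.

```python
import pandas as pd

a = pd.DataFrame([[0, 1, 2], [3, 4, 5], [4, 5, 6], [9, 10, 11], [34, 2, 1]])

# Label-based: includes both the start label 1 and the stop label 3
loc_rows = a.loc[1:3, :].index.tolist()

# Position-based: half-open interval [1, 3), like ordinary Python slicing
iloc_rows = a.iloc[1:3, :].index.tolist()

# NumPy array slicing is also half-open
numpy_rows = a.values[1:3, :].tolist()

print(loc_rows)    # [1, 2, 3]
print(iloc_rows)   # [1, 2]
print(numpy_rows)  # [[3, 4, 5], [4, 5, 6]]
```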
From the comments, it seems this is not a bug, and the behavior is clearly documented. But why was it designed this way?
Answer
Remember that .loc is primarily label-based indexing. The decision to include the stop endpoint becomes far more obvious when working with a non-RangeIndex:
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], index=list('achz'))
#    0
# a  1
# c  2
# h  3
# z  4
If I want to select all rows between 'a' and 'h' (inclusive), I only need to know about 'a' and 'h'. To be consistent with other Python slicing, you would also need to know which index label follows 'h', which in this case is 'z' but could have been anything.
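A short sketch using the `df` from this answer. The inclusive label slice needs only the two labels you care about; a Python-style half-open slice by label would first force you to look up the label that comes after 'h'. Here that lookup uses `Index.get_loc`, which returns the integer position of a label in a unique index. The positional `.iloc` version is shown only for comparison, and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], index=list('achz'))

# Inclusive label slice: only the endpoint labels 'a' and 'h' are needed
inclusive = df.loc['a':'h']

# A half-open, Python-style slice by label would have to stop at the label
# *after* 'h', which you must look up first (here it happens to be 'z')
next_label = df.index[df.index.get_loc('h') + 1]

# Equivalent half-open positional slice: convert labels to positions first
start = df.index.get_loc('a')
stop = df.index.get_loc('h') + 1
positional = df.iloc[start:stop]

print(inclusive.index.tolist())   # ['a', 'c', 'h']
print(next_label)                 # z
print(positional.index.tolist())  # ['a', 'c', 'h']
```

Note that the `next_label` lookup would fail with an IndexError if 'h' were the last label, which is exactly the kind of bookkeeping inclusive label slicing avoids.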
There is also a somewhat hidden section of the pandas documentation, "Endpoints are inclusive", that explains this design choice.
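Inclusive endpoints also let label slices behave like closed ranges over sorted labels. A minimal sketch, assuming the index is sorted (monotonic increasing) as it is in the `df` above: in that case the slice bounds do not even have to be labels present in the index, and pandas selects every label between them, including any bound that does exist.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], index=list('achz'))

# The index is sorted, so a label slice acts like a closed range
assert df.index.is_monotonic_increasing

# 'b' and 'i' are not labels in df, but on a sorted index pandas selects
# every label l with 'b' <= l <= 'i'
between = df.loc['b':'i']
print(between.index.tolist())  # ['c', 'h']

# An endpoint that *is* present is still included
tail = df.loc['c':'z']
print(tail.index.tolist())  # ['c', 'h', 'z']
```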