Why does .loc have inclusive behavior for slices?

Question

For some reason, the following 2 calls to iloc / loc produce different behavior:

>>> import pandas as pd
>>> df = pd.DataFrame(dict(A=range(3), B=range(3)))
>>> df.iloc[:1]
   A  B
0  0  0
>>> df.loc[:1]
   A  B
0  0  0
1  1  1

I understand that loc considers the row labels, while iloc considers the integer-based indices of the rows. But why is the upper bound for the loc call considered inclusive, while the iloc bound is considered exclusive?
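The label/position distinction is easier to see when labels differ from positions. The minimal sketch below reuses the question's data but assigns the row labels 10, 20 and 30, which are an illustrative assumption. Here .iloc slices by integer position and stops before the end, while .loc slices by label and keeps the end label.

```python
import pandas as pd

# Same data as in the question, but with labels that differ from positions
# (the labels 10, 20, 30 are chosen only for illustration).
df = pd.DataFrame(dict(A=range(3), B=range(3)), index=[10, 20, 30])

by_position = df.iloc[:1]  # positions 0 up to, but not including, 1
by_label = df.loc[:20]     # labels from the start up to and including 20

print(by_position.index.tolist())  # [10]
print(by_label.index.tolist())     # [10, 20]
```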

Recommended answer

Quick answer:

It often makes more sense to do end-inclusive slicing when using labels, because it requires less knowledge about other rows in the DataFrame.

Whenever you care about labels instead of positions, end-exclusive label slicing introduces position-dependence in a way that can be inconvenient.

Longer answer:

Any function's behavior is a trade-off: you favor some use cases over others. Ultimately the operation of .loc is a subjective design decision by the Pandas developers (as the comment by @ALlollz indicates, this behavior is intentional). But to understand why they might have designed it that way, think about what makes label slicing different from positional slicing.
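One way to see that difference is to remove a row. After the removal, a positional slice points at different rows, while a label slice still returns the same ones. The minimal sketch below uses a small illustrative frame (`df` and `trimmed` are hypothetical names).

```python
import pandas as pd

df = pd.DataFrame(dict(X=range(4)), index=['a', 'b', 'c', 'd'])
trimmed = df.drop('a')  # remove the first row; every later row shifts up one position

# Positional slice: same positions, different rows after the drop.
print(df.iloc[1:3].index.tolist())       # ['b', 'c']
print(trimmed.iloc[1:3].index.tolist())  # ['c', 'd']

# Label slice: same labels, same rows regardless of the drop.
print(df.loc['b':'c'].index.tolist())       # ['b', 'c']
print(trimmed.loc['b':'c'].index.tolist())  # ['b', 'c']
```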

Imagine we have two DataFrames df1 and df2:

import pandas as pd

df1 = pd.DataFrame(dict(X=range(4)), index=pd.Index(['a', 'b', 'c', 'd'], name='Y'))
df2 = pd.DataFrame(dict(X=range(3)), index=pd.Index(['b', 'c', 'z'], name='Y'))

df1 contains:

   X
Y
a  0
b  1
c  2
d  3

df2 contains:

   X
Y
b  0
c  1
z  2

Let's say we have a label-based task to perform: we want to get rows between b and c from both df1 and df2, and we want to do it using the same code for both DataFrames. Because b and c don't have the same positions in both DataFrames, simple positional slicing won't do the trick. So we turn to label-based slicing.

If .loc were end-exclusive, to get rows between b and c we would need to know not only the label of our desired end row, but also the label of the next row after that. As constructed, this next label would be different in each DataFrame.
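A minimal sketch makes the point concrete: the label that follows 'c' is 'd' in df1 but 'z' in df2. The helper `label_after` is hypothetical and assumes the label is unique in the index, so that `get_loc` returns a single integer position.

```python
import pandas as pd

df1 = pd.DataFrame(dict(X=range(4)), index=pd.Index(['a', 'b', 'c', 'd'], name='Y'))
df2 = pd.DataFrame(dict(X=range(3)), index=pd.Index(['b', 'c', 'z'], name='Y'))

def label_after(df, label):
    """Return the label of the row right after `label` (hypothetical helper)."""
    pos = df.index.get_loc(label)  # integer position of a unique label
    return df.index[pos + 1]

print(label_after(df1, 'c'))  # d
print(label_after(df2, 'c'))  # z
```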

In this case, we would have two options:

  • Use separate code for each DataFrame: df1.loc['b':'d'] and df2.loc['b':'z']. This is inconvenient because it means we need to know extra information beyond just the rows that we want.
  • Get the positional index first, add 1, and then use positional slicing: df.iloc[df.index.get_loc('b'):df.index.get_loc('c')+1]. This is just wordy.

But since .loc is end-inclusive, we can just say .loc['b':'c']. Much simpler!
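The sketch below applies one function to both frames and checks that it returns the same rows as the wordier positional workaround. The function names are hypothetical. The positional version assumes 'b' and 'c' are unique labels.

```python
import pandas as pd

df1 = pd.DataFrame(dict(X=range(4)), index=pd.Index(['a', 'b', 'c', 'd'], name='Y'))
df2 = pd.DataFrame(dict(X=range(3)), index=pd.Index(['b', 'c', 'z'], name='Y'))

def rows_b_to_c(df):
    # End-inclusive label slice: identical code works for both frames.
    return df.loc['b':'c']

def rows_b_to_c_positional(df):
    # Positional equivalent: convert labels to positions, then add 1 to the
    # end position because .iloc slices exclude their end.
    return df.iloc[df.index.get_loc('b'):df.index.get_loc('c') + 1]

for df in (df1, df2):
    print(rows_b_to_c(df).index.tolist())  # ['b', 'c'] for both frames
    assert rows_b_to_c(df).equals(rows_b_to_c_positional(df))
```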

Whenever you care about labels instead of positions, and you're trying to write position-independent code, end-exclusive label slicing re-introduces position-dependence in a way that can be inconvenient.

That said, maybe there are use cases where you really do want end-exclusive label-based slicing. If so, you can use @Willz's answer in this question:

df.loc[start:end].iloc[:-1]
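That one-liner assumes `end` is actually a label in the index: .loc includes it, and .iloc[:-1] then removes it. On a sorted index, .loc also accepts an `end` that is not a label and stops at the last label before it. In that case .iloc[:-1] drops a row that belongs in the result. The minimal sketch below compares the one-liner with a `searchsorted`-based variant. It assumes a monotonic increasing index, and `loc_exclusive` and `loc_exclusive_sorted` are hypothetical helper names.

```python
import pandas as pd

df1 = pd.DataFrame(dict(X=range(4)), index=pd.Index(['a', 'b', 'c', 'd'], name='Y'))

def loc_exclusive(df, start, end):
    # The one-liner above: inclusive label slice, then drop the last row.
    return df.loc[start:end].iloc[:-1]

def loc_exclusive_sorted(df, start, end):
    # Assumes a sorted index: positions of the first label >= start and
    # the first label >= end, then an end-exclusive positional slice.
    i = df.index.searchsorted(start, side='left')
    j = df.index.searchsorted(end, side='left')
    return df.iloc[i:j]

print(loc_exclusive(df1, 'b', 'd').index.tolist())         # ['b', 'c']
print(loc_exclusive_sorted(df1, 'b', 'd').index.tolist())  # ['b', 'c']

# 'cc' is not a label, so .loc['b':'cc'] already stops at 'c' and the
# one-liner then drops 'c' as well.
print(loc_exclusive(df1, 'b', 'cc').index.tolist())         # ['b']
print(loc_exclusive_sorted(df1, 'b', 'cc').index.tolist())  # ['b', 'c']
```

The `searchsorted` variant only makes sense on a sorted index. On an unsorted index whose start and end labels both exist, the one-liner is the simpler choice.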
