How are iloc and loc different?

Question

Can someone explain how these two methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the two are different. To me, they seem largely interchangeable, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that these two work?

df.loc[:5]
df.iloc[:5]

Can someone present three cases where the distinction in use is clearer?

Once upon a time, I also wanted to know how these two functions differ from df.ix[:5] but ix has been removed from pandas 1.0, so I don't care anymore!

Recommended answer

Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead; ix was removed entirely in pandas 1.0. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.

First, here's a recap of the three methods:


  • loc gets rows (or columns) with particular labels from the index.
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.
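The difference between the first two rules can be seen with the question's own `df.loc[:5]` and `df.iloc[:5]`. This is a minimal sketch assuming pandas 1.0 or later; the DataFrame `df`, its column `val`, and the Series `s` are made-up illustration data. It shows that a label slice includes its end label while a positional slice excludes its end position, and that on a non-default index the same integer can select different rows.

```python
import pandas as pd

# Hypothetical data: 10 rows with the default RangeIndex (labels 0..9)
df = pd.DataFrame({"val": range(10)})

# loc slices by label and INCLUDES the end label 5: labels 0..5, i.e. 6 rows
by_label = df.loc[:5]

# iloc slices by position and EXCLUDES the end position 5: positions 0..4, i.e. 5 rows
by_position = df.iloc[:5]

print(len(by_label), len(by_position))  # 6 5

# With a non-default index the two indexers can pick different rows
s = pd.Series(["a", "b", "c"], index=[2, 1, 0])
print(s.loc[0])   # c  (the row whose label is 0)
print(s.iloc[0])  # a  (the row at position 0)
```

So on a default index, `df.loc[:5]` actually returns six rows, not five; only `df.iloc[:5]` returns exactly the first five.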

It's important to note some subtleties that can make ix slightly tricky to use:


  • if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

  • if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If, however, ix is given another type (e.g. a string), it can use label-based indexing.

To illustrate the differences between the three methods, consider the following Series:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

We'll look at slicing with the integer value 3.

In this case, s.iloc[:3] returns the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns the first 8 rows (since it treats 3 as a label):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).

What if we try with an integer label that isn't in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, ix doesn't fall back to behaving like iloc.
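A related nuance about `loc`, sketched here assuming pandas 1.0 or later: `s.loc[:6]` raises `KeyError` only because 6 is missing *and* this index is not sorted. When the index is monotonic, pandas places a missing slice bound by binary search instead, so the same slice succeeds. The name `s_sorted` is just for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# Unsorted index: the missing label 6 cannot be located, so loc raises KeyError
try:
    s.loc[:6]
except KeyError as err:
    print("unsorted index:", repr(err))

# Same labels sorted ascending: the missing bound 6 falls between 5 and 45
s_sorted = s.sort_index()
print(s_sorted.loc[:6].index.tolist())  # [1, 2, 3, 4, 5]

# iloc never looks at labels, so both versions give the first 6 rows by position
print(len(s.iloc[:6]), len(s_sorted.iloc[:6]))  # 6 6
```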

If, however, our index were of mixed type, ix given an integer would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN

As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results; try not to use ix.
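In pandas 1.0 and later, where `ix` no longer exists, each `ix` call from the `s` and `s2` examples can be rewritten with whichever indexer states the intent. A minimal sketch; it also shows that `iloc` rejects a label with a `TypeError` rather than silently switching behavior the way `ix` did. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
s2 = pd.Series(np.nan, index=["a", "b", "c", "d", "e", 1, 2, 3, 4, 5])

up_to_label_3 = s.loc[:3]     # replaces s.ix[:3]   (label-based, 8 rows)
first_six = s2.iloc[:6]       # replaces s2.ix[:6]  (position-based, 6 rows)
up_to_label_c = s2.loc[:"c"]  # replaces s2.ix[:'c'] (label-based, 3 rows)

# iloc accepts only integer positions; a label is an error, not a fallback
try:
    s2.iloc[:"c"]
except TypeError as err:
    print("iloc rejected a label:", err)
```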

Sometimes, given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0), ix let you do this quite neatly: we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
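The same slice can also be expressed entirely by label: turn the column positions into labels with `df.columns[:4]` and use `loc` on both axes. The sketch below, assuming pandas 1.0 or later, checks that this matches the `get_loc` + `iloc` version. One caveat: `Index.get_loc` returns a plain integer only for a unique label; with duplicated labels it can return a slice or a boolean mask, so the `+ 1` approach assumes a unique index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=list("abcde"), columns=["x", "y", "z", 8, 9])

# Option 1: everything by label; column positions 0-3 are converted to labels first
via_loc = df.loc[:"c", df.columns[:4]]

# Option 2: everything by position; row label 'c' is converted to a position first
# (+1 because iloc slices exclude their endpoint)
via_iloc = df.iloc[: df.index.get_loc("c") + 1, :4]

print(via_loc.equals(via_iloc))  # True
print(via_loc.columns.tolist())  # ['x', 'y', 'z', 8]
```

Which option reads better is mostly a matter of whether the "fixed" part of your selection is naturally a label or a position.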

There are further examples in pandas' documentation here.
