How are iloc, ix and loc different?

Question

Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

Can someone present three cases where the distinction in use is clearer?

Recommended answer

Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead (ix was removed entirely in pandas 1.0). I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.

First, here's a recap of the three methods:

  • loc gets rows (or columns) with particular labels from the index.
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.
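
As a minimal sketch of the loc/iloc distinction described above (the Series and its labels are made up for illustration, and ix is left out because it no longer exists in current pandas):

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from its positions.
s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

by_label = s.loc[20]     # the row whose label is 20
by_position = s.iloc[0]  # the row at position 0

# loc never falls back to positions: 0 is not a label here.
try:
    s.loc[0]
    loc_found_zero = True
except KeyError:
    loc_found_zero = False

print(by_label)        # b
print(by_position)     # a
print(loc_found_zero)  # False
```

Because the labels (10, 20, 30) and the positions (0, 1, 2) do not overlap, it is easy to see which of the two each accessor is using.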

It's important to note some subtleties that can make ix slightly tricky to use:

  • if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.
  • if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.

To illustrate the differences between the three methods, consider the following Series:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

We'll look at slicing with the integer value 3.

In this case, s.iloc[:3] returns the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns the first 8 rows (since it treats 3 as a label):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).

What if we try with an integer label that isn't in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, ix doesn't fall back to behaving like iloc.
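
The loc and iloc half of this can be checked on current pandas with a rough sketch like the one below. It reuses the Series s from above. The sort_index() step is an addition that is not part of the original answer: it relies on pandas allowing a missing label as a slice bound when the index is monotonic, and shows that the KeyError here depends on the index being unsorted.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# Positional slice: always the first six rows, whatever the labels are.
first_six = s.iloc[:6]

# Label slice: 6 is not a label and the index is unsorted, so loc raises.
try:
    s.loc[:6]
    raised = False
except KeyError:
    raised = True

# Not in the original answer: with a sorted (monotonic) index, a missing
# label is accepted as a slice bound and selects every label <= 6.
up_to_six = s.sort_index().loc[:6]

print(len(first_six))         # 6
print(raised)                 # True
print(list(up_to_six.index))  # [1, 2, 3, 4, 5]
```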

If, however, our index were of mixed type, ix would behave like iloc as soon as it was given an integer, instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN

As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results - try not to use ix.

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly - we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
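
For comparison, here is a sketch of the same selection done two ways on current pandas. The second form, which stays in loc and turns the column positions into labels via df.columns[:4], is an alternative not shown in the original answer. Both assume 'c' appears exactly once in the index, so that get_loc() returns a single integer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan,
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# Positions only: convert the row label 'c' to a position; +1 because
# iloc slices exclude their endpoint.
via_iloc = df.iloc[:df.index.get_loc("c") + 1, :4]

# Labels only: convert the first four column positions to labels;
# loc slices include their endpoint, so no +1 is needed.
via_loc = df.loc[:"c", df.columns[:4]]

print(via_iloc.shape)            # (3, 4)
print(via_iloc.equals(via_loc))  # True
```

Which form reads better is a matter of taste. The point is that each call sticks to one kind of indexing, which is what removes the guesswork ix used to do.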

There are further examples in the pandas documentation.
