pandas iloc vs ix vs loc explanation?


Question

Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

Can someone present three cases where the distinction in use is clearer?

Solution

First, a recap:

  • loc works on labels in the index.
  • iloc works on the positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if the label is not in the index.
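
As a minimal sketch of the first two bullets, the frame below is a hypothetical one with the default integer index (it is not from the original post). ix is left out because it was deprecated in pandas 0.20 and removed in pandas 1.0, so it cannot be run on current versions. The sketch suggests that df.loc[:5] and df.iloc[:5] from the question are not actually equivalent: a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..9 (an assumption for illustration).
df = pd.DataFrame({"value": range(10)})

rows_by_label = df.loc[:5]      # labels 0 through 5, end label included
rows_by_position = df.iloc[:5]  # positions 0 through 4, end position excluded

print(len(rows_by_label), len(rows_by_position))
```

On a default index the label 5 happens to sit at position 5, which is why the two calls look interchangeable at first glance even though one returns six rows and the other five.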

It's important to note some subtleties that can make ix slightly tricky to use:

  • If the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

  • If the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If, however, ix is given another type (e.g. a string), it can use label-based indexing.


To illustrate the differences between the three methods, consider the following Series:

>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

Then s.iloc[:3] returns the first 3 rows (since it looks at the position) and s.loc[:3] returns the first 8 rows (since it looks at the labels):

>>> s.iloc[:3]
49   NaN
48   NaN
47   NaN

>>> s.loc[:3]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than going by position (and the index is of integer type).
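
The loc and iloc results above can be reproduced as a standalone script. This is only a sketch of the comparison; the variable names by_position and by_label are illustrative, and ix is omitted because it no longer exists in pandas 1.0 and later.

```python
import numpy as np
import pandas as pd

# Same Series as in the answer: integer labels that are not in sorted order.
s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = s.iloc[:3]  # first three rows, whatever their labels are
by_label = s.loc[:3]      # every row from the start up to and including label 3

print(list(by_position.index))
print(list(by_label.index))
```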

What if we try with an integer label that isn't in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type it doesn't fall back to behaving like iloc.
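
The missing-label case can be sketched the same way with loc and iloc alone. The sorted copy at the end is an addition for illustration, not part of the original answer: it relies on the fact that loc accepts a slice bound that is absent from the index when the index is sorted, so the KeyError here comes from combining a missing label with an unsorted index.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

first_six = s.iloc[:6]  # positions 0..5, no label lookup involved

# Label 6 is absent and the index is unsorted, so loc cannot place the bound.
try:
    s.loc[:6]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Illustrative extra: on a sorted copy, the same slice is accepted
# and returns every label up to 6.
sorted_slice = s.sort_index().loc[:6]

print(raised_key_error, list(sorted_slice.index))
```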

If, however, our index were of mixed type, then given an integer, ix would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of types
True
>>> s2.ix[:6] # behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN


General advice: if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results.

If however you have a DataFrame and you want to mix label and positional index types, ix lets you do this:

>>> df = pd.DataFrame(np.arange(25).reshape(5,5), 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Using ix, we can slice the rows by label and the columns by position (note that for the columns, ix defaults to position-based slicing since the label 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13
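
Since ix was removed in pandas 1.0, the same mixed selection has to be written with loc or iloc on current versions. The following is one possible sketch, not the only way to do it: it converts the column positions to labels with df.columns[:4] and then lets loc handle both axes. A chained form is shown alongside for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Rows by label (up to and including 'c'), columns by position (first four),
# with the positions translated to labels so that loc can take both.
via_loc = df.loc[:'c', df.columns[:4]]

# Alternative: select rows by label first, then columns by position.
via_chain = df.loc[:'c'].iloc[:, :4]

print(via_loc)
```

The single loc call is usually the safer choice when assigning values, because a chained selection may operate on a copy rather than on the original frame.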
