How are iloc and loc different?


Question

Can someone explain how these two methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to understand how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that these two work?

df.loc[:5]
df.iloc[:5]

Can someone present three cases where the distinction in uses is clearer?


Once upon a time, I also wanted to know how these two functions differ from df.ix[:5] but ix has been removed from pandas 1.0, so I don't care anymore.

Answer

Label vs. Location

The main distinction between the two methods is:

  • loc gets rows (and/or columns) with particular labels.

  • iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index:

>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f

>>> s.loc[0]    # value at index label 0
'd'

>>> s.iloc[0]   # value at index location 0
'a'

>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e

>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a
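Applied to the df.loc[:5] versus df.iloc[:5] example from the question, the same inclusive/exclusive rule explains why the two calls are not interchangeable. The following is a minimal sketch; the frame and its "value" column are made up for illustration and use the default integer index:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex (labels 0..7).
df = pd.DataFrame({"value": range(10, 18)})

by_label = df.loc[:5]      # labels up to AND including 5
by_position = df.iloc[:5]  # positions 0..4, endpoint excluded

print(len(by_label))     # 6
print(len(by_position))  # 5

# With string labels, iloc still counts positions and ignores the labels.
df2 = df.set_index(pd.Index(list("abcdefgh")))
first_five = df2.iloc[:5].index.tolist()
print(first_five)  # ['a', 'b', 'c', 'd', 'e']
```

So on a default index, loc[:5] returns six rows while iloc[:5] returns five.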

Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

| <object> | description | s.loc[<object>] | s.iloc[<object>] |
|---|---|---|---|
| 0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a') |
| 0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0) |
| 1:47 | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards) |
| 1:47:-1 | slice with negative step | Three rows (labels 1 back to 47) | Zero rows (empty Series) |
| [2, 0] | integer list | Two rows with given labels | Two rows with given locations |
| s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError |
| (s>'e').values | Bool array | One row (containing 'f') | Same as loc |
| 999 | int object not in index | KeyError | IndexError (out of bounds) |
| -1 | int object not in index | KeyError | Returns last value in s |
| lambda x: x.index[3] | callable applied to series (here returning the item at position 3 of the index, which is the label 0) | s.loc[s.index[3]] | s.iloc[s.index[3]] |
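Several of these rows can be checked directly. The sketch below rebuilds s and exercises the slice, boolean-array, missing-key, and callable cases; the helper error_name is a hypothetical convenience for this example, not part of pandas:

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

# Slice 1:47 means "label 1 to label 47" for loc, "position 1 to 47" for iloc.
loc_len = len(s.loc[1:47])      # 0: label 1 comes after label 47 in the index
iloc_len = len(s.iloc[1:47])    # 5: positions 1 onwards, end is clipped
print(loc_len, iloc_len)

# A negative step walks backwards from label 1 to label 47.
reversed_labels = s.loc[1:47:-1].tolist()
print(reversed_labels)          # ['e', 'd', 'c']
print(len(s.iloc[1:47:-1]))     # 0

# A boolean NumPy array is accepted by both indexers.
mask = (s > "e").values
print(s.loc[mask].tolist(), s.iloc[mask].tolist())  # ['f'] ['f']

def error_name(lookup):
    """Return the name of the KeyError/IndexError raised by lookup(), if any."""
    try:
        lookup()
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None

print(error_name(lambda: s.loc[999]))   # KeyError
print(error_name(lambda: s.iloc[999]))  # IndexError
print(error_name(lambda: s.loc[-1]))    # KeyError
print(s.iloc[-1])                       # f

# The callable returns s.index[3], which is 0: a label for loc, a position for iloc.
print(s.loc[lambda x: x.index[3]])   # d
print(s.iloc[lambda x: x.index[3]])  # a
```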

loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.

Here's a Series where the index contains string objects:

>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:

>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1

For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:

>>> # 'M' is the month-end alias; pandas 2.2+ deprecates it in favor of 'ME'
>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
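Because 'now' changes on every run, the output above cannot be reproduced exactly. Here is a reproducible sketch of the same partial-string lookup, assuming fixed month-end timestamps chosen purely for illustration:

```python
import pandas as pd

# Hypothetical fixed timestamps so the result does not depend on the current time.
idx = pd.to_datetime([
    "2021-01-31 16:41",
    "2021-02-28 16:41",
    "2021-03-31 16:41",
    "2021-04-30 16:41",
    "2021-05-31 16:41",
])
s3 = pd.Series(list("abcde"), index=idx)

one_month = s3.loc["2021-03"].tolist()             # every row in March 2021
two_months = s3.loc["2021-03":"2021-04"].tolist()  # March through April, inclusive
whole_year = s3.loc["2021"].tolist()               # every row in 2021
by_position = s3.iloc[2:4].tolist()                # same rows as two_months, by position

print(one_month)    # ['c']
print(two_months)   # ['c', 'd']
print(whole_year)   # ['a', 'b', 'c', 'd', 'e']
print(by_position)  # ['c', 'd']
```

iloc has no equivalent of the date-string lookup; it can only reach the same rows if their positions are already known.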

Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below:

>>> import numpy as np 
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),  
                      index=list('abcde'), 
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Then for example:

>>> df.loc['c': , :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22

>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23
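The columns of df deliberately include the integers 8 and 9, which makes the label/position distinction visible on the column axis as well. A short sketch using the same df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# loc treats 8 as a column label; iloc reaches the same column at position 3.
by_label = df.loc["a", 8]
by_position = df.iloc[0, 3]
print(by_label, by_position)  # 3 3

# Lists work on both axes: labels for loc, positions for iloc.
corners_loc = df.loc[["a", "e"], ["x", 9]].values.tolist()
corners_iloc = df.iloc[[0, 4], [0, 4]].values.tolist()
print(corners_loc)   # [[0, 4], [20, 24]]
print(corners_iloc)  # [[0, 4], [20, 24]]

# There is no column at position 8, so iloc raises IndexError.
try:
    df.iloc[:, 8]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```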

Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.

For example, using the DataFrame df defined above, how best to slice the rows up to and including 'c' and take the first four columns?

We can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
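The same selection can also be written from the loc side, or by chaining the two indexers. These are alternative sketches, not part of the approach above, and they assume the row labels are unique (get_loc returns a single integer only in that case):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# Position-based: translate the label 'c' into a position, then add 1.
via_iloc = df.iloc[:df.index.get_loc("c") + 1, :4]

# Label-based: translate the first four positions into column labels.
via_loc = df.loc[:"c", df.columns[:4]]

# Chained: rows by label first, then columns by position (fine for reading data).
chained = df.loc[:"c"].iloc[:, :4]

print(via_iloc.equals(via_loc))  # True
print(via_iloc.equals(chained))  # True
```

The chained form is convenient for reading but is best avoided for assignment, where a single loc or iloc call is the safer choice.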
