How are iloc and loc different?
Can someone explain how these two methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to understand how they are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.
For example, say we want to get the first five rows of a DataFrame. How is it that these two work?
df.loc[:5]
df.iloc[:5]
Can someone present three cases where the distinction in use is clearer?
Once upon a time, I also wanted to know how these two functions differ from df.ix[:5], but ix has been removed from pandas 1.0, so I don't care anymore.
Label vs. Location
The main distinction between the two methods is:
loc gets rows (and/or columns) with particular labels.
iloc gets rows (and/or columns) at integer locations.
To demonstrate, consider a Series s of characters with a non-monotonic integer index:
>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49 a
48 b
47 c
0 d
1 e
2 f
>>> s.loc[0] # value at index label 0
'd'
>>> s.iloc[0] # value at index location 0
'a'
>>> s.loc[0:1] # rows at index labels between 0 and 1 (inclusive)
0 d
1 e
>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49 a
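Applied to the example in the question, the same rule explains why df.loc[:5] and df.iloc[:5] only look interchangeable. The sketch below uses two made-up DataFrames (the column name "value" and both indexes are assumptions chosen for illustration): one with the default index 0..9 and one whose labels are rotated.

```python
import pandas as pd

# Hypothetical data: default RangeIndex with labels 0..9.
df = pd.DataFrame({"value": range(10)})

by_label = df.loc[:5]      # labels up to AND including 5 -> six rows
by_position = df.iloc[:5]  # positions 0..4 (end excluded)  -> five rows
print(len(by_label), len(by_position))

# Same values, but the labels no longer match the positions.
rotated = pd.DataFrame({"value": range(10)},
                       index=[5, 6, 7, 8, 9, 0, 1, 2, 3, 4])

rotated_by_label = rotated.loc[:5]      # stops at label 5, which is the first row
rotated_by_position = rotated.iloc[:5]  # still the first five rows
print(list(rotated_by_label.index), list(rotated_by_position.index))
```

With the default index the two calls differ by one row (loc includes its end label); once labels and positions diverge, they select entirely different slices.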
Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:
<object> | description | s.loc[<object>] | s.iloc[<object>] |
---|---|---|---|
0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a') |
0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0) |
1:47 | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards) |
1:47:-1 | slice with negative step | Three rows (labels 1 back to 47) | Zero rows (empty Series) |
[2, 0] | integer list | Two rows with given labels | Two rows with given locations |
s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError |
(s>'e').values | Bool array | One row (containing 'f') | Same as loc |
999 | int object not in index | KeyError | IndexError (out of bounds) |
-1 | int object not in index | KeyError | Returns last value in s |
lambda x: x.index[3] | callable applied to series (here returning the index item at position 3, i.e. the label 0) | s.loc[s.index[3]] | s.iloc[s.index[3]] |
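Several rows of the table can be checked directly. The following is a minimal sketch using the same s as above; the helper error_name is a hypothetical name introduced here only to report which exception a lookup raises.

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

# Integer lists: labels for loc, positions for iloc.
loc_list = s.loc[[2, 0]].tolist()    # rows labelled 2 and 0
iloc_list = s.iloc[[2, 0]].tolist()  # rows at positions 2 and 0

# Boolean selection: loc accepts the boolean Series, iloc gets a plain array.
mask = s > "e"
loc_bool = s.loc[mask].tolist()
iloc_bool = s.iloc[mask.values].tolist()

# Negative-step label slice: walks from label 1 back to label 47.
loc_reverse = s.loc[1:47:-1].tolist()

# Callable: receives the Series and returns the key to look up.
loc_callable = s.loc[lambda x: x.index[3]]    # same as s.loc[0]
iloc_callable = s.iloc[lambda x: x.index[3]]  # same as s.iloc[0]


def error_name(lookup):
    """Return the name of the exception raised by lookup(), or None."""
    try:
        lookup()
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None


missing = {
    "loc[999]": error_name(lambda: s.loc[999]),
    "iloc[999]": error_name(lambda: s.iloc[999]),
    "loc[-1]": error_name(lambda: s.loc[-1]),
    "iloc[-1]": error_name(lambda: s.iloc[-1]),
}

print(loc_list, iloc_list, loc_bool, iloc_bool, loc_reverse)
print(loc_callable, iloc_callable, missing)
```

The NotImplementedError row is left out on purpose, since the exact exception raised for a boolean Series passed to iloc may depend on the pandas version and on the index type.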
loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.
Here's a Series where the index contains string objects:
>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a 49
b 48
c 47
d 0
e 1
f 2
Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:
>>> s2.loc['c':'e'] # all rows lying between 'c' and 'e' (inclusive)
c 47
d 0
e 1
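For comparison, here is a small sketch of the positional equivalents of those label lookups. It rebuilds s2 from literal values so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

# Same contents as s2 above, written out literally.
s2 = pd.Series([49, 48, 47, 0, 1, 2], index=list("abcdef"))

by_label = s2.loc["c":"e"]   # labels 'c' through 'e', end label included
by_position = s2.iloc[2:5]   # positions 2, 3, 4; end position excluded

picked_by_label = s2.loc[["f", "a"]]   # explicit labels, in the order given
picked_by_position = s2.iloc[[5, 0]]   # the same rows, addressed by position

print(by_label.equals(by_position))
print(picked_by_label.equals(picked_by_position))
```

Note the off-by-one between the two slices: the label slice ends at 'e' itself, while the positional slice needs an end of 5 to include position 4.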
For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:
>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))
>>> s3
2021-01-31 16:41:31.879768 a
2021-02-28 16:41:31.879768 b
2021-03-31 16:41:31.879768 c
2021-04-30 16:41:31.879768 d
2021-05-31 16:41:31.879768 e
Then to fetch the row(s) for March/April 2021 we only need:
>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768 c
2021-04-30 16:41:31.879768 d
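Because 'now' makes the output depend on when the code is run, a reproducible variant may be easier to experiment with. The sketch below assumes a hand-written month-end index (the dates are chosen for illustration) and shows the same partial-string lookups.

```python
import pandas as pd

# Month-end dates written out explicitly, so nothing depends on the
# current time or on a frequency alias.
idx = pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31",
                      "2021-04-30", "2021-05-31"])
s3 = pd.Series(list("abcde"), index=idx)

march_april = s3.loc["2021-03":"2021-04"]  # every row falling in March or April
february = s3.loc["2021-02"]               # a partial date selects the whole month
same_rows = s3.iloc[2:4]                   # positional equivalent of march_april

print(march_april.tolist(), february.tolist())
```

A partial date string such as '2021-02' returns a Series here (all rows in that month), not a single scalar.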
Rows and Columns
loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.
When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.
Consider the DataFrame defined below:
>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
x y z 8 9
a 0 1 2 3 4
b 5 6 7 8 9
c 10 11 12 13 14
d 15 16 17 18 19
e 20 21 22 23 24
Then for example:
>>> df.loc['c': , :'z'] # rows 'c' and onwards AND columns up to 'z'
x y z
c 10 11 12
d 15 16 17
e 20 21 22
>>> df.iloc[:, 3] # all rows, but only the column at index location 3
a 3
b 8
c 13
d 18
e 23
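A few more row/column combinations on the same DataFrame, as a rough sketch (the variable names are illustrative). The integer column labels 8 and 9 make the label/position distinction visible on the column axis too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

cell_by_label = df.loc["b", 8]     # row 'b', column LABELLED 8
cell_by_position = df.iloc[1, 3]   # second row, fourth column: the same cell

block = df.loc[["a", "e"], ["x", 9]]  # lists of row and column labels
same_block = df.iloc[[0, 4], [0, 4]]  # lists of row and column positions

# A boolean row selector combined with a column label.
y_where_x_gt_5 = df.loc[df["x"] > 5, "y"]

# There is no column at POSITION 8, so iloc refuses the same integer.
try:
    df.iloc[:, 8]
    position_8_error = None
except IndexError as exc:
    position_8_error = type(exc).__name__

print(cell_by_label, cell_by_position, position_8_error)
```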
Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.
For example, consider the DataFrame df defined above. How best to slice the rows up to and including 'c' and take the first four columns?
We can achieve this result using iloc and the help of another method:
>>> df.iloc[:df.index.get_loc('c') + 1, :4]
x y z 8
a 0 1 2 3
b 5 6 7 8
c 10 11 12 13
get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
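Two other spellings give the same result and may read more naturally in some code. This is a sketch of alternatives, not a recommendation of one over the other; it assumes the index labels are unique, since get_loc may return a slice or a boolean mask instead of an integer when a label is repeated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list("abcde"),
                  columns=["x", "y", "z", 8, 9])

# Positions everywhere: translate the row label into a position first.
via_get_loc = df.iloc[:df.index.get_loc("c") + 1, :4]

# Chaining: slice rows by label, then columns by position.
via_chain = df.loc[:"c"].iloc[:, :4]

# Labels everywhere: translate the column positions into labels first.
via_labels = df.loc[:"c", df.columns[:4]]

print(via_get_loc.equals(via_chain), via_get_loc.equals(via_labels))
```

The chained form is fine for reading data; for assignment a single loc or iloc call is the safer choice, because the chained version assigns into an intermediate object.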