How are iloc and loc different?
Question
Can someone explain how these two methods of slicing are different? I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem largely interchangeable, because they are at the lower levels of slicing.
For example, say we want to get the first five rows of a DataFrame. How is it that these two work?
df.loc[:5]
df.iloc[:5]
Can someone present three cases where the distinction in use is clearer?
Once upon a time, I also wanted to know how these two functions differ from df.ix[:5], but ix has been removed from pandas 1.0, so I don't care anymore!
Recommended answer
Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.
First, here's a recap of the three methods:
- loc gets rows (or columns) with particular labels from the index.
- iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
- ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.
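As a minimal sketch of the label-versus-position distinction in the recap, the snippet below uses a small Series with string labels. The name `prices` and its values are made up for illustration. It also shows that a label slice with loc includes its endpoint, while a position slice with iloc does not.

```python
import pandas as pd

# Hypothetical data: a Series whose index holds string labels.
prices = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# loc works on labels; a label slice includes its endpoint "c".
by_label = prices.loc["a":"c"]

# iloc works on integer positions; position 2 is excluded, as in a Python list slice.
by_position = prices.iloc[0:2]

print(by_label.tolist())     # [10, 20, 30]
print(by_position.tolist())  # [10, 20]
```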
It's important to note some subtleties that can make ix slightly tricky to use:
- If the index is of integer type, ix will only use label-based indexing and will not fall back to position-based indexing. If the label is not in the index, an error is raised.
- If the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If, however, ix is given another type (e.g. a string), it can use label-based indexing.
To illustrate the differences between the three methods, consider the following Series:
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
We'll look at slicing with the integer value 3.
In this case, s.iloc[:3] returns the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns the first 8 rows (since it treats 3 as a label):
>>> s.iloc[:3] # slice the first three rows
49 NaN
48 NaN
47 NaN
>>> s.loc[:3] # slice up to and including label 3
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
Notice that s.ix[:3] returns the same Series as s.loc[:3], since it looks for the label first rather than working on the position (and the index for s is of integer type).
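For readers on a pandas version without ix, the loc and iloc part of this comparison can be reproduced as below. This is only a sketch: the second half uses a hypothetical DataFrame with the default 0..9 index to show why df.loc[:5] and df.iloc[:5] from the question do not return the same number of rows.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

first_three = s.iloc[:3]    # positions 0, 1 and 2
up_to_label_3 = s.loc[:3]   # everything up to and including the label 3

print(len(first_three))     # 3
print(len(up_to_label_3))   # 8

# Hypothetical DataFrame with the default integer index 0..9.
df = pd.DataFrame({"value": range(10)})

print(len(df.loc[:5]))      # 6, because labels 0 through 5 are all included
print(len(df.iloc[:5]))     # 5, because position 5 is excluded
```

With a default index the labels happen to equal the positions, which is why the two calls look interchangeable; the only visible difference is the extra row from the inclusive label endpoint.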
What if we try with an integer label that isn't in the index (say 6)?
Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index (and the index is not sorted, so pandas cannot work out where 6 would fall).
>>> s.iloc[:6]
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
>>> s.loc[:6]
KeyError: 6
>>> s.ix[:6]
KeyError: 6
As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, ix doesn't fall back to behaving like iloc.
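The loc and iloc behaviour with the missing label 6 can be checked without ix, as sketched below. The last part is an addition not covered in the answer: pandas documents that a label slice with an absent bound still works when the index is sorted, so the KeyError here depends on the index being unsorted.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# iloc only cares about position, so 6 is simply "the first six rows".
print(len(s.iloc[:6]))  # 6

# loc looks for the label 6, which is absent from this unsorted index.
try:
    s.loc[:6]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# After sorting the index, the same slice selects every label <= 6.
sorted_s = s.sort_index()
print(list(sorted_s.loc[:6].index))  # [1, 2, 3, 4, 5]
```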
If, however, our index was of mixed type, then given an integer, ix would behave like iloc immediately instead of raising a KeyError:
>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a NaN
b NaN
c NaN
d NaN
e NaN
1 NaN
Keep in mind that ix can still accept non-integers and behave like loc:
>>> s2.ix[:'c'] # behaves like loc given non-integer
a NaN
b NaN
c NaN
As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results, and try not to use ix.
Sometimes, given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.
For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?
>>> df = pd.DataFrame(np.nan,
index=list('abcde'),
columns=['x','y','z', 8, 9])
>>> df
x y z 8 9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN
In earlier versions of pandas (before 0.20.0), ix lets you do this quite neatly: we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):
>>> df.ix[:'c', :4]
x y z 8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
In later versions of pandas, we can achieve this result using iloc and the help of another method:
>>> df.iloc[:df.index.get_loc('c') + 1, :4]
x y z 8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
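The get_loc() approach can be run end to end as in the sketch below. The second variant, which goes the other way and turns column positions into labels so that loc can be used, is not from the answer above; it is shown only as one possible alternative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=list("abcde"), columns=["x", "y", "z", 8, 9])

# Convert the row label 'c' to a position, adding 1 so that row 'c' is included.
stop = df.index.get_loc("c") + 1
via_iloc = df.iloc[:stop, :4]

# Alternative: convert the first four column positions to labels and use loc.
via_loc = df.loc[:"c", df.columns[:4]]

print(via_iloc.shape)            # (3, 4)
print(list(via_iloc.columns))    # ['x', 'y', 'z', 8]
print(via_iloc.equals(via_loc))
```

Both variants stay entirely label-based or entirely position-based within a single indexer call, which avoids the guessing that made ix unpredictable.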
There are further examples in the pandas documentation.