pandas iloc vs ix vs loc explanation?
Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.
For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?
df.loc[:5]
df.ix[:5]
df.iloc[:5]
Can someone present three cases where the distinction in use is clearer?
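First, a recap (note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the ix examples in this answer only run on older versions):
- loc works on labels in the index.
- iloc works on the positions in the index (so it only takes integers).
- ix usually tries to behave like loc but falls back to behaving like iloc if the label is not in the index.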
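As a quick illustration of the first two points, here is a minimal sketch on a small made-up DataFrame (the frame and its column name are assumptions for illustration only). It also shows a difference that matters in the examples further down: a label slice with loc includes its end label, while a position slice with iloc excludes its stop position.

```python
import pandas as pd

# Illustrative data only; the labels 'a', 'b', 'c' form the row index.
df = pd.DataFrame({'value': [10, 20, 30]}, index=['a', 'b', 'c'])

row_by_label = df.loc['b']      # selects the row whose label is 'b'
row_by_position = df.iloc[1]    # selects the second row, whatever its label

label_slice = df.loc[:'b']      # rows 'a' and 'b' (end label included)
position_slice = df.iloc[:1]    # row at position 0 only (stop excluded)

print(len(label_slice), len(position_slice))
```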
It's important to note some subtleties that can make ix slightly tricky to use:
- if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.
- if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If, however, ix is given another type (e.g. a string), it can use label-based indexing.
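Since ix is not available in current pandas, the two rules above can be paraphrased as a small dispatch function. This is only a sketch: ix_like is a hypothetical helper written for illustration, not part of pandas. It handles only scalar and slice keys on a Series, and it follows the rules as stated in this answer rather than the original ix implementation.

```python
import numpy as np
import pandas as pd


def ix_like(series, key):
    """Hypothetical stand-in for the removed .ix, following the rules above."""

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, np.integer)) and not isinstance(value, bool)

    # Collect the parts of the key that decide between label and position.
    if isinstance(key, slice):
        bounds = [b for b in (key.start, key.stop) if b is not None]
    else:
        bounds = [key]

    integer_index = pd.api.types.is_integer_dtype(series.index)

    # Rule 2: non-integer index + integer key -> position-based lookup.
    if not integer_index and bounds and all(is_int(b) for b in bounds):
        return series.iloc[key]

    # Rule 1 (integer index) and every other case -> label-based lookup.
    return series.loc[key]


int_indexed = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
mixed_indexed = pd.Series(np.nan, index=['a', 'b', 'c', 'd', 'e', 1, 2, 3, 4, 5])

print(len(ix_like(int_indexed, slice(None, 3))))      # label-based
print(len(ix_like(mixed_indexed, slice(None, 6))))    # position-based
print(len(ix_like(mixed_indexed, slice(None, 'c'))))  # label-based
```

The helper deliberately decides from the index dtype and the key type alone, which is what made ix hard to reason about: the same expression changes meaning when the index changes.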
To illustrate the differences between the three methods, consider the following Series:
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
>>> s
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
Then s.iloc[:3] returns the first 3 rows (since it looks at the position) and s.loc[:3] returns the first 8 rows (since it looks at the labels):
>>> s.iloc[:3]
49 NaN
48 NaN
47 NaN
>>> s.loc[:3]
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than going by position (and the index is of integer type).
What if we try with an integer label that isn't in the index (say 6)?
Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.
>>> s.iloc[:6]
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
>>> s.loc[:6]
KeyError: 6
>>> s.ix[:6]
KeyError: 6
As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, it doesn't fall back to behaving like iloc.
If, however, our index were of mixed type, then given an integer, ix would behave like iloc immediately instead of raising a KeyError:
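The loc and iloc half of this comparison can still be reproduced on current pandas. A minimal sketch, reusing the same Series as above; the membership check at the end is just one illustrative way to guard a label slice, not a recommendation from the original answer:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# Position-based slicing never consults the labels, so it cannot fail here.
first_six = s.iloc[:6]

# Label-based slicing on this unsorted index needs the bound to exist as a label.
try:
    s.loc[:6]
    loc_failed = False
except KeyError:
    loc_failed = True

# 3 is a label in the index, so the slice runs up to and including it.
up_to_three = s.loc[:3] if 3 in s.index else s.iloc[:0]

print(len(first_six), loc_failed, len(up_to_three))
```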
>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of types
True
>>> s2.ix[:6] # behaves like iloc given integer
a NaN
b NaN
c NaN
d NaN
e NaN
1 NaN
Keep in mind that ix can still accept non-integers and behave like loc:
>>> s2.ix[:'c'] # behaves like loc given non-integer
a NaN
b NaN
c NaN
General advice: if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results.
If, however, you have a DataFrame and you want to mix label and positional index types, ix lets you do this:
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x', 'y', 'z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24
Using ix, we can slice the rows by label and the columns by position (note that for the columns, ix defaults to position-based slicing since the label 4 is not a column name):
>>> df.ix[:'c', :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13
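On pandas versions where ix is no longer available, the same selection can be written with loc or iloc alone by converting one side of the mix. The sketch below shows two ways to do that on the same DataFrame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Option 1: stay label-based by turning the column positions into labels.
by_label = df.loc[:'c', df.columns[:4]]

# Option 2: stay position-based by turning the row label into a position.
stop = df.index.get_loc('c') + 1  # +1 because iloc excludes the stop position
by_position = df.iloc[:stop, :4]

print(by_label.equals(by_position))
```

Either form makes it explicit which axis is addressed by label and which by position, so the result no longer depends on the dtype of the index.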