"Too many indexers" with DataFrame.loc
Question
I've read the docs about slicers a million times, but have never got my head round them, so I'm still trying to figure out how to use loc to slice a DataFrame with a MultiIndex.
I'll start with the DataFrame from this SO answer:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
             C2    D0          6
                   D1          7
      B1     C1    D0         10
                   D1         11
             C2    D0         14
                   D1         15
A1    B0     C1    D0         18
                   D1         19
             C2    D0         22
                   D1         23
      B1     C1    D0         26
                   D1         27
             C2    D0         30
                   D1         31
A2    B0     C1    D0         34
                   D1         35
             C2    D0         38
                   D1         39
      B1     C1    D0         42
                   D1         43
             C2    D0         46
                   D1         47
A3    B0     C1    D0         50
                   D1         51
             C2    D0         54
                   D1         55
      B1     C1    D0         58
                   D1         59
             C2    D0         62
                   D1         63
To select just the A0 and C1 values, I can do:
In [26]: df.loc['A0', :, 'C1', :]
Out[26]:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11
This also works when selecting from three levels, and even with tuples:
In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1']
Out[28]:
                           value
first second third fourth
A0    B0     C1    D1          3
             C2    D1          5
      B1     C1    D1         11
             C2    D1         13
So far, intuitive and brilliant.
So why can't I select all values from the first index level?
In [30]: df.loc[:, :, 'C1', :]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-30-57b56108d941> in <module>()
----> 1 df.loc[:, :, 'C1', :]

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1176     def __getitem__(self, key):
   1177         if type(key) is tuple:
-> 1178             return self._getitem_tuple(key)
   1179         else:
   1180             return self._getitem_axis(key, axis=0)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    694
    695         # no multi-index, so validate all of the indexers
--> 696         self._has_valid_tuple(tup)
    697
    698         # ugly hack for GH #836

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key)
    125         for i, k in enumerate(key):
    126             if i >= self.obj.ndim:
--> 127                 raise IndexingError('Too many indexers')
    128             if not self._has_valid_type(k, i):
    129                 raise ValueError("Location based indexing can only have [%s] "

IndexingError: Too many indexers
Surely this is not intended behaviour?
Note: I know this is possible with df.xs('C1', level='third'), but the current .loc behaviour seems inconsistent.
Recommended answer
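This doesn't work because .loc needs to know which axis the keys apply to (see http://pandas.pydata.org/pandas-docs/stable/advanced.html). With four keys and no axis given, pandas can read the tuple as one indexer per axis rather than as a key into the row MultiIndex, and a DataFrame has only two axes, hence "Too many indexers". A simple fix is to state the axis explicitly: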
df.loc(axis=0)[:, :, 'C1', :]
Pandas can get confused when the row and column indexes are similar or contain similar values. If you had a column named 'C1', for example, you would also need this style of slicing/selecting.
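For anyone who wants to try this, here is a minimal sketch of the fix next to a few equivalent spellings. The way the frame is built is an assumption: the question only links to another answer for it, so the index levels and values below are reconstructed from the printed output. The variable names (by_axis, by_tuple and so on) are purely illustrative.

```python
import pandas as pd

# Assumed reconstruction of the frame printed in the question:
# four named index levels, values chosen to match the output shown.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
values = [v for v in range(64) if v % 4 >= 2]  # 2, 3, 6, 7, 10, 11, ...
df = pd.DataFrame({"value": values}, index=index)

# 1. Tell .loc explicitly that every key applies to the row axis.
by_axis = df.loc(axis=0)[:, :, "C1", :]

# 2. Pass the row key as a single tuple and add an explicit column indexer.
by_tuple = df.loc[(slice(None), slice(None), "C1", slice(None)), :]

# 3. Same idea with pd.IndexSlice, which allows ':' inside the row key.
idx = pd.IndexSlice
by_index_slice = df.loc[idx[:, :, "C1", :], :]

# 4. xs, as mentioned in the question; drop_level=False keeps the 'third' level.
by_xs = df.xs("C1", level="third", drop_level=False)

print(by_axis.head())
```

All four selections should return the same 16 rows. Options 2 and 3 resolve the ambiguity by always supplying a column indexer, so the first element can only be a key into the row MultiIndex.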