"Too many indexers" with DataFrame.loc

Problem description

I've read the docs about slicers a million times, but have never got my head round it, so I'm still trying to figure out how to use loc to slice a DataFrame with a MultiIndex.

I'll start with the DataFrame from this SO answer:

                           value
first second third fourth       
A0    B0     C1    D0          2
                   D1          3
             C2    D0          6
                   D1          7
      B1     C1    D0         10
                   D1         11
             C2    D0         14
                   D1         15
A1    B0     C1    D0         18
                   D1         19
             C2    D0         22
                   D1         23
      B1     C1    D0         26
                   D1         27
             C2    D0         30
                   D1         31
A2    B0     C1    D0         34
                   D1         35
             C2    D0         38
                   D1         39
      B1     C1    D0         42
                   D1         43
             C2    D0         46
                   D1         47
A3    B0     C1    D0         50
                   D1         51
             C2    D0         54
                   D1         55
      B1     C1    D0         58
                   D1         59
             C2    D0         62
                   D1         63
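To make the selections below reproducible, here is a sketch that rebuilds the frame printed above. The construction is our own, not the code from the SO answer: MultiIndex.from_product supplies the four named levels, and a reshaped np.arange reproduces the printed values, which follow the pattern 8k + 2, 8k + 3, 8k + 6, 8k + 7 within each (first, second) block k = 0..7.

```python
import numpy as np
import pandas as pd

# Four named index levels, in the same order as the printed table.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
# arange(64).reshape(8, 4, 2)[k, c, d] == 8k + 2c + d; keeping c in (1, 3)
# yields 8k+2, 8k+3, 8k+6, 8k+7 per block, i.e. the printed "value" column.
values = np.arange(64).reshape(8, 4, 2)[:, [1, 3], :].ravel()
df = pd.DataFrame({"value": values}, index=index)
print(df)
```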

To select just A0 and C1 values, I can do:

In [26]: df.loc['A0', :, 'C1', :]
Out[26]: 
                           value
first second third fourth       
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11
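For comparison, a sketch of the same A0/C1 selection spelled with exactly one indexer per axis: pd.IndexSlice builds the row key and the trailing : selects all columns, which is the form the pandas "Using slicers" documentation recommends. The frame is rebuilt from the printed table (our own construction); variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Frame from the question, rebuilt from the printed table.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(8, 4, 2)[:, [1, 3], :].ravel()}, index=index
)

idx = pd.IndexSlice
# Row indexer: first == "A0", any second, third == "C1", any fourth.
# Column indexer: ":" (all columns). Two indexers for a 2-D frame.
a0_c1 = df.loc[idx["A0", :, "C1", :], :]
print(a0_c1)  # the same four rows as Out[26]
```

All four index levels are kept in the result, just as in Out[26].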

This also works when selecting from three levels, and even with tuples:

In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1']
Out[28]: 
                           value
first second third fourth       
A0    B0     C1    D1          3
             C2    D1          5
      B1     C1    D1         11
             C2    D1         13
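A sketch of the three-level selection with one indexer per axis. It uses a list for the several third-level labels: the pandas advanced-indexing docs note that tuples and lists are not treated identically (a tuple is read as one multi-level key, a list as several keys), so the list is the less ambiguous spelling. The frame is rebuilt from the printed table (our own construction).

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(8, 4, 2)[:, [1, 3], :].ravel()}, index=index
)

idx = pd.IndexSlice
# first == "A0", any second, third in {"C1", "C2"}, fourth == "D1"; all columns.
sel = df.loc[idx["A0", :, ["C1", "C2"], "D1"], :]
print(sel)
```

Against the table as printed, the C2 rows come out as 7 and 15, not the 5 and 13 shown in Out[28].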

So far, intuitive and brilliant.

So why can't I select all values from the first index level?

In [30]: df.loc[:, :, 'C1', :]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-30-57b56108d941> in <module>()
----> 1 df.loc[:, :, 'C1', :]

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1176     def __getitem__(self, key):
   1177         if type(key) is tuple:
-> 1178             return self._getitem_tuple(key)
   1179         else:
   1180             return self._getitem_axis(key, axis=0)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    694 
    695         # no multi-index, so validate all of the indexers
--> 696         self._has_valid_tuple(tup)
    697 
    698         # ugly hack for GH #836

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key)
    125         for i, k in enumerate(key):
    126             if i >= self.obj.ndim:
--> 127                 raise IndexingError('Too many indexers')
    128             if not self._has_valid_type(k, i):
    129                 raise ValueError("Location based indexing can only have [%s] "

IndexingError: Too many indexers

Surely this is not intended behaviour?
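The traceback shows the rule being enforced: _has_valid_tuple raises as soon as the key has more elements than self.obj.ndim, and a DataFrame has ndim == 2. Here is a sketch of the same selection written as exactly two indexers, a tuple of four level selectors for the rows and : for the columns, which is how the pandas docs on slicers recommend writing it. We have not run it against the Python 2.7-era pandas in the traceback; the frame is rebuilt from the printed table.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(8, 4, 2)[:, [1, 3], :].ravel()}, index=index
)

# Outer key has two elements, matching df.ndim == 2:
#   rows    -> (any first, any second, third == "C1", any fourth)
#   columns -> ":" (all columns)
all_c1 = df.loc[(slice(None), slice(None), "C1", slice(None)), :]
print(all_c1)
```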

Note: I know this is possible with df.xs('C1', level='third') but the current .loc behaviour seems inconsistent.
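A sketch comparing the xs call with the slicer version, on the frame rebuilt from the printed table. One visible difference is that xs drops the selected level by default; passing drop_level=False keeps it, and the rows then match what .loc returns.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(8, 4, 2)[:, [1, 3], :].ravel()}, index=index
)

# xs picks one label on a named level; by default that level is dropped.
via_xs = df.xs("C1", level="third")
# drop_level=False keeps the "third" level in the result.
via_xs_kept = df.xs("C1", level="third", drop_level=False)
# The slicer equivalent, with one indexer per axis.
via_loc = df.loc[pd.IndexSlice[:, :, "C1", :], :]

print(via_xs.index.names)  # "third" is no longer among the names
```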

Recommended answer

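The reason this doesn't work is tied to the need to specify the axis of indexing (mentioned in http://pandas.pydata.org/pandas-docs/stable/advanced.html). On a DataFrame, .loc accepts at most one indexer per axis, rows first and then columns; that is the check in _has_valid_tuple that raises 'Too many indexers' in your traceback. A longer key only works when pandas manages to reinterpret the whole tuple as a key for the row MultiIndex, as it did for df.loc['A0', :, 'C1', :], but not here. An alternative solution to your problem is to tell .loc explicitly which axis the key applies to: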

df.loc(axis=0)[:, :, 'C1', :]
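A runnable sketch of this fix against the frame rebuilt from the question's printed table: .loc(axis=0) tells pandas that the whole four-element key targets the row index, and the result matches the two-indexer pd.IndexSlice spelling.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(8, 4, 2)[:, [1, 3], :].ravel()}, index=index
)

# Pin the indexer to axis 0: all four selectors apply to the row MultiIndex.
via_axis = df.loc(axis=0)[:, :, "C1", :]
# Equivalent: one row indexer built with IndexSlice, plus ":" for the columns.
via_slice = df.loc[pd.IndexSlice[:, :, "C1", :], :]
print(via_axis)
```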

pandas sometimes gets confused when indexes are similar or contain similar values. If, for example, you also had a column named 'C1', you would need this style of slicing/selecting as well.
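A sketch of the situation described above, assuming a hypothetical extra column named C1 that is not in the question's data. Pinning the axis with .loc(axis=0), or giving both a row and a column indexer, leaves no doubt about whether "C1" means the third index level or the column.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(8, 4, 2)[:, [1, 3], :].ravel()}, index=index
)
# Hypothetical column whose name collides with a label on the "third" level.
df["C1"] = df["value"] * 10

rows_only = df.loc(axis=0)[:, :, "C1", :]  # "C1" as a row-level label, all columns
column_only = df.loc[:, "C1"]  # "C1" as a column label, all rows
# Rows where third == "C1", restricted to the column named "C1".
both = df.loc[pd.IndexSlice[:, :, "C1", :], "C1"]
print(both)
```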
