How do I really use the `ix` method of a pandas DataFrame?
Question
Note: since I asked this question, .ix still exists but has more or less been replaced by .loc for label-based indexing and .iloc for positional indexing.
Having read the docs on the ix method of DataFrames, I'm a bit confused by the following behavior with my MultiIndexed DataFrame (specifying select columns of the index).
In [57]: metals
Out[57]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 24245 entries, (u'BI', u'Arsenic, Dissolved', -2083768576.0, 1.0)
to (u'WC', u'Zinc, Total', 1661183104.0, 114.0)
Data columns:
Inflow_val 20648 non-null values
Outflow_val 20590 non-null values
Inflow_qual 20648 non-null values
Outflow_qual 20590 non-null values
dtypes: float64(2), object(2)
In [58]: metals.ix['BI'].shape # first column in the index, ok
Out[58]: (3368, 4)
In [59]: metals.ix['BI', :, :, :].shape # first + other columns, ok
Out[59]: (3368, 4)
In [60]: metals.ix['BI', 'Arsenic, Dissolved'].shape # first two cols
Out[60]: (225, 4)
In [61]: metals.ix['BI', 'Arsenic, Dissolved', :, :].shape # first two + all others
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-62-1fb577ec32fa> in <module>()
----> 1 metals.ix['BI', 'Arsenic, Dissolved', :, :].shape
# traceback spaghetti snipped
KeyError: 'no item named Arsenic, Dissolved'
In [62]: metals.ix['BI', 'Arsenic, Dissolved', :, 1.0].shape # also fails
It took me a long time to realize that what I had been trying to achieve with In [61] was possible with In [60]. Why does the ix method behave like this? What I'm really trying to get at is the scenario at In [62].
My guess is that I need to redefine the index hierarchy, but I'm curious if there's an easier way.
Thanks.
Recommended answer
If you want to select rows/columns based on MultiIndex level values, I suggest using the .xs() method. See also "Selecting rows from a Pandas dataframe with a compound (hierarchical) index".
For this example, you could use:
# shorthand (each .xs() drops the level it selects on, so the remaining levels are renumbered):
metals.xs('BI', level=0).xs('Arsenic, Dissolved', level=0).xs(1, level=1)
# more verbose
metals.xs('BI', level='bmp_category').xs('Arsenic, Dissolved', level='parameter').xs(1, level='storm')
# two chained `ix` calls:
metals.ix['BI', 'Arsenic, Dissolved'].ix[:, 1]
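On current pandas the last line above will not run, because .ix was removed in pandas 1.0. As a rough sketch of the In [62] scenario with the replacement indexers, the snippet below builds a small stand-in for metals and selects on the first, second and fourth index levels. The frame `demo`, its values, and the level name 'site' are assumptions made up for illustration; the other level names are taken from the verbose .xs() call above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `metals`: four index levels, two data columns.
# 'site' is an invented name for the third level; the data are made up.
index = pd.MultiIndex.from_product(
    [["BI", "WC"], ["Arsenic, Dissolved", "Zinc, Total"], [10.0, 20.0], [1.0, 2.0]],
    names=["bmp_category", "parameter", "site", "storm"],
)
demo = pd.DataFrame(
    {
        "Inflow_val": np.arange(16, dtype=float),
        "Outflow_val": np.arange(16, dtype=float) * 2,
    },
    index=index,
).sort_index()  # slicing a MultiIndex with .loc needs a sorted index

# Label-based selection: fix levels 0, 1 and 3, take everything on level 2.
# The trailing ", :" makes it explicit that the tuple addresses rows only.
idx = pd.IndexSlice
by_loc = demo.loc[idx["BI", "Arsenic, Dissolved", :, 1.0], :]

# The same rows with a single .xs() call; the selected levels are dropped.
by_xs = demo.xs(
    ("BI", "Arsenic, Dissolved", 1.0),
    level=["bmp_category", "parameter", "storm"],
)

print(by_loc)
print(by_xs)
```

The .loc version keeps all four index levels in the result, while .xs() leaves only the unselected 'site' level, so which one is more convenient depends on what you do with the result next.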