MultiIndex Slicing with a Timeseries Row Index


Question

I used the answer to this question to try to make a similar slice on my DataFrame, but it doesn't seem to work because my row index is a time series (a DatetimeIndex). I'm not sure how to rewrite the slice so that it works.

The df I'm using has a single time-series row index, and its columns are a two-level MultiIndex. For an arbitrary row, I'm trying to return a Series consisting of the "px" subcolumn of each major column.

The first attempt, df.loc[0, (slice(None), 'px')], throws a TypeError:

TypeError: cannot do index indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [0] of <type 'int'> 
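
To illustrate the distinction behind this error: .loc looks rows up by label, and on a DatetimeIndex the labels are timestamps, so the integer 0 is not one of them. The sketch below uses a small made-up frame (the dates and values are assumptions, not the asker's data) and selects the same row once by label with .loc and once by position with .iloc.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two-level column MultiIndex, DatetimeIndex rows.
cols = pd.MultiIndex.from_product(
    [["col_1", "col_2"], ["delta", "px"]], names=["level_0", "level_1"]
)
rng = pd.date_range("2011-01-01", periods=5, freq="D")
df = pd.DataFrame(np.arange(20.0).reshape(5, 4), index=rng, columns=cols)

# Label-based: pass the Timestamp that labels the first row.
by_label = df.loc[df.index[0], (slice(None), "px")]

# Position-based: pass integer positions for both the row and the columns.
px_positions = [i for i, (_, sub) in enumerate(df.columns) if sub == "px"]
by_position = df.iloc[0, px_positions]

print(by_label)
print(by_position)
```

Both selections return the "px" values of the first row; which form is more convenient depends on whether the row is known by date or by position.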

So I also tried passing it a datetime taken from the index instead of an int:

useIndex = sdf.index[0]
return df.loc[useIndex,(slice(None), 'px')]

which gives:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)' 
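
The message refers to the lexsort depth of the column MultiIndex. As a rough sketch of one way to address it, the example below builds a hypothetical frame whose column labels are deliberately out of order, sorts them with sort_index(axis=1), and then slices. The frame and its values are assumptions for illustration, and whether the unsorted frame actually raises depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose column MultiIndex is deliberately out of order.
cols = pd.MultiIndex.from_tuples(
    [("col_2", "px"), ("col_1", "delta"), ("col_2", "delta"), ("col_1", "px")],
    names=["level_0", "level_1"],
)
rng = pd.date_range("2011-01-01", periods=3, freq="D")
unsorted_df = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=rng, columns=cols)

# Sort the column labels lexicographically; each column keeps its own data.
sorted_df = unsorted_df.sort_index(axis=1)

# Slice the "px" subcolumn of every major column for the first row.
px_row = sorted_df.loc[sorted_df.index[0], pd.IndexSlice[:, "px"]]
print(px_row)
```

Sorting returns a new frame, so the original column order is still available in unsorted_df if it matters for display.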

Postscript...

If I simply do

useIndex = sdf.index[0]
useIndex
sdf.iloc[useIndex]

it fails with:

TypeError: cannot do label indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2015-10-08 00:00:00] of <class 'pandas.tslib.Timestamp'>
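
This failure is the mirror image of the first one: .iloc accepts integer positions, not labels such as a Timestamp. If a label needs to be turned into a position, Index.get_loc can do the conversion. A minimal sketch, assuming a made-up single-column frame with unique daily dates:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with unique daily timestamps as row labels.
rng = pd.date_range("2015-10-06", periods=5, freq="D")
frame = pd.DataFrame({"px": np.arange(5.0)}, index=rng)

label = pd.Timestamp("2015-10-08")

# Convert the label to its integer position, then index by position.
position = frame.index.get_loc(label)
row_by_position = frame.iloc[position]

# Equivalent lookup directly by label.
row_by_label = frame.loc[label]

print(position)
print(row_by_position["px"], row_by_label["px"])
```

With duplicate labels get_loc can return a slice or a boolean mask rather than a single integer, so this sketch assumes the index is unique.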

So maybe the problem is that I'm not really passing a valid index to the MultiIndex slice?

================

Here are two examples. With the first DataFrame ('df'), I'm able to pull out the data I want. The second ('df2') throws a TypeError.

import pandas as pd
import numpy as np

cols = [['col_1', 'col_2'], ['delta', 'px']]
multi_idx = pd.MultiIndex.from_product(cols, names= ["level_0", "level_1"])
df = pd.DataFrame(np.random.rand(20).reshape(5, 4), index=range(5), columns=multi_idx)

row_number = 1

# Works: integer row index, label taken from the index itself
print(df.loc[df.index[row_number], pd.IndexSlice[:, 'px']])

rng = pd.date_range('1/1/2011', periods=5, freq='H')
df2 = pd.DataFrame(np.random.rand(20).reshape(5, 4), index=rng, columns=multi_idx)

#print(df2.loc[df.index[row_number], pd.IndexSlice[:, 'px']])
useIndex = df2.index[0]

print(df2.loc[useIndex, pd.IndexSlice[:, 'px']])

Recommended answer

Using pd.IndexSlice should help get your desired results. Note that the columns first need to be lexsorted; the ones below already are, because MultiIndex.from_product is given labels in sorted order:

import numpy as np
import pandas as pd

cols = [['col_1', 'col_2'], ['delta', 'px']]
multi_idx = pd.MultiIndex.from_product(cols, names=["level_0", "level_1"])
df = pd.DataFrame(np.random.rand(20).reshape(5, 4), index=range(5), columns=multi_idx)

>>> df
level_0     col_1               col_2          
level_1     delta        px     delta        px
0        0.891758  0.071693  0.629897  0.693161
1        0.772542  0.022781  0.684584  0.892641
2        0.925957  0.794940  0.146950  0.134798
3        0.159558  0.842898  0.677927  0.028675
4        0.436268  0.989759  0.471879  0.101878

>>> row_number = 3
>>> df.loc[df.index[row_number], pd.IndexSlice[:, 'px']]
level_0  level_1
col_1    px         0.842898
col_2    px         0.028675
Name: 3, dtype: float64
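
The answer's example uses an integer row index. As a hedged sketch of the same idea on a frame with a DatetimeIndex, the code below also shows DataFrame.xs as an alternative way to pick one subcolumn across all major columns. The frame, dates, and values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: same column layout as above, DatetimeIndex rows.
cols = pd.MultiIndex.from_product(
    [["col_1", "col_2"], ["delta", "px"]], names=["level_0", "level_1"]
)
rng = pd.date_range("2011-01-01", periods=5, freq="D")
df2 = pd.DataFrame(np.arange(20.0).reshape(5, 4), index=rng, columns=cols)

row_label = df2.index[1]

# IndexSlice form: keeps both column levels in the resulting Series index.
px_slice = df2.loc[row_label, pd.IndexSlice[:, "px"]]

# xs form: selects "px" on the second column level and drops that level.
px_only = df2.xs("px", axis=1, level="level_1")
px_xs = px_only.loc[row_label]

print(px_slice)
print(px_xs)
```

The xs form returns a frame whose columns are just the major column names, which can be handy when the "px" values are needed for many rows rather than one.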
