How to iterate over a pandas MultiIndex DataFrame using the index
Problem description
I have a DataFrame df that looks like this. date and Time are the two levels of its MultiIndex.
                      observation1  observation2
date       Time
2012-11-02 9:15:00       79.373668           224
           9:16:00      130.841316           477
2012-11-03 9:15:00       45.312814           835
           9:16:00      123.776946           623
           9:17:00       153.76646           624
           9:18:00      463.276946           626
           9:19:00      663.176934           622
           9:20:00       763.77333           621
2012-11-04 9:15:00      115.449437           122
           9:16:00      123.776946           555
           9:17:00       153.76646           344
           9:18:00      463.276946           212
I want to run some complex processing on each daily block of data.
The pseudocode would look like this:
    for count in df(level 0 index):
        new_df = get only chunk for count
        complex_process(new_df)
First of all, I could not find a way to access only the block for a single date, for example
2012-11-03 9:15:00       45.312814           835
           9:16:00      123.776946           623
           9:17:00       153.76646           624
           9:18:00      463.276946           626
           9:19:00      663.176934           622
           9:20:00       763.77333           621
and then send it for processing. I am doing this in a for loop because I am not sure whether it can be done without naming the exact level 0 values. After some basic searching I found df.index.get_level_values(0), but it returns one value per row rather than one per date, so the loop runs several times for the same day. I want to create one DataFrame per day and send it for processing.
One easy way is to group by the first level of the index. Iterating over the groupby object yields each group key together with a sub-frame containing the rows of that group.
In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:
                      observation1  observation2
date       Time
2012-11-02 9:15:00       79.373668           224
           9:16:00      130.841316           477
                      observation1  observation2
date       Time
2012-11-03 9:15:00       45.312814           835
           9:16:00      123.776946           623
           9:17:00      153.766460           624
           9:18:00      463.276946           626
           9:19:00      663.176934           622
           9:20:00      763.773330           621
                      observation1  observation2
date       Time
2012-11-04 9:15:00      115.449437           122
           9:16:00      123.776946           555
           9:17:00      153.766460           344
           9:18:00      463.276946           212
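To tie this back to the pseudocode in the question, here is a minimal, self-contained sketch of passing each daily block to a processing function. The frame is a shortened copy of the sample data with string index values (an assumption, since the question does not state the index dtypes), and the body of complex_process is a hypothetical placeholder for the real work. The second loop shows an alternative built on the unique level 0 values, which avoids the repeated dates that get_level_values(0) returns on its own.

```python
import pandas as pd

# Rebuild a shortened version of the frame from the question.
# Assumption: both index levels are plain strings.
index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
        ("2012-11-03", "9:16:00"),
        ("2012-11-03", "9:17:00"),
        ("2012-11-04", "9:15:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {
        "observation1": [79.373668, 130.841316, 45.312814,
                         123.776946, 153.76646, 115.449437],
        "observation2": [224, 477, 835, 623, 624, 122],
    },
    index=index,
)


def complex_process(day_df):
    # Hypothetical placeholder for the real per-day processing.
    return {
        "rows": len(day_df),
        "obs2_total": int(day_df["observation2"].sum()),
    }


# Option 1: groupby on index level 0 yields one (key, sub-frame) pair per date.
results = {
    date: complex_process(day_df)
    for date, day_df in df.groupby(level=0)
}

# Option 2: loop over the unique level 0 values and take a cross-section.
# Unlike groupby, xs() drops the selected level from the result by default.
results_xs = {
    date: complex_process(df.xs(date, level="date"))
    for date in df.index.get_level_values(0).unique()
}
```

The groupby form is usually the simpler of the two because it splits the frame once and keeps the full MultiIndex on each sub-frame. The xs form may be handy when only a few known dates are needed.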