如何使用索引遍历 pandas 多索引数据框 [英] How to iterate over pandas multiindex dataframe using index

查看:89
本文介绍了如何使用索引遍历 pandas 多索引数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个看起来像这样的数据帧df.日期和时间是2个多级索引

                           observation1   observation2
date          Time                             
2012-11-02    9:15:00      79.373668      224
              9:16:00      130.841316     477
2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621
2012-11-04    9:15:00      115.449437     122
              9:16:00      123.776946     555
              9:17:00      153.76646      344
              9:18:00      463.276946     212

我想对每日数据块进行一些复杂的处理.

伪代码看起来像

 for count in df(level 0 index) :
     new_df = get only chunk for count
     complex_process(new_df)

所以,首先,我找不到一种只能访问日期块的方法

2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621

,然后将其发送进行处理.我在for循环中执行此操作,因为我不确定是否有任何方法可以在不提及级别0列的确切值的情况下进行操作.我进行了一些基本搜索,并能够获得df.index.get_level_values(0),但是它返回了我所有的值,这导致循环一天运行多次.我想每天创建一个数据框并将其发送进行处理.

解决方案

一种简单的方法是对索引的第一级进行分组-遍历groupby对象将返回组密钥和包含每个组的子帧.

>

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:     
                    observation1  observation2
date       Time                               
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477

                    observation1  observation2
date       Time                               
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621

                    observation1  observation2
date       Time                               
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212

I have a data frame df which looks like this. Date and Time are 2 multilevel index

                           observation1   observation2
date          Time                             
2012-11-02    9:15:00      79.373668      224
              9:16:00      130.841316     477
2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621
2012-11-04    9:15:00      115.449437     122
              9:16:00      123.776946     555
              9:17:00      153.76646      344
              9:18:00      463.276946     212

I want to have do some complex process over daily data block.

Psuedo code would look like

 for count in df(level 0 index) :
     new_df = get only chunk for count
     complex_process(new_df)

So, first of all, I could not find a way to access only blocks for a date

2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621

and then send it for processing. I am doing this in for loop as I am not sure if there is any way to do it without mentioning exact value of level 0 column. I did some basic search and able to get df.index.get_level_values(0), but it returns me all the values and that causes loop to run multiple times for a day. I want to create a dataframe per day and send it for processing.

解决方案

One easy way would be to groupby the first level of the index - iterating over the groupby object will return the group keys and a subframe containing each group.

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:     
                    observation1  observation2
date       Time                               
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477

                    observation1  observation2
date       Time                               
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621

                    observation1  observation2
date       Time                               
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212

这篇关于如何使用索引遍历 pandas 多索引数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆