Pandas - Split dataframe into multiple dataframes based on dates?

Question

I have a dataframe with multiple columns, including a date column. The dates are formatted like 12/31/15, and I have converted the column to datetime objects.

I set the datetime column as the index and want to perform a regression calculation for each month in the dataframe.

I believe the way to do this is to split the dataframe into multiple dataframes by month, store them in a list of dataframes, and then perform the regression on each dataframe in the list.

I have used groupby, which successfully splits the dataframe by month, but I am unsure how to correctly convert each group in the groupby object into a dataframe so that I can run my regression function on it.

Does anyone know how to split a dataframe into multiple dataframes based on date, or a better approach to my problem?

Here is the code I have written so far:

import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')

# Group dataframe on index by month and year 
# Groupby works, but dmatrices does not 
for df_group in df.groupby(pd.TimeGrouper("M")):
    y,X = dmatrices('value1 ~ value2 + value3', data=df_group,      
    return_type='dataframe')

Recommended answer

If you must loop, you need to unpack the key and the dataframe when you iterate over a groupby object, because each iteration yields a (key, dataframe) tuple rather than a dataframe alone:

import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')

Note the use of group_name here:

for group_name, df_group in df.groupby(pd.Grouper(freq='M')):
    y, X = dmatrices('value1 ~ value2 + value3', data=df_group,
                     return_type='dataframe')
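As a minimal sketch of the "split into several dataframes" step the question asks about, the unpacked groups can be collected into a dict keyed by month. The data below is synthetic (only the column names come from the question), and grouping on `df.index.to_period("M")` is an alternative to `pd.Grouper(freq='M')` chosen here as an assumption, since it does not depend on which month-end alias the installed pandas version prefers.

```python
import numpy as np
import pandas as pd

# Synthetic daily data; column names follow the question, values are invented.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", "2015-03-31", freq="D")
df = pd.DataFrame(
    {
        "value1": rng.normal(size=len(idx)),
        "value2": rng.normal(size=len(idx)),
        "value3": rng.normal(size=len(idx)),
    },
    index=idx,
)

# Iterating a groupby yields (key, sub-DataFrame) tuples; unpack both
# and keep each monthly DataFrame under a readable string key.
monthly_frames = {
    str(period): df_group
    for period, df_group in df.groupby(df.index.to_period("M"))
}

for name, frame in monthly_frames.items():
    print(name, type(frame).__name__, frame.shape)
```

Each value in `monthly_frames` is an ordinary DataFrame, so it can be passed straight to `dmatrices` or any other function that expects one.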

If you want to avoid iteration, have a look at the notebook in Paul H's gist (see his comment), but a simple example of using apply would be:

def do_regression(df_group, ret='outcome'):
    """Apply the function to each group in the data and return one result."""
    y,X = dmatrices('value1 ~ value2 + value3',
                    data=df_group,      
                    return_type='dataframe')
    if ret == 'outcome':
        return y
    else:
        return X

outcome = df.groupby(pd.Grouper(freq='M')).apply(do_regression, ret='outcome')
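The apply example above returns only the design matrices. As a rough sketch of going one step further and fitting a model per month, the function below returns the fitted coefficients as a Series, so that apply assembles one row per month. To stay self-contained it substitutes NumPy's least-squares solver for the patsy/statsmodels calls in the answer; the helper name `fit_month`, the synthetic data, and the `to_period("M")` grouping key are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data with a known, noise-free relationship so the fit is checkable:
# value1 = 1 + 2 * value2 - 3 * value3
rng = np.random.default_rng(1)
idx = pd.date_range("2015-01-01", "2015-03-31", freq="D")
df = pd.DataFrame(
    {"value2": rng.normal(size=len(idx)), "value3": rng.normal(size=len(idx))},
    index=idx,
)
df["value1"] = 1.0 + 2.0 * df["value2"] - 3.0 * df["value3"]


def fit_month(df_group):
    """Hypothetical helper: ordinary least squares on one month's rows."""
    # Design matrix with an explicit intercept column, mirroring the
    # formula 'value1 ~ value2 + value3' used in the answer.
    X = np.column_stack(
        [
            np.ones(len(df_group)),
            df_group["value2"].to_numpy(),
            df_group["value3"].to_numpy(),
        ]
    )
    y = df_group["value1"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["Intercept", "value2", "value3"])


# One row of coefficients per calendar month.
coefs = df.groupby(df.index.to_period("M")).apply(fit_month)
print(coefs)
```

With statsmodels installed, the body of `fit_month` could instead call `dmatrices` and `sm.OLS(y, X).fit()` and return the result's `params`, which keeps the same one-row-per-month shape.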
