Pandas - Split dataframe into multiple dataframes based on dates?

Posted: 2017/4/15 15:21:56 · Tags: python, datetime, pandas, group-by


Question


I have a dataframe with multiple columns along with a date column. The date format is 12/31/15 and I have set it as a datetime object.
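For dates written like 12/31/15 (month/day/two-digit year), the matching `strptime` pattern is `%m/%d/%y`. A `%Y%m%d` pattern would instead expect strings like 20151231. A minimal sketch with a few hypothetical rows:

```python
import pandas as pd

# Hypothetical sample rows; dates use the month/day/two-digit-year layout (12/31/15).
raw = pd.DataFrame({"date": ["12/31/15", "01/04/16", "01/05/16"],
                    "value1": [1.0, 2.0, 3.0]})

# "%m/%d/%y" matches 12/31/15; "%Y%m%d" would only match strings like 20151231.
raw["date"] = pd.to_datetime(raw["date"], format="%m/%d/%y")

# Use the parsed dates as a DatetimeIndex, as in the question.
raw = raw.set_index("date")
print(raw.index)
```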



I set the datetime column as the index and want to perform a regression calculation for each month of the dataframe. 

I believe the methodology to do this would be to split the dataframe into multiple dataframes based on month, store into a list of dataframes, then perform regression on each dataframe in the list. 

I have used groupby which successfully split the dataframe by month, but am unsure how to correctly convert each group in the groupby object into a dataframe to be able to run my regression function on it.
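Each group produced by `groupby` is already an ordinary DataFrame, so the split-into-a-list approach described above can be sketched as follows. The data is hypothetical, and grouping by `df.index.to_period("M")` is just one choice of monthly key (it keeps year and month together); `pd.Grouper(freq=...)` on the DatetimeIndex is another.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two months of daily rows using the question's column names.
idx = pd.date_range("2015-12-01", "2016-01-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"value1": rng.normal(size=len(idx)),
     "value2": rng.normal(size=len(idx)),
     "value3": rng.normal(size=len(idx))},
    index=idx,
)

# Group on the DatetimeIndex by calendar month; a monthly Period keeps the
# year, so December 2015 and December 2016 would stay in separate groups.
monthly = df.groupby(df.index.to_period("M"))

# Iterating a groupby yields (key, DataFrame) pairs; the DataFrame part can be
# stored directly, either in a list or in a dict keyed by month.
frames = [group for _, group in monthly]
by_month = {str(key): group for key, group in monthly}

print(len(frames), list(by_month))
```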

Does anyone know how to split a dataframe into multiple dataframes based on date, or a better approach to my problem?

Here is the code I've written so far:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')  # dates look like 12/31/15
df = df.set_index('date')

# Group dataframe on index by month and year
# Groupby works, but dmatrices does not
# (pd.TimeGrouper was removed in pandas 1.0; pd.Grouper replaces it)
for df_group in df.groupby(pd.Grouper(freq="M")):
    y, X = dmatrices('value1 ~ value2 + value3', data=df_group,
                     return_type='dataframe')

Solution
If you must loop, you need to unpack the key and the dataframe when you iterate over a groupby object, because each iteration yields a (key, DataFrame) tuple rather than a bare DataFrame:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')  # dates look like 12/31/15
df = df.set_index('date')
Note the use of group_name here:
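# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper replaces it
# (pandas >= 2.2 prefers freq="ME" for month-end)
for group_name, df_group in df.groupby(pd.Grouper(freq="M")):
    y, X = dmatrices('value1 ~ value2 + value3', data=df_group,
                     return_type='dataframe')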
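The loop above can be extended into a full per-month fit. This sketch swaps patsy and statsmodels for plain `numpy.linalg.lstsq` so it runs with only numpy and pandas installed; the data is synthetic and noise-free, so each month should recover the coefficients it was built from. With statsmodels available, the `y` and `X` returned by `dmatrices` could instead be passed to `sm.OLS(y, X).fit()`.

```python
import numpy as np
import pandas as pd

# Hypothetical, noise-free data: value1 = 1 + 2*value2 - 3*value3.
idx = pd.date_range("2016-01-01", "2016-03-31", freq="D")
rng = np.random.default_rng(1)
df = pd.DataFrame({"value2": rng.normal(size=len(idx)),
                   "value3": rng.normal(size=len(idx))}, index=idx)
df["value1"] = 1 + 2 * df["value2"] - 3 * df["value3"]

results = {}
# Unpack the (key, DataFrame) pair that each iteration yields.
for month, df_group in df.groupby(df.index.to_period("M")):
    # Design matrix for value1 ~ value2 + value3: intercept column plus regressors.
    X = np.column_stack([np.ones(len(df_group)),
                         df_group["value2"].to_numpy(),
                         df_group["value3"].to_numpy()])
    y = df_group["value1"].to_numpy()
    # Ordinary least squares via numpy (stand-in for patsy + statsmodels).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    results[str(month)] = coef

for month, coef in results.items():
    print(month, np.round(coef, 3))
```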
If you want to avoid iteration, do have a look at the notebook in Paul H's gist (see his comment), but a simple example of using apply would be:
def do_regression(df_group, ret='outcome'):
    """Build the patsy design matrices for one group and return one of them."""
    y, X = dmatrices('value1 ~ value2 + value3',
                     data=df_group,
                     return_type='dataframe')
    # A full regression would be fitted here, e.g. sm.OLS(y, X).fit()
    if ret == 'outcome':
        return y
    else:
        return X

outcome = df.groupby(pd.Grouper(freq="M")).apply(do_regression, ret='outcome')
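The `apply` example above only builds the design matrices. A sketch of an `apply` that returns one row of fitted coefficients per month follows; the `numpy.linalg.lstsq` fit, the helper name `fit_month`, and the data are illustrative assumptions rather than part of the original answer.

```python
import numpy as np
import pandas as pd


def fit_month(df_group):
    """Hypothetical helper: fit value1 ~ value2 + value3 on one group, return coefficients."""
    X = np.column_stack([np.ones(len(df_group)),
                         df_group["value2"].to_numpy(),
                         df_group["value3"].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, df_group["value1"].to_numpy(), rcond=None)
    # Returning a Series with a fixed index makes apply() build one row per group.
    return pd.Series(coef, index=["Intercept", "value2", "value3"])


# Hypothetical, noise-free data: value1 = 0.5 + 1.5*value2 + 4*value3.
idx = pd.date_range("2016-01-01", "2016-02-29", freq="D")
rng = np.random.default_rng(2)
df = pd.DataFrame({"value2": rng.normal(size=len(idx)),
                   "value3": rng.normal(size=len(idx))}, index=idx)
df["value1"] = 0.5 + 1.5 * df["value2"] + 4 * df["value3"]

# One row of coefficients per calendar month, no explicit loop.
coefs = df.groupby(df.index.to_period("M")).apply(fit_month)
print(coefs)
```

Because `fit_month` returns a Series with the same index for every group, `apply` stacks the results into a DataFrame indexed by month, with one column per coefficient.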


                        