Pandas - Split dataframe into multiple dataframes based on dates?
Question
I have a dataframe with multiple columns along with a date column. The date format is 12/31/15 and I have set it as a datetime object.
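Below is a minimal sketch of parsing dates written like 12/31/15. The sample values are made up. The format string has to match the raw text: `%m/%d/%y` is month/day/two-digit-year, whereas a format such as `'%Y%m%d'` would expect strings like `20151231`.

```python
import pandas as pd

# Hypothetical sample resembling the question's data: dates written as MM/DD/YY.
raw = pd.DataFrame({"date": ["12/31/15", "01/01/16", "01/15/16"],
                    "value1": [1.0, 2.0, 3.0]})

# '%m/%d/%y' matches "12/31/15"; two-digit years like 15 are read as 2015.
raw["date"] = pd.to_datetime(raw["date"], format="%m/%d/%y")
df = raw.set_index("date")

print(df.index[0])  # 2015-12-31 00:00:00
```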
I set the datetime column as the index and want to perform a regression calculation for each month of the dataframe.
I believe the way to do this is to split the dataframe into one dataframe per month, store those in a list, and then run the regression on each dataframe in the list.
I have used groupby, which successfully splits the dataframe by month, but I am unsure how to turn each group in the groupby object into a dataframe so that I can run my regression function on it.
Does anyone know how to split a dataframe into multiple dataframes based on date, or a better approach to my problem?
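Here is one way to get the per-month dataframes described above, sketched on made-up data that uses the question's column names. The sketch groups on the month of the DatetimeIndex and keeps the groups, which are already DataFrames. It uses `df.index.to_period("M")` as the grouping key instead of `pd.TimeGrouper`, because `pd.TimeGrouper` has since been removed from pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per day for three months, columns named as in the question.
idx = pd.date_range("2015-11-01", "2016-01-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"value1": rng.normal(size=len(idx)),
                   "value2": rng.normal(size=len(idx)),
                   "value3": rng.normal(size=len(idx))}, index=idx)

# Group on monthly periods of the DatetimeIndex; each group is a DataFrame.
monthly = {period: group for period, group in df.groupby(df.index.to_period("M"))}

# The same groups as a plain list, if a list is preferred.
monthly_list = list(monthly.values())

for period, frame in monthly.items():
    print(period, type(frame).__name__, len(frame))
```

A dict keyed by month keeps the label attached to each dataframe, which is handy when the regression results are collected later.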
Here is the code I've written so far:
```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')  # dates look like 12/31/15
df = df.set_index('date')

# Group dataframe on index by month and year
# Groupby works, but dmatrices does not
# (pd.TimeGrouper has been removed; pd.Grouper replaces it, use freq='M' on pandas < 2.2)
for df_group in df.groupby(pd.Grouper(freq='ME')):
    y, X = dmatrices('value1 ~ value2 + value3', data=df_group,
                     return_type='dataframe')
```
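To see why `dmatrices` fails in the loop above, here is a small sketch with made-up data. It shows that each item produced by iterating a groupby is a `(key, DataFrame)` tuple rather than a DataFrame. It groups by monthly periods only for illustration.

```python
import pandas as pd

# Hypothetical two-month sample with a DatetimeIndex.
idx = pd.to_datetime(["2015-12-30", "2015-12-31", "2016-01-01"])
df = pd.DataFrame({"value1": [1.0, 2.0, 3.0]}, index=idx)

grouped = df.groupby(df.index.to_period("M"))

# Iterating a groupby yields (key, DataFrame) pairs, not bare DataFrames.
first = next(iter(grouped))
print(type(first).__name__, len(first))   # tuple 2
print(first[0], type(first[1]).__name__)  # 2015-12 DataFrame
```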
Answer
If you must loop, you need to unpack the key and the dataframe when you iterate over a groupby object, because each iteration yields a (key, dataframe) tuple rather than a bare dataframe:
```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')  # dates look like 12/31/15
df = df.set_index('date')
```
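Note the use of group_name here:

```python
# pd.Grouper replaces the removed pd.TimeGrouper; use freq='M' on pandas < 2.2
for group_name, df_group in df.groupby(pd.Grouper(freq='ME')):
    # group_name is the month-end label; df_group is that month's DataFrame
    y, X = dmatrices('value1 ~ value2 + value3', data=df_group,
                     return_type='dataframe')
```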
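The loop above only builds the design matrices. To actually fit a regression per month, here is a sketch on synthetic data in which value1 is an exact linear function of value2 and value3, so each month's fit should return the same coefficients. It swaps in `numpy.linalg.lstsq` for statsmodels so that only pandas and NumPy are needed. With statsmodels, `sm.OLS(y, X).fit().params` would play the same role. The column of ones mirrors the intercept that patsy adds for `'value1 ~ value2 + value3'`.

```python
import numpy as np
import pandas as pd

# Hypothetical data where value1 = 1 + 2*value2 - 3*value3 exactly,
# so each month's fit should recover those coefficients.
idx = pd.date_range("2016-01-01", "2016-02-29", freq="D")
rng = np.random.default_rng(42)
df = pd.DataFrame({"value2": rng.normal(size=len(idx)),
                   "value3": rng.normal(size=len(idx))}, index=idx)
df["value1"] = 1 + 2 * df["value2"] - 3 * df["value3"]

results = {}
# Unpack each (key, DataFrame) pair; group_name is the monthly period.
for group_name, df_group in df.groupby(df.index.to_period("M")):
    # Design matrix with an intercept column, like patsy's 'value1 ~ value2 + value3'.
    X = np.column_stack([np.ones(len(df_group)),
                         df_group["value2"].to_numpy(),
                         df_group["value3"].to_numpy()])
    y = df_group["value1"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    results[group_name] = coef

# One row of coefficients per month.
coefs = pd.DataFrame.from_dict(results, orient="index",
                               columns=["Intercept", "value2", "value3"])
print(coefs.round(6))
```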
If you want to avoid iteration, do have a look at the notebook in Paul H's gist (see his comment), but a simple example of using apply would be:
```python
def do_regression(df_group, ret='outcome'):
    """Apply the function to each group in the data and return one result."""
    y, X = dmatrices('value1 ~ value2 + value3',
                     data=df_group,
                     return_type='dataframe')
    if ret == 'outcome':
        return y
    else:
        return X

# Extra keyword arguments to apply() are passed through to do_regression
outcome = df.groupby(pd.Grouper(freq='ME')).apply(do_regression, ret='outcome')
```
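Here is a variant of the apply approach that returns fitted coefficients instead of design matrices. If the applied function returns a pandas Series, `groupby(...).apply` stacks those Series into a DataFrame with one row per month. `monthly_coefficients` is a hypothetical helper, the data are synthetic, and the least-squares fit via NumPy stands in for a statsmodels regression.

```python
import numpy as np
import pandas as pd

def monthly_coefficients(df_group):
    """Hypothetical helper: least-squares fit of value1 on value2 and value3
    for one group, returned as a Series so apply() builds one row per month."""
    X = np.column_stack([np.ones(len(df_group)),
                         df_group["value2"].to_numpy(),
                         df_group["value3"].to_numpy()])
    y = df_group["value1"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["Intercept", "value2", "value3"])

# Hypothetical data with a known linear relationship.
idx = pd.date_range("2016-01-01", "2016-03-31", freq="D")
rng = np.random.default_rng(1)
df = pd.DataFrame({"value2": rng.normal(size=len(idx)),
                   "value3": rng.normal(size=len(idx))}, index=idx)
df["value1"] = 0.5 + 1.5 * df["value2"] + 4.0 * df["value3"]

# Group by monthly period of the index and collect one coefficient row per month.
coefs = df.groupby(df.index.to_period("M")).apply(monthly_coefficients)
print(coefs)
```

Returning a Series with fixed labels keeps the result rectangular, so the monthly coefficients can be compared or plotted directly.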