pandas DataFrame resampling with specific dates

Question

I have a question about the resampling method of pandas DataFrames. I have a DataFrame with one observation per day:

import datetime

import numpy as np
import pandas as pd

# One random integer observation per day of 2016 (a leap year, hence 366 rows)
df = pd.DataFrame(np.random.randint(0, 100, size=(366, 1)), columns=['A'])
df.index = pd.date_range(datetime.date(2016, 1, 1), datetime.date(2016, 12, 31))

If I want to compute the sum (or another aggregate) for every month, I can do it directly:

# "M" is the month-end alias; pandas 2.2+ deprecates it in favor of "ME"
EOM_sum = df.resample(rule="M").sum()

However, I have a specific calendar with an irregular frequency:

import datetime
custom_dates = pd.DatetimeIndex([datetime.date(2016,1,13),
                             datetime.date(2016,2,8),
                             datetime.date(2016,3,16),
                             datetime.date(2016,4,10),
                             datetime.date(2016,5,13),
                             datetime.date(2016,6,17),
                             datetime.date(2016,7,12),
                             datetime.date(2016,8,11),
                             datetime.date(2016,9,10),
                             datetime.date(2016,10,9),
                             datetime.date(2016,11,14),
                             datetime.date(2016,12,19),
                             datetime.date(2016,12,31)])

To compute the sum for each period, I currently add a temporary column to df holding the end date of the period each row belongs to, and then aggregate with a groupby:

df["period"] = custom_dates[custom_dates.searchsorted(df.index)]
custom_sum = df.groupby(by=['period']).sum()

However, this feels hacky and not very Pythonic. Is there a built-in way to do this in pandas? Thanks in advance.

Recommended answer

Creating a new column is not necessary. You can pass the DatetimeIndex of period ends straight to groupby, because it has the same length as df:

import datetime
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(0,100,size=(366, 1)), columns=list('A'))
df.index = pd.date_range(datetime.date(2016,1,1),datetime.date(2016,12,31))
print (df.head())
             A
2016-01-01   8
2016-01-02  24
2016-01-03  67
2016-01-04  87
2016-01-05  79

import datetime
custom_dates = pd.DatetimeIndex([datetime.date(2016,1,13),
                             datetime.date(2016,2,8),
                             datetime.date(2016,3,16),
                             datetime.date(2016,4,10),
                             datetime.date(2016,5,13),
                             datetime.date(2016,6,17),
                             datetime.date(2016,7,12),
                             datetime.date(2016,8,11),
                             datetime.date(2016,9,10),
                             datetime.date(2016,10,9),
                             datetime.date(2016,11,14),
                             datetime.date(2016,12,19),
                             datetime.date(2016,12,31)])

custom_sum = df.groupby(custom_dates[custom_dates.searchsorted(df.index)]).sum()
print (custom_sum)
               A
2016-01-13   784
2016-02-08  1020
2016-03-16  1893
2016-04-10  1242
2016-05-13  1491
2016-06-17  1851
2016-07-12  1319
2016-08-11  1348
2016-09-10  1616
2016-10-09  1523
2016-11-14  1793
2016-12-19  1547
2016-12-31   664
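
To make the mapping easier to follow, here is a minimal sketch on toy data. The names `toy`, `ends`, `pos` and `labels` are illustrative and not part of the original answer. It relies on `searchsorted` defaulting to `side="left"`, so a row dated exactly on a period end is counted in that period rather than the next one.

```python
import pandas as pd

# Hypothetical toy data: ten daily observations, each equal to 1
idx = pd.date_range("2016-01-01", periods=10, freq="D")
toy = pd.DataFrame({"A": [1] * 10}, index=idx)

# Hypothetical period ends: sorted, and the last one covers the last row
ends = pd.DatetimeIndex(["2016-01-03", "2016-01-07", "2016-01-10"])

# Position of the first period end that is >= each row's date
pos = ends.searchsorted(toy.index)

# Translate positions back into period-end labels and group on them
labels = ends[pos]
toy_sum = toy.groupby(labels)["A"].sum()
print(toy_sum)
```

Each group's sum equals the number of days in that period (3, 4 and 3 here), which is a quick way to check that the boundaries fall where you expect.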

Another solution is to group by the NumPy array of positions returned by searchsorted, and then assign custom_dates as the new index:

print (custom_dates.searchsorted(df.index))
[ 0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2
  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
  2  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
  3  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
  4  4  4  4  4  4  4  4  4  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  6  6  6  6  6  6
  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  7  7  7  7  7  7
  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  8
  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8
  8  8  8  8  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9
  9  9  9  9  9  9  9  9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11
 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12]

custom_sum = df.groupby(custom_dates.searchsorted(df.index)).sum()
custom_sum.index = custom_dates
print (custom_sum)
               A
2016-01-13   784
2016-02-08  1020
2016-03-16  1893
2016-04-10  1242
2016-05-13  1491
2016-06-17  1851
2016-07-12  1319
2016-08-11  1348
2016-09-10  1616
2016-10-09  1523
2016-11-14  1793
2016-12-19  1547
2016-12-31   664
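
Both variants assume that every row falls on or before the last custom date and that no period is empty. If that may not hold, a more defensive version could look like the sketch below. This is an assumption-laden variation rather than part of the original answer, and `toy`, `ends`, `in_range`, `out` and `full` are illustrative names. Rows after the last period end are dropped, and empty periods are filled with 0.

```python
import pandas as pd

# Hypothetical toy data with gaps; the last row lies after the final period end
idx = pd.DatetimeIndex(["2016-01-01", "2016-01-02", "2016-01-08",
                        "2016-01-09", "2016-01-12"])
toy = pd.DataFrame({"A": [1, 1, 1, 1, 1]}, index=idx)

# Hypothetical period ends; no row falls in the period ending 2016-01-05
ends = pd.DatetimeIndex(["2016-01-03", "2016-01-05", "2016-01-10"])

pos = ends.searchsorted(toy.index)

# A position equal to len(ends) means the row is later than every period end
in_range = pos < len(ends)

# Group only the rows that belong to some period
out = toy[in_range].groupby(pos[in_range]).sum()

# Relabel using the positions that actually occurred, then restore empty periods
out.index = ends[out.index.to_numpy()]
full = out.reindex(ends, fill_value=0)
print(full)
```

Filling with 0 suits a sum; for other aggregates such as a mean, leaving the missing periods as NaN (by omitting `fill_value`) is probably the safer choice.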
