Keep 24h for each day when resampling `pandas` `Series` (from daily to hourly)

Problem description

I have a pandas Series with a (tz-localized) DatetimeIndex and one value per day:

tmpr
Out[38]: 
2018-01-01 00:00:00+01:00    1.810
2018-01-02 00:00:00+01:00    2.405
2018-01-03 00:00:00+01:00    1.495
2018-01-04 00:00:00+01:00    1.600
2018-01-05 00:00:00+01:00    0.545

2020-12-27 00:00:00+01:00    2.655
2020-12-28 00:00:00+01:00    1.705
2020-12-29 00:00:00+01:00    1.255
2020-12-30 00:00:00+01:00    1.405
2020-12-31 00:00:00+01:00    3.000
Freq: D, Name: tmpr, Length: 1096, dtype: float64

I want to upsample it to hourly values, so that each value is repeated 24 times (or 23 or 25, depending on the summer/winter time changeover, but that's a whole other story). Here's what I tried:

tmpr.resample('h').ffill()
Out[39]: 
2018-01-01 00:00:00+01:00    1.810
2018-01-01 01:00:00+01:00    1.810
2018-01-01 02:00:00+01:00    1.810
2018-01-01 03:00:00+01:00    1.810
2018-01-01 04:00:00+01:00    1.810

2020-12-30 20:00:00+01:00    1.405
2020-12-30 21:00:00+01:00    1.405
2020-12-30 22:00:00+01:00    1.405
2020-12-30 23:00:00+01:00    1.405
2020-12-31 00:00:00+01:00    3.000
Freq: H, Name: tmpr, Length: 26281, dtype: float64

The problem is the final day: I can't get resample to include the 23 hours after 0:00.
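The effect can be reproduced on a small series by counting the hourly rows that land on each calendar day. This is only a sketch: the five-day sample, the "Europe/Berlin" time zone and the name `rows_per_day` are assumptions made for the example, not details from the original data.

```python
import pandas as pd

# Assumed five-day sample; "Europe/Berlin" is an assumed time zone
idx = pd.date_range("2018-01-01", periods=5, freq="D", tz="Europe/Berlin")
tmpr = pd.Series([1.810, 2.405, 1.495, 1.600, 0.545], index=idx, name="tmpr")

# resample stops at the last existing timestamp (midnight of the final day)
hourly = tmpr.resample("h").ffill()

# Count the hourly rows that fall on each local calendar day
rows_per_day = hourly.groupby(hourly.index.date).size()
print(rows_per_day)
```

Every day except the last should show 24 rows, while the final day contributes only its midnight row.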

Adding a closed parameter makes no difference, either when resampling or when creating the original time series.

(I've tried creating the original Series with a left-closed or a right-closed index: pd.date_range(start=pd.Timestamp(2018, 1, 1), end=pd.Timestamp(2021, 1, 1), freq='D', closed='left') and ... end=pd.Timestamp(2020, 12, 31), but the resulting Series seems to be the same.)

I could always append an additional day (2021-01-01) with a dummy value, and then remove it at the end, but that's terribly hacky.
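For illustration, that workaround could look roughly like the sketch below. The five-day sample, the "Europe/Berlin" time zone and the names `extra_day`, `dummy` and `padded` are assumptions made for the example.

```python
import pandas as pd

# Assumed five-day sample; "Europe/Berlin" is an assumed time zone
idx = pd.date_range("2018-01-01", periods=5, freq="D", tz="Europe/Berlin")
tmpr = pd.Series([1.810, 2.405, 1.495, 1.600, 0.545], index=idx, name="tmpr")

# Append one extra day that carries a placeholder value
extra_day = tmpr.index[-1] + pd.DateOffset(days=1)
dummy = pd.Series([0.0], index=pd.DatetimeIndex([extra_day]), name=tmpr.name)
padded = pd.concat([tmpr, dummy])

# Upsample, then drop the single row that belongs to the placeholder day
hourly = padded.resample("h").ffill().iloc[:-1]
print(hourly.tail())
```

The placeholder value never reaches the result, because the only row that holds it is the one removed by `iloc[:-1]`.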

Any ideas on how to do this the way it was intended?

PS: In a previous project, using a PeriodIndex instead of a DatetimeIndex, I had no problems. However, I cannot use that here because PeriodIndex does not support time zones, which I do need.

Recommended answer

Since your data is daily, you can just create the new hourly timestamps yourself and reindex:

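import pandas as pd

# Hourly timestamps from the first day through 23:00 of the last day
new_timestamps = pd.date_range(tmpr.index[0],
                               tmpr.index[-1] + pd.Timedelta(hours=23),
                               freq='h')

# The new hourly rows start out as NaN; forward-fill them with that day's value
tmpr.reindex(new_timestamps).ffill()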

Output (for the first half of your sample data):

2018-01-01 00:00:00+01:00    1.810
2018-01-01 01:00:00+01:00    1.810
2018-01-01 02:00:00+01:00    1.810
2018-01-01 03:00:00+01:00    1.810
2018-01-01 04:00:00+01:00    1.810
                             ...  
2018-01-05 19:00:00+01:00    0.545
2018-01-05 20:00:00+01:00    0.545
2018-01-05 21:00:00+01:00    0.545
2018-01-05 22:00:00+01:00    0.545
2018-01-05 23:00:00+01:00    0.545
Freq: H, Name: tmpr, Length: 120, dtype: float64
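The fixed 23-hour offset assumes that the last day has exactly 24 hours. If the series might end on a clock-change day, one possible variation is to step to the next local midnight and drop that endpoint. This is a sketch and not part of the original answer: the helper name `upsample_daily_to_hourly`, the three-day sample and the "Europe/Berlin" time zone are all assumptions.

```python
import pandas as pd

def upsample_daily_to_hourly(daily: pd.Series) -> pd.Series:
    """Repeat each daily value for every hour of its calendar day.

    Hypothetical helper, not part of the original answer.
    """
    start = daily.index[0]
    # DateOffset(days=1) moves to the next local midnight, so the last day
    # spans 23, 24 or 25 hours depending on any clock change.
    end = daily.index[-1] + pd.DateOffset(days=1)
    # date_range includes `end` itself; drop it so the range stops at the
    # last hour of the final day.
    hourly_index = pd.date_range(start, end, freq="h")[:-1]
    return daily.reindex(hourly_index).ffill()

# Assumed sample: three days ending on 2018-03-25, when clocks in the
# assumed "Europe/Berlin" zone move forward by one hour.
idx = pd.date_range("2018-03-23", periods=3, freq="D", tz="Europe/Berlin")
daily = pd.Series([1.0, 2.0, 3.0], index=idx, name="tmpr")

hourly = upsample_daily_to_hourly(daily)
rows_per_day = hourly.groupby(hourly.index.date).size()
print(rows_per_day)
```

Slicing off the endpoint with `[:-1]` is used here instead of an endpoint-inclusion keyword on `date_range`, since the name of that keyword has changed between pandas versions.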
