Resample/fill gaps for blocks of datetime stamps


Question

Problem

I read a CSV into a DataFrame that contains some datetime gaps. The sampling frequency is 15 minutes, and each timestamp always has a block of three values. In this example, the block for 2017-12-11 23:15:00 is missing.

         ID           Datetime   Value
0        a 2017-12-11 23:00:00   20.0
1        b 2017-12-11 23:00:00   20.9
2        c 2017-12-11 23:00:00   21.0
3        a 2017-12-11 23:30:00   19.8
4        b 2017-12-11 23:30:00   20.8
5        c 2017-12-11 23:30:00   20.8

Desired result

What I want is to resample on Datetime and fill the gaps in Value with zeros:

         ID           Datetime   Value
0        a 2017-12-11 23:00:00   20.0
1        b 2017-12-11 23:00:00   20.9
2        c 2017-12-11 23:00:00   21.0
3        a 2017-12-11 23:15:00   0.0
4        b 2017-12-11 23:15:00   0.0
5        c 2017-12-11 23:15:00   0.0
6        a 2017-12-11 23:30:00   19.8
7        b 2017-12-11 23:30:00   20.8
8        c 2017-12-11 23:30:00   20.8

My question

Can this be done with resample() alone, or does it need to be combined with groupby()?

import pandas as pd

# all_files and headers are defined elsewhere
df = pd.concat(
    pd.read_csv(file, parse_dates=[1], dayfirst=True, names=headers)
    for file in all_files
)
df.set_index("Datetime").resample('15min').fillna(0).reset_index()

Recommended answer

You can use resample, combined with last() or mean() in case a single timestamp has multiple values.

df.groupby('ID').resample('15min').last().fillna(0)

This resamples the DataFrame per ID and takes the last value in each 15-minute period (most periods should contain 0 or 1 values). Where a period has an index entry (time) but no values, fillna(0) inserts a 0 instead of NaN.
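To make the behaviour concrete, here is a minimal sketch that rebuilds the sample data from the question by hand (instead of reading the CSV files) and applies the same groupby/resample idea. Selecting only the Value column is a simplification made for this example, so that the result is a plain Series.

```python
import pandas as pd

# Sample data from the question, typed in by hand; the 23:15 block is missing.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "a", "b", "c"],
    "Datetime": pd.to_datetime(
        ["2017-12-11 23:00:00"] * 3 + ["2017-12-11 23:30:00"] * 3
    ),
    "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
}).set_index("Datetime")  # resample needs a DatetimeIndex

# Per ID: build 15-minute bins, take the last value in each bin,
# and replace the NaN of empty bins with 0.
filled = df.groupby("ID")["Value"].resample("15min").last().fillna(0)
print(filled)
```

The result is indexed by (ID, Datetime) and has nine rows: the three original timestamps' worth of data per ID, with 0.0 at 23:15 for each of a, b and c.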

Note that the resample call only works with an appropriate index type, i.e. a DatetimeIndex. I see you are parsing dates; calling df.dtypes lets you confirm that the Datetime column has a valid datetime type. If you plan to do any time-based operations, I would recommend setting the index to 'Datetime' and mostly leaving it there (i.e., do this before running the command above):

df.set_index('Datetime', inplace=True)
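As a rough illustration of that check, the sketch below starts from a Datetime column that was read as plain strings, converts it, and only then sets the index. The day-first string format and the variable name raw are assumptions made for this example; the question does not show the actual CSV contents.

```python
import pandas as pd

# Hypothetical raw frame in which Datetime was NOT parsed (object dtype).
raw = pd.DataFrame({
    "ID": ["a", "b", "c"],
    "Datetime": ["11/12/2017 23:00:00"] * 3,  # assumed day-first format
    "Value": [20.0, 20.9, 21.0],
})
print(raw.dtypes)  # inspect the column types first

# Convert only if the column is not already a datetime type.
if not pd.api.types.is_datetime64_any_dtype(raw["Datetime"]):
    raw["Datetime"] = pd.to_datetime(raw["Datetime"], format="%d/%m/%Y %H:%M:%S")

# With a real datetime column, setting it as the index yields a DatetimeIndex.
df = raw.set_index("Datetime")
print(type(df.index).__name__)
```

If the column is still of object dtype at the point where the index is set, resample() will raise an error rather than silently produce wrong bins, so the check is cheap insurance.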

With the index set, the groupby/resample command above produces the following MultiIndex DataFrame:

Out[76]: 
                       ID  Value
ID Datetime                     
a  2018-02-26 23:00:00  a   20.0
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  a   19.8
b  2018-02-26 23:00:00  b   20.9
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  b   20.8
c  2018-02-26 23:00:00  c   21.0
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  c   20.8

If you are only after the Value series, a bit more reshaping gives a slightly different DataFrame with a single index. This has the benefit of avoiding odd values in the ID column (see the 0 entries above):

(df.groupby('ID')['Value']
 .resample('15min')
 .last()
 .fillna(0)
 .reset_index()
 .set_index('Datetime')
 .sort_index())

Out[107]: 
                    ID  Value
Datetime                     
2018-02-26 23:00:00  a   20.0
2018-02-26 23:00:00  b   20.9
2018-02-26 23:00:00  c   21.0
2018-02-26 23:15:00  a    0.0
2018-02-26 23:15:00  b    0.0
2018-02-26 23:15:00  c    0.0
2018-02-26 23:30:00  a   19.8
2018-02-26 23:30:00  b   20.8
2018-02-26 23:30:00  c   20.8
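As a possible alternative, not part of the answer above, the same result can be sketched with reindex against a full grid of (Datetime, ID) pairs. This assumes every (Datetime, ID) pair occurs at most once, and the names full_times and full_index are made up for the example. Unlike a per-group resample, which only spans each ID's own first-to-last timestamp, the grid covers the overall time range for every ID.

```python
import pandas as pd

# Sample data from the question, with Datetime kept as a regular column.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "a", "b", "c"],
    "Datetime": pd.to_datetime(
        ["2017-12-11 23:00:00"] * 3 + ["2017-12-11 23:30:00"] * 3
    ),
    "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
})

# Every 15-minute step between the first and last timestamp.
full_times = pd.date_range(df["Datetime"].min(), df["Datetime"].max(), freq="15min")

# Full grid: each timestamp paired with each ID.
full_index = pd.MultiIndex.from_product(
    [full_times, sorted(df["ID"].unique())], names=["Datetime", "ID"]
)

# Reindex onto the grid; pairs missing from the data get Value 0.0.
result = (
    df.set_index(["Datetime", "ID"])["Value"]
    .reindex(full_index, fill_value=0.0)
    .reset_index()[["ID", "Datetime", "Value"]]
)
print(result)
```

The rows come out ordered by timestamp and then by ID, which matches the layout of the desired result in the question.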
