Resample/fill gaps for blocks of datetime stamps
Question
Problem
I loaded a CSV into a dataframe that contains some datetime gaps. The sampling frequency is 15 minutes, and each timestamp always has a block of three values. In this example, the block for the datetime 2017-12-11 23:15:00 is missing.
ID Datetime Value
0 a 2017-12-11 23:00:00 20.0
1 b 2017-12-11 23:00:00 20.9
2 c 2017-12-11 23:00:00 21.0
3 a 2017-12-11 23:30:00 19.8
4 b 2017-12-11 23:30:00 20.8
5 c 2017-12-11 23:30:00 20.8
Desired result
What I want to do is resample the Datetime column and fill the gaps in Value with zeros:
ID Datetime Value
0 a 2017-12-11 23:00:00 20.0
1 b 2017-12-11 23:00:00 20.9
2 c 2017-12-11 23:00:00 21.0
3 a 2017-12-11 23:15:00 0.0
4 b 2017-12-11 23:15:00 0.0
5 c 2017-12-11 23:15:00 0.0
6 a 2017-12-11 23:30:00 19.8
7 b 2017-12-11 23:30:00 20.8
8 c 2017-12-11 23:30:00 20.8
My question
Is it possible to accomplish this with resample() alone, or does it need to be combined with groupby()?
import pandas as pd
# all_files and headers are defined elsewhere (not shown in the question)
df = pd.concat(
    pd.read_csv(file, parse_dates=[1], dayfirst=True, names=headers)
    for file in all_files
)
df.set_index("Datetime").resample('15min').fillna(0).reset_index()
Recommended answer
You can use resample, combined with last() or mean() if there are multiple values for a single timestamp.
df.groupby('ID').resample('15min').last().fillna(0)
This resamples the dataframe per ID and takes the last value in each 15-minute period (most periods should contain one value or none). Where a period has a timestamp in the index but no value, fillna(0) inserts a 0 instead of NaN.
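To make the gap-filling step concrete, here is a minimal sketch on a single series. The two readings are taken from the question's ID "a"; the variable names are illustrative only.

```python
import pandas as pd

# One ID's readings, sampled every 15 minutes, with the 23:15 reading missing.
s = pd.Series(
    [20.0, 19.8],
    index=pd.to_datetime(["2017-12-11 23:00:00", "2017-12-11 23:30:00"]),
)

# resample() creates every 15-minute slot between the first and last timestamp;
# last() leaves the empty 23:15 slot as NaN.
resampled = s.resample("15min").last()

# fillna(0) replaces that NaN with 0.0.
filled = resampled.fillna(0)
print(filled)
```

The intermediate `resampled` series is kept separate here only so the NaN can be inspected before it is filled.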
Note that this only works if the dataframe has a datetime index. I see you are parsing dates; calling df.dtypes lets you confirm that the Datetime column has a valid datetime type. If you plan to do many time-based operations, I recommend setting the index to 'Datetime' and mostly leaving it there (i.e., do this before the command above):
df.set_index('Datetime', inplace=True)
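If the type check shows that the column was not parsed as datetimes, one possible fix is to convert it explicitly before setting the index. The sketch below uses hypothetical rows, and the day-first string format is an assumption, since the question does not show the raw CSV contents.

```python
import pandas as pd

# Hypothetical rows; the day-first string format is an assumption.
df = pd.DataFrame({
    "ID": ["a", "b"],
    "Datetime": ["11/12/2017 23:00", "11/12/2017 23:30"],
    "Value": [20.0, 20.8],
})

# False at this point: the column still holds plain strings.
was_datetime = pd.api.types.is_datetime64_any_dtype(df["Datetime"])

# Convert explicitly, then move the column into the index.
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d/%m/%Y %H:%M")
df = df.set_index("Datetime")

# resample() needs a DatetimeIndex (or a similar time-based index).
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)
print(was_datetime, is_datetime_index)
```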
Running the groupby/resample command after setting the index results in the MultiIndex DataFrame below:
Out[76]:
ID Value
ID Datetime
a 2018-02-26 23:00:00 a 20.0
2018-02-26 23:15:00 0 0.0
2018-02-26 23:30:00 a 19.8
b 2018-02-26 23:00:00 b 20.9
2018-02-26 23:15:00 0 0.0
2018-02-26 23:30:00 b 20.8
c 2018-02-26 23:00:00 c 21.0
2018-02-26 23:15:00 0 0.0
2018-02-26 23:30:00 c 20.8
If you are only after the Value series, a little more reshaping gives a slightly different dataframe with a single index. This has the benefit of avoiding odd values in the ID column (see the 0 entries above):
(df.groupby('ID')['Value']
.resample('15min')
.last()
.fillna(0)
.reset_index()
.set_index('Datetime')
.sort_index())
Out[107]:
ID Value
Datetime
2018-02-26 23:00:00 a 20.0
2018-02-26 23:00:00 b 20.9
2018-02-26 23:00:00 c 21.0
2018-02-26 23:15:00 a 0.0
2018-02-26 23:15:00 b 0.0
2018-02-26 23:15:00 c 0.0
2018-02-26 23:30:00 a 19.8
2018-02-26 23:30:00 b 20.8
2018-02-26 23:30:00 c 20.8
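For a self-contained check, the sketch below rebuilds the question's sample data and runs the per-ID resample, sorted by Datetime and ID so the row order matches the desired result. It also shows an alternative that is not part of the answer: reindexing against a full grid of (timestamp, ID) pairs. The names `via_resample`, `via_reindex`, and `grid` are illustrative.

```python
import pandas as pd

# Sample data from the question: the 23:15 block is missing.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "a", "b", "c"],
    "Datetime": pd.to_datetime(
        ["2017-12-11 23:00:00"] * 3 + ["2017-12-11 23:30:00"] * 3
    ),
    "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
})

# Approach from the answer: resample each ID separately, then flatten.
via_resample = (
    df.set_index("Datetime")
      .groupby("ID")["Value"]
      .resample("15min")
      .last()
      .fillna(0)
      .reset_index()
      .sort_values(["Datetime", "ID"])
      .reset_index(drop=True)
      [["ID", "Datetime", "Value"]]
)

# Alternative (not from the answer): reindex against every
# (timestamp, ID) pair on a regular 15-minute grid.
grid = pd.MultiIndex.from_product(
    [
        pd.date_range(df["Datetime"].min(), df["Datetime"].max(), freq="15min"),
        sorted(df["ID"].unique()),
    ],
    names=["Datetime", "ID"],
)
via_reindex = (
    df.set_index(["Datetime", "ID"])["Value"]
      .reindex(grid, fill_value=0.0)
      .reset_index()
      [["ID", "Datetime", "Value"]]
)

print(via_resample)
```

One difference between the two: the per-ID resample only spans each ID's own first-to-last timestamp, so an ID whose first or last block is missing would not be padded there, whereas the reindex variant covers the whole grid for every ID.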