pandas resample with fill_method: how do I know which row the data was copied from?
Problem description
I am trying to use the resample method to fill the gaps in time-series data, but I also want to know which row was used to fill in the missing data.
Here is my input series:
In [28]: data
Out[28]:
Date
2002-09-09 233.25
2002-09-11 233.05
2002-09-16 230.25
2002-09-18 230.10
2002-09-19 230.05
Name: Price
With resample, I get this:
In [29]: data.resample("D", fill_method='bfill')
Out[29]:
Date
2002-09-09 233.25
2002-09-10 233.05
2002-09-11 233.05
2002-09-12 230.25
2002-09-13 230.25
2002-09-14 230.25
2002-09-15 230.25
2002-09-16 230.25
2002-09-17 230.10
2002-09-18 230.10
2002-09-19 230.05
Freq: D
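The `fill_method` argument used above belongs to an older pandas API: it was deprecated in pandas 0.18 and has since been removed. A minimal sketch of the same daily backfill in current pandas, assuming the series is rebuilt by hand from the values printed in the question, calls `bfill()` on the resampler instead:

```python
import pandas as pd

# Input series rebuilt from the question's transcript
data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)

# Upsample to a daily grid; each new day takes the next observed price
filled = data.resample("D").bfill()
print(filled)
```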
What I am looking for:
Out[29]:
Date
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
Any help?
Recommended answer
After converting the Series to a DataFrame, copy the index into its own column. (DatetimeIndex.format() is useful here because it returns a string representation of the index rather than Timestamp/datetime objects.)
In [510]: df = pd.DataFrame(data)
In [511]: df['OrigDate'] = df.index.format()
In [513]: df
Out[513]:
Price OrigDate
Date
2002-09-09 233.25 2002-09-09
2002-09-11 233.05 2002-09-11
2002-09-16 230.25 2002-09-16
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
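`Index.format()` is deprecated as of pandas 2.2. A sketch of the same step in current pandas, assuming the `Price` series is rebuilt by hand from the question, uses `DatetimeIndex.strftime` to produce the same `YYYY-MM-DD` strings:

```python
import pandas as pd

# Input series rebuilt from the question's transcript
data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)

df = data.to_frame()  # one-column DataFrame whose column is named "Price"
# strftime replaces the deprecated Index.format(): one string per index entry
df["OrigDate"] = df.index.strftime("%Y-%m-%d")
print(df)
```

If you would rather keep real timestamps in the new column, `df["OrigDate"] = df.index` stores them as `datetime64` values instead of strings; the backfill works the same way either way.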
For resampling without aggregation, there is a helper method, asfreq().
In [528]: df.asfreq("D", method='bfill')
Out[528]:
Price OrigDate
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
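Here is a self-contained sketch of the whole answer that runs on current pandas. The input values are copied from the question, and the column name `OrigDate` follows the answer. Because `asfreq` reindexes every column with the same `method`, `OrigDate` is backfilled alongside `Price`, so it records which original row each value came from:

```python
import pandas as pd

# Input series rebuilt from the question's transcript
data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)

df = data.to_frame()
df["OrigDate"] = df.index.strftime("%Y-%m-%d")  # remember each row's own date

# Reindex to a daily grid without aggregating; method="bfill" fills each new
# day from the next existing row, OrigDate included
daily = df.asfreq("D", method="bfill")
print(daily)
```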
This is effectively shorthand for the following, where last() is invoked on the intermediate DataFrameGroupBy objects.
In [529]: df.resample("D", how='last', fill_method='bfill')
Out[529]:
Price OrigDate
Date
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
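In current pandas, `resample` no longer accepts `how=` or `fill_method=`. Instead, the aggregation and the fill are chained as method calls. Here is a sketch of the equivalent, assuming the input is rebuilt by hand from the question: `last()` leaves the empty days as missing values, and `bfill()` then fills them from the next non-empty day.

```python
import pandas as pd

# Input series rebuilt from the question's transcript
data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)
df = data.to_frame()
df["OrigDate"] = df.index.strftime("%Y-%m-%d")

# last() takes the final row in each daily bin (missing for empty days);
# bfill() then copies the next available row into those gaps
daily = df.resample("D").last().bfill()
print(daily)
```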