Extrapolate Pandas DataFrame
Question
It is easy to interpolate values in a Pandas.DataFrame using Series.interpolate, but how can extrapolation be done?
For example, given a DataFrame as shown, how can we extrapolate it 14 more months to 2014-12-31? Linear extrapolation is fine.
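import pandas as pd
X1 = range(10)
# A list comprehension, since map() returns an iterator in Python 3
X2 = [x**2 for x in X1]
# freq='M' is month end; pandas >= 2.2 prefers the alias 'ME'
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101',periods=10,freq='M'))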
I think a new DataFrame must first be created, with a DatetimeIndex starting from 2013-11-30 and extending for 14 more monthly ('M') periods. Beyond that I'm stuck.
Recommended answer
Extrapolating a DataFrame with a DatetimeIndex
This can be done in two steps:
- Extend the DatetimeIndex
- Extrapolate the data
Extend the Index
Overwrite df with a new DataFrame whose data is reindexed onto a new, extended index. That index is built from the original index's start, number of periods and frequency. This allows the original df to come from anywhere, as in the csv example case. As a bonus, the added rows are conveniently filled with NaNs!
import pandas as pd
# Fake DataFrame for example (could come from anywhere)
X1 = range(10)
X2 = [x**2 for x in X1]
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101',periods=10,freq='M'))
# Number of months to extend
extend = 5
# Extend the index first, based on the original index's start, length and frequency.
# The DataFrame constructor reindexes df onto it, so the new rows are NaN.
df = pd.DataFrame(
    data=df,
    index=pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=df.index.freq
    )
)
# Display
print(df)
x1 x2
2013-01-31 0 0
2013-02-28 1 1
2013-03-31 2 4
2013-04-30 3 9
2013-05-31 4 16
2013-06-30 5 25
2013-07-31 6 36
2013-08-31 7 49
2013-09-30 8 64
2013-10-31 9 81
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 NaN NaN
2014-02-28 NaN NaN
2014-03-31 NaN NaN
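The DataFrame(data=df, index=...) call works because the constructor conforms existing data to the index it is given. Some readers may find DataFrame.reindex more explicit, so here is a sketch of the same step written that way. pd.offsets.MonthEnd() stands in for the 'M' alias, which newer pandas versions deprecate:

```python
import pandas as pd

# Same toy data as above; MonthEnd() gives the same dates as freq='M'
idx = pd.date_range('2013-01-31', periods=10, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({'x1': range(10), 'x2': [x**2 for x in range(10)]}, index=idx)

extend = 5
# Build the longer index from the original start, length and frequency
new_index = pd.date_range(start=df.index[0],
                          periods=len(df.index) + extend,
                          freq=df.index.freq)
# reindex keeps the existing rows and fills the new dates with NaN
df = df.reindex(new_index)
print(df.tail(6))
```

Both forms produce the same 15-row frame. reindex also works unchanged if df was loaded from a file, as long as its index carries a freq.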
Extrapolate the data
Most extrapolators require numeric inputs rather than dates. This can be handled by temporarily replacing the dates with a plain integer index:
# Temporarily remove dates and make index numeric (0, 1, 2, ...)
di = df.index
df = df.reset_index(drop=True)
See this answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial.
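From that answer. The pieces it defines elsewhere are filled in here as assumptions: a cubic func, an arbitrary initial guess, and fit_df as the rows that already have data.
from scipy.optimize import curve_fit
# Assumed fit function: a 3rd order polynomial, as the linked answer describes
def func(x, a, b, c, d):
    return a * x**3 + b * x**2 + c * x + d
# Assumed initial guess for a, b, c, d
guess = (0.5, 0.5, 0.5, 0.5)
# Fit on the rows that already have data; optimized parameters go in col_params
fit_df = df.dropna()
col_params = {}
# Curve fit each column
for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, guess)
    # Store optimized parameters
    col_params[col] = params[0]
# Extrapolate each column
for col in df.columns:
    # Boolean mask of the NaN rows in the column
    mask = df[col].isnull().to_numpy()
    # Get the numeric index values of those rows
    x = df.index[mask].astype(float).values
    # Extrapolate those points and write them back with .loc (avoids chained assignment)
    df.loc[mask, col] = func(x, *col_params[col])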
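If SciPy is not available, the same per-column polynomial fit can be sketched with numpy.polyfit and numpy.polyval instead of curve_fit. This swaps in a different technique from the linked answer's. It is a self-contained sketch that assumes the numeric index produced by reset_index and NaN-padded rows at the end:

```python
import numpy as np
import pandas as pd

# Numeric-index frame as after reset_index: 10 known rows, then 5 NaN rows
df = pd.DataFrame({'x1': list(range(10)) + [np.nan] * 5,
                   'x2': [x**2 for x in range(10)] + [np.nan] * 5})

deg = 3  # polynomial order, matching the 3rd order fit above
x = df.index.to_numpy(dtype=float)  # row positions 0..14 as floats
for col in df.columns:
    y = df[col].to_numpy()
    known = ~np.isnan(y)
    # Least-squares fit on the known points (coefficients, highest power first)
    coeffs = np.polyfit(x[known], y[known], deg)
    # Evaluate the fitted polynomial at the missing positions
    df.loc[~known, col] = np.polyval(coeffs, x[~known])

print(df.tail(5))
```

Setting deg = 1 gives the plain linear extrapolation the question asks for.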
Once the columns are extrapolated, put the dates back
# Put date index back
df.index = di
# Display
print(df)
x1 x2
2013-01-31 0 0
2013-02-28 1 1
2013-03-31 2 4
2013-04-30 3 9
2013-05-31 4 16
2013-06-30 5 25
2013-07-31 6 36
2013-08-31 7 49
2013-09-30 8 64
2013-10-31 9 81
2013-11-30 10 100
2013-12-31 11 121
2014-01-31 12 144
2014-02-28 13 169
2014-03-31 14 196
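Here is a sketch that puts both steps together for the original question: 14 more months, up to 2014-12-31, with linear extrapolation. The function name extrapolate_poly is hypothetical. Row positions stand in for the dates, and numpy.polyfit replaces curve_fit:

```python
import numpy as np
import pandas as pd

# The question's data; MonthEnd() gives the same dates as freq='M'
idx = pd.date_range('2013-01-31', periods=10, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({'x1': range(10), 'x2': [x**2 for x in range(10)]}, index=idx)


def extrapolate_poly(frame, extend, deg=1):
    """Append `extend` periods and fill them with a per-column polynomial fit."""
    # Step 1: extend the DatetimeIndex; new rows start out as NaN
    new_index = pd.date_range(start=frame.index[0],
                              periods=len(frame.index) + extend,
                              freq=frame.index.freq)
    out = frame.reindex(new_index).astype(float)
    # Step 2: row positions 0..n-1 serve as numeric x values instead of dates
    x = np.arange(len(out), dtype=float)
    for col in out.columns:
        y = out[col].to_numpy()
        known = ~np.isnan(y)
        coeffs = np.polyfit(x[known], y[known], deg)
        out.loc[~known, col] = np.polyval(coeffs, x[~known])
    return out


result = extrapolate_poly(df, extend=14, deg=1)
print(result.tail(3))
```

With deg=1, x1 continues exactly (10, 11, ..., 23). The straight-line fit to the quadratic x2, however, has slope 9 and intercept -12, so its first new value (78) dips below the last observed one (81). Passing deg=2 or deg=3 follows the curve instead.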