Extrapolate Pandas DataFrame

Question

It is easy to interpolate values in a pandas DataFrame using Series.interpolate. How can extrapolation be done?

For example, given the DataFrame shown below, how can we extrapolate it 14 more months, to 2014-12-31? Linear extrapolation is fine.

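import pandas as pd

X1 = list(range(10))
X2 = [x**2 for x in X1]
# 'M' is the month-end frequency alias (spelled 'ME' in recent pandas releases)
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))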

I am thinking that a new DataFrame must first be created, with a DatetimeIndex starting from 2013-11-30 and extending for 14 more month-end periods. Beyond that I'm stuck.

Recommended answer

Extrapolating a DataFrame with a DatetimeIndex

This can be done in two steps:

  1. Extend the DatetimeIndex
  2. Extrapolate the data

Extend the index

Overwrite df with a new DataFrame in which the data is reindexed onto a new, extended index built from the original index's start, number of periods, and frequency. Because only the existing index is used, the original df can come from anywhere, for instance from a CSV file. The rows added at the end are conveniently filled with NaN.

import pandas as pd

# Fake DataFrame for example (could come from anywhere)
X1 = list(range(10))
X2 = [x**2 for x in X1]
# 'M' is the month-end frequency alias (spelled 'ME' in recent pandas releases)
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))

# Number of months to extend
extend = 5

# Extrapolate the index first based on original index
df = pd.DataFrame(
    data=df,
    index=pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=df.index.freq
    )
)

# Display
print(df)


             x1  x2
2013-01-31   0   0
2013-02-28   1   1
2013-03-31   2   4
2013-04-30   3   9
2013-05-31   4  16
2013-06-30   5  25
2013-07-31   6  36
2013-08-31   7  49
2013-09-30   8  64
2013-10-31   9  81
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 NaN NaN
2014-02-28 NaN NaN
2014-03-31 NaN NaN
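
The same index extension can also be written with DataFrame.reindex instead of the constructor call above. The following is a minimal sketch, not part of the original answer: it assumes the 14 month horizon from the question, and the names new_index and extended are illustrative. It uses pd.offsets.MonthEnd() rather than a string alias so that it does not depend on which alias spelling the installed pandas version accepts.

```python
import pandas as pd

# Example frame from the question: 10 month-end rows starting 2013-01-31
df = pd.DataFrame(
    {"x1": list(range(10)), "x2": [x**2 for x in range(10)]},
    index=pd.date_range("2013-01-31", periods=10, freq=pd.offsets.MonthEnd()),
)

# Number of months to extend (the question asks for 14)
extend = 14

# Build the longer index from the original start and frequency
new_index = pd.date_range(
    start=df.index[0],
    periods=len(df.index) + extend,
    freq=df.index.freq,
)

# Reindexing keeps existing rows and fills the new ones with NaN
extended = df.reindex(new_index)
print(extended.tail())
```

Note that the integer columns become floats after reindexing, because NaN cannot be stored in an integer column.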

Extrapolate the data

Most extrapolators require numeric inputs rather than dates. The dates can be set aside temporarily by swapping the index for a plain integer index:

# Temporarily remove dates and make index numeric
di = df.index
df = df.reset_index(drop=True)
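
Replacing the dates with 0, 1, 2, ... treats every month as equally long. If the uneven spacing of month-end dates matters, one alternative is to use elapsed days as the numeric x values. This is only a sketch of that idea and is not what the answer does; the names idx and days are illustrative.

```python
import pandas as pd

# Four month-end dates starting 2013-01-31
idx = pd.date_range("2013-01-31", periods=4, freq=pd.offsets.MonthEnd())

# Days elapsed since the first date, as a float array usable by a curve fitter
days = (idx - idx[0]).days.to_numpy(dtype=float)
print(days)
```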

See the linked answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial. The fitting and extrapolation loops from that answer:

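from scipy.optimize import curve_fit

# Assumed setup (not shown in the snippet): a 3rd order polynomial,
# an initial parameter guess, and the rows that have known values
def func(x, a, b, c, d):
    return a * x**3 + b * x**2 + c * x + d

guess = (0.5, 0.5, 0.5, 0.5)
fit_df = df.dropna()
col_params = {}

# Curve fit each column
for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, guess)
    # Store optimized parameters
    col_params[col] = params[0]

# Extrapolate each column
for col in df.columns:
    # Get the index values for NaNs in the column
    x = df[pd.isnull(df[col])].index.astype(float).values
    # Extrapolate those points with the fitted function
    df.loc[df[col].isnull(), col] = func(x, *col_params[col])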

Once the columns are extrapolated, put the dates back:

# Put date index back
df.index = di

# Display
print(df)


            x1   x2
2013-01-31   0    0
2013-02-28   1    1
2013-03-31   2    4
2013-04-30   3    9
2013-05-31   4   16
2013-06-30   5   25
2013-07-31   6   36
2013-08-31   7   49
2013-09-30   8   64
2013-10-31   9   81
2013-11-30  10  100
2013-12-31  11  121
2014-01-31  12  144
2014-02-28  13  169
2014-03-31  14  196
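
The question notes that linear extrapolation would be enough. A possible sketch of that simpler variant, assuming a degree 1 least-squares fit with numpy.polyfit over row positions, is given below. The helper extrapolate_linear and the frame demo are hypothetical names introduced for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def extrapolate_linear(frame):
    """Fill NaNs in each column using a straight-line fit over row position."""
    out = frame.astype(float)
    # Row positions 0, 1, 2, ... stand in for the dates (evenly spaced months)
    pos = np.arange(len(out), dtype=float)
    for col in out.columns:
        known = out[col].notna().to_numpy()
        # Degree-1 least-squares fit on the rows that have values
        slope, intercept = np.polyfit(pos[known], out[col].to_numpy()[known], 1)
        # Evaluate the line at the missing rows only
        out.loc[~known, col] = slope * pos[~known] + intercept
    return out


# Ten known month-end rows followed by five empty ones
idx = pd.date_range("2013-01-31", periods=15, freq=pd.offsets.MonthEnd())
demo = pd.DataFrame(
    {"x1": list(range(10)), "x2": [x**2 for x in range(10)]},
    index=idx[:10],
).reindex(idx)

filled = extrapolate_linear(demo)
print(filled.tail())
```

Because the DatetimeIndex is never removed, there is no need to put the dates back afterwards. The x1 column continues its straight line, while x2 follows the best-fit line through the known points rather than the squares, which is the expected trade-off of a linear model.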
