Is there an easy way in Python to extrapolate data points to the future?
Question
I have a simple numpy array with one data point per date. Something like this:
>>> import numpy as np
>>> from datetime import date
>>> x = np.array( [(date(2008,3,5), 4800 ), (date(2008,3,15), 4000 ), (date(2008,3,20), 3500 ), (date(2008,4,5), 3000 ) ] )
Is there an easy way to extrapolate the data points to the future: date(2008,5,1), date(2008,5,20) etc.? I understand it can be done with mathematical algorithms, but here I am looking for some low-hanging fruit. Actually I like what numpy.linalg.solve does, but it does not look applicable to extrapolation. Maybe I am completely wrong.
To be more specific, I am building a burn-down chart (an XP term) with x = date and y = volume of work still to be done. I have the sprints that are already finished, and I want to visualise how the future sprints will go if the current situation persists. Finally, I want to predict the release date. By its nature, the volume of work to be done always goes down on a burn-down chart, so the extrapolated release date is the date when the volume reaches zero.
This is all for showing the dev team how things are going. Precision is not so important here :) The motivation of the dev team is the main factor. That means I am absolutely fine with a very approximate extrapolation technique.
Recommended answer
It's all too easy for extrapolation to generate garbage; try this. Many different extrapolations are of course possible; some produce obvious garbage, some non-obvious garbage, and many are ill-defined.
""" extrapolate y,m,d data with scipy UnivariateSpline """
import numpy as np
from scipy.interpolate import UnivariateSpline
    # pydoc scipy.interpolate.UnivariateSpline -- fitpack, unclear
from datetime import date
import matplotlib.pyplot as plt

__version__ = "denis 23oct"

def daynumber( y,m,d ):
    """ 2005,1,1 -> 0  2006,1,1 -> 365 ... """
    return date( y,m,d ).toordinal() - date( 2005,1,1 ).toordinal()

# known data: (day number, value), transposed into two 1-d arrays
days, values = np.array([
    (daynumber(2005,1,1), 1.2 ),
    (daynumber(2005,4,1), 1.8 ),
    (daynumber(2005,9,1), 5.3 ),
    (daynumber(2005,10,1), 5.3 )
    ]).T
# first day of every month in 2005 and 2006, well past the last known point
dayswanted = np.array([ daynumber( year, month, 1 )
    for year in range( 2005, 2006+1 )
    for month in range( 1, 12+1 )])

np.set_printoptions( precision=1 )  # .1f
print( "days:", days )
print( "values:", values )
print( "dayswanted:", dayswanted )

plt.title( "extrapolation with scipy.interpolate.UnivariateSpline" )
plt.plot( days, values, "o" )
for k in (1,2,3):  # line parabola cubicspline
    extrapolator = UnivariateSpline( days, values, k=k )
    y = extrapolator( dayswanted )
    label = "k=%d" % k
    print( label, y )
    plt.plot( dayswanted, y, label=label )
plt.legend( loc="lower left" )
plt.grid(True)
plt.savefig( "extrapolate-UnivariateSpline.png", dpi=50 )
plt.show()
Added: a SciPy ticket says, "The behavior of the FITPACK classes in scipy.interpolate is much more complex than the docs would lead one to believe", which imho is true of other software docs too.
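For the burn-down case in the question, where a very approximate answer is acceptable, a plain least-squares straight line may be enough. The following is only a minimal sketch under that assumption: it uses np.polyfit rather than the UnivariateSpline approach above, and the names points, predict and release_date are illustrative, not taken from the original posts.

```python
from datetime import date
import numpy as np

# Data points from the question: (date, volume of work still to be done)
points = [
    (date(2008, 3, 5), 4800),
    (date(2008, 3, 15), 4000),
    (date(2008, 3, 20), 3500),
    (date(2008, 4, 5), 3000),
]

# Use day offsets from the first date so the fit works with small numbers
origin = points[0][0].toordinal()
days = np.array([d.toordinal() - origin for d, _ in points], dtype=float)
work = np.array([w for _, w in points], dtype=float)

# Degree-1 least-squares fit: work ~ slope * days + intercept
slope, intercept = np.polyfit(days, work, 1)

def predict(day):
    """Extrapolated remaining work on a given date (assumes a linear trend)."""
    return slope * (day.toordinal() - origin) + intercept

# The line only crosses zero in the future if the trend is downward
release_date = None
if slope < 0:
    zero_offset = -intercept / slope
    release_date = date.fromordinal(origin + int(np.ceil(zero_offset)))

for d in (date(2008, 5, 1), date(2008, 5, 20)):
    print(d, round(predict(d)))
print("estimated release date:", release_date)
```

If the slope is not negative, the fitted line never reaches zero and release_date stays None. As the answer warns, a straight line is just one of many possible extrapolations, so the estimate should be treated as a rough guide for the team rather than a forecast.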