Time Series Analysis - unevenly spaced measures - pandas + statsmodels


Problem description

I have two numpy arrays, light_points and time_points, and would like to use some time series analysis methods on those data.

So I tried this:

import statsmodels.api as sm
import pandas as pd
tdf = pd.DataFrame({'time':time_points[:]})
rdf =  pd.DataFrame({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(freq='w',start=0,periods=len(rdf.light))
#rdf.index = pd.DatetimeIndex(tdf['time'])

This works but is not doing the correct thing. Indeed, the measurements are not evenly time-spaced, and if I just declare the time_points pandas DataFrame as the index of my frame, I get an error:

rdf.index = pd.DatetimeIndex(tdf['time'])

decomp = sm.tsa.seasonal_decompose(rdf)

elif freq is None:
raise ValueError("You must specify a freq or x must be a pandas object with a timeseries index")

ValueError: You must specify a freq or x must be a pandas object with a timeseries index

I don't know how to correct this. Also, it seems that pandas' TimeSeries are deprecated.

I tried this:

rdf = pd.Series({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(tdf['time'])

But it gives me a length mismatch:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 122 elements

Nevertheless, I don't understand where it comes from, as rdf['light'] and tdf['time'] are of the same length...
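The mismatch is easy to reproduce: passing a dict to pd.Series maps each key to a single value, so the whole array becomes one element keyed by 'light' and the resulting Series has length 1. A minimal sketch with made-up sample data showing the difference:

```python
import numpy as np
import pandas as pd

light_points = np.random.randn(122)  # made-up measurements
time_points = pd.date_range('2015-01-01', periods=122, freq='D')

# A dict maps each key to one value, so the entire array is stored
# as a single element under the key 'light':
bad = pd.Series({'light': light_points})
print(len(bad))   # 1  -> hence "Expected axis has 1 elements"

# Passing the array directly gives one element per measurement:
good = pd.Series(light_points, index=pd.DatetimeIndex(time_points))
print(len(good))  # 122
```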

Eventually, I tried defining my rdf as a pandas Series:

rdf = pd.Series(light_points[:],index=pd.DatetimeIndex(time_points[:]))

And I get:

ValueError: You must specify a freq or x must be a pandas object with a timeseries index

Then, I tried replacing the index with

 pd.TimeSeries(time_points[:])

And it gives me an error on the seasonal_decompose method line:

AttributeError: 'Float64Index' object has no attribute 'inferred_freq'

How can I work with unevenly spaced data? I was thinking about creating an approximately evenly spaced time array by adding many unknown values between the existing values and using interpolation to "evaluate" those points, but I think there could be a cleaner and easier solution.
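That idea maps directly onto built-in pandas functionality: rather than inserting dummy points by hand, resample onto a regular grid and interpolate against the timestamps. A sketch with hypothetical unevenly spaced data:

```python
import pandas as pd

# Hypothetical unevenly spaced measurements
idx = pd.to_datetime(['2015-01-01', '2015-01-03', '2015-01-04', '2015-01-08'])
s = pd.Series([1.0, 3.0, 4.0, 8.0], index=idx)

# Put the series on a regular daily grid (missing days become NaN),
# then fill the gaps by time-weighted linear interpolation:
even = s.resample('D').mean().interpolate(method='time')
print(even)
```

The result is an evenly spaced daily series that can be fed to seasonal_decompose without specifying a frequency by hand.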

Answer

seasonal_decompose() requires a freq that is either provided as part of the DateTimeIndex meta information, can be inferred by pandas.Index.inferred_freq, or is given by the user as an integer number of periods per cycle, e.g. 12 for monthly (from the docstring of seasonal_decompose):

def seasonal_decompose(x, model="additive", filt=None, freq=None):
    """
    Parameters
    ----------
    x : array-like
        Time series
    model : str {"additive", "multiplicative"}
        Type of seasonal component. Abbreviations are accepted.
    filt : array-like
        The filter coefficients for filtering out the seasonal component.
        The default is a symmetric moving average.
    freq : int, optional
        Frequency of the series. Must be used if x is not a pandas
        object with a timeseries index.

为了说明-使用随机样本数据:

To illustrate - using random sample data:

from datetime import datetime
import numpy as np
import pandas as pd
import statsmodels.api as sm

length = 400
x = np.sin(np.arange(length)) * 10 + np.random.randn(length)
df = pd.DataFrame(data=x, columns=['value'],
                  index=pd.date_range(start=datetime(2015, 1, 1), periods=length, freq='W'))

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 400 entries, 2015-01-04 to 2022-08-28
Freq: W-SUN

decomp = sm.tsa.seasonal_decompose(df)
data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
data.columns = ['series', 'trend', 'seasonal', 'resid']

Data columns (total 4 columns):
series      400 non-null float64
trend       348 non-null float64
seasonal    400 non-null float64
resid       348 non-null float64
dtypes: float64(4)
memory usage: 15.6 KB

So far, so good - now randomly dropping elements from the DateTimeIndex to create unevenly spaced data:

df = df.iloc[np.unique(np.random.randint(low=0, high=length, size=int(length * .8)))]

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 222 entries, 2015-01-11 to 2022-08-21
Data columns (total 1 columns):
value    222 non-null float64
dtypes: float64(1)
memory usage: 3.5 KB

df.index.freq

None

df.index.inferred_freq

None

Running the seasonal_decompose on this data 'works':

decomp = sm.tsa.seasonal_decompose(df, freq=52)

data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
data.columns = ['series', 'trend', 'seasonal', 'resid']

DatetimeIndex: 224 entries, 2015-01-04 to 2022-08-07
Data columns (total 4 columns):
series      224 non-null float64
trend       172 non-null float64
seasonal    224 non-null float64
resid       172 non-null float64
dtypes: float64(4)
memory usage: 8.8 KB

The question is - how useful is the result. Even without gaps in the data that complicate inference of seasonal patterns (see the example use of .interpolate() in the release notes), statsmodels qualifies this procedure as follows:

Notes
-----
This is a naive decomposition. More sophisticated methods should
be preferred.

The additive model is Y[t] = T[t] + S[t] + e[t]

The multiplicative model is Y[t] = T[t] * S[t] * e[t]

The seasonal component is first removed by applying a convolution
filter to the data. The average of this smoothed series for each
period is the returned seasonal component.
