How to find the slope value by applying linear regression to the trend of a data series?


Question

I have time series data from which I am able to extract the trend. I now need to fit a regression line to the trend data and would like to know whether the slope is positive, negative, or constant. Below is my CSV file containing the data:

date,cpu
2018-02-10 11:52:59.342269+00:00,6.0
2018-02-10 11:53:04.006971+00:00,6.0
2018-02-10 22:35:33.438948+00:00,4.0
2018-02-10 22:35:37.905242+00:00,4.0
2018-02-11 12:01:00.663084+00:00,4.0
2018-02-11 12:01:05.136107+00:00,4.0
2018-02-11 12:31:00.228447+00:00,5.0
2018-02-11 12:31:04.689054+00:00,5.0
2018-02-11 13:01:00.362877+00:00,5.0
2018-02-11 13:01:04.824231+00:00,5.0
2018-02-11 23:42:40.304334+00:00,0.0
2018-02-11 23:44:27.357619+00:00,0.0
2018-02-12 01:38:25.012175+00:00,7.0
2018-02-12 01:53:39.721800+00:00,8.0
2018-02-12 01:53:53.310947+00:00,8.0
2018-02-12 01:56:37.657977+00:00,8.0
2018-02-12 01:56:45.133701+00:00,8.0
2018-02-12 04:49:36.028754+00:00,9.0
2018-02-12 04:49:40.097157+00:00,9.0
2018-02-12 07:20:52.148437+00:00,9.0
...          ...                 ...

First I need to extract the trend from the given data. The code below does that:

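import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("test_forecast/cpu_data.csv")
# The timestamps carry a time and UTC offset, so let pandas infer the format
# instead of forcing the date-only "%Y-%m-%d".
df["date"] = pd.to_datetime(df["date"])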
df.set_index("date", inplace=True)
df = df.resample('D').mean().interpolate(method='linear', axis=0).fillna(0)

X = df.index.strftime('%Y-%m-%d')
Y = sm.tsa.seasonal_decompose(df["cpu"]).trend.interpolate(method='linear', axis=0).fillna(0).values

So X holds the daily dates and Y holds the trend value for each day. Now I want to apply linear regression to find the regression line and determine whether the slope is positive, negative, or constant. I have tried the code below:

model = sm.OLS(y,X, missing='drop')
results = model.fit()
print(results)

I was hoping the results variable would contain values for the dependent and independent variables, the slope, and the intercept. Instead I get the error below:

Traceback (most recent call last):
  File "/home/souvik/PycharmProjects/Pandas/test11.py", line 37, in <module>
    model = sm.OLS(y,X, missing='drop')
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/regression/linear_model.py", line 817, in __init__
    hasconst=hasconst, **kwargs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/regression/linear_model.py", line 663, in __init__
    weights=weights, hasconst=hasconst, **kwargs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/regression/linear_model.py", line 179, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/model.py", line 212, in __init__
    super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/model.py", line 64, in __init__
    **kwargs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/model.py", line 87, in _handle_data
    data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/data.py", line 633, in handle_data
    **kwargs)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/data.py", line 79, in __init__
    self._handle_constant(hasconst)
  File "/home/souvik/data_analysis/lib/python3.5/site-packages/statsmodels/base/data.py", line 131, in _handle_constant
    ptp_ = self.exog.ptp(axis=0)
TypeError: cannot perform reduce with flexible type

I found the snippet above on a website, but I am unable to apply it to my case. What am I doing wrong?

Recommended answer

Your problem is here:

X = df.index.strftime('%Y-%m-%d')

X is therefore an array of strings, so you can't use it to fit a regression. You'll want something like

X = (df.index.astype(np.int64) // 10**9).values

which instead converts your datetimes to Unix seconds (with numpy imported as np).

Alternatively, if you'd rather use something like "days since initial value" for X, you can do

start_date = df.index[0]
X = (df.index - start_date).days.values
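
The choice of time unit only rescales the slope; it does not change its sign. The following is a minimal sketch of that point, not part of the original answer. The dates and trend values are made up for illustration, np.polyfit stands in for the regression step, and seconds are measured from the first observation rather than from the Unix epoch (a constant shift that leaves the slope unchanged but keeps the numbers small).

```python
import numpy as np
import pandas as pd

# Hypothetical daily index and trend values standing in for df.index and Y.
index = pd.date_range("2018-02-10", periods=6, freq="D", tz="UTC")
trend = np.array([5.0, 4.5, 6.0, 7.5, 8.0, 9.0])

elapsed = index - index[0]

# Numeric X in seconds since the first observation.
x_seconds = elapsed.total_seconds().to_numpy()
# Numeric X in whole days since the first observation.
x_days = elapsed.days.to_numpy()

# Degree-1 polynomial fit; the first coefficient is the slope.
slope_per_second = np.polyfit(x_seconds, trend, 1)[0]
slope_per_day = np.polyfit(x_days, trend, 1)[0]

print(slope_per_second * 86400, slope_per_day)
```

The two printed numbers should agree, since a day is 86400 seconds. Days are usually the easier unit to read for daily data.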

In either case, you'll also want to print results.summary() rather than results.
