Python Simple Exponential Smoothing


Problem description

I downloaded TESLA stock data from www.nasdaq.com. After downloading the CSV file, I realized that I needed to convert it using Microsoft Excel 2016: I used the Data tab and clicked Text to Columns. The headers are clear now: date, close, volume, open, high, low. Please see the CSV file here. Link: https://drive.google.com/open?id=1cirQi47U4uumvA14g6vOmgsXbV-YvS4l

Preview (the CSV data runs from 02/02/2017 to 02/02/2018):

 date        | close  | volume  | open   | high   | low
 02/02/2018  | 343.75 | 3696157 | 348.44 | 351.95 | 340.51
 01/02/2018  | 349.25 | 4187440 | 351.00 | 359.66 | 348.63

The challenge for me is to create one data point for each month, as close to the first of the month as possible. I filtered the Excel file, and this is the data I got.

 - date | close
 - 01/02/2018 | 349.25
 - 02/01/2018 | 320.53
 - 01/12/2017 | 306.53
 - 01/11/2017 | 321.08
 - 02/10/2017 | 341.53
 - 01/09/2017 | 355.40
 - 01/08/2017 | 319.57
 - 03/07/2017 | 352.62
 - 01/06/2017 | 340.37
 - 01/05/2017 | 322.83
 - 03/04/2017 | 298.52
 - 01/03/2017 | 250.02
 - 02/02/2017 | 251.55
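
The same "first available day of each month" selection could also be done in pandas rather than by filtering in Excel. The following is only a sketch under stated assumptions: the dates are in day/month/year order, the CSV has `date` and `close` columns, and the four rows used here are copied from the tables in this question rather than read from the real file. The names `csv_text`, `month`, and `monthly` are illustrative.

```python
import io

import pandas as pd

# Four rows taken from the question's tables (dates are day/month/year).
csv_text = """date,close
02/02/2018,343.75
01/02/2018,349.25
02/01/2018,320.53
01/12/2017,306.53
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse the dates explicitly so 01/02/2018 is read as 1 February 2018.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df = df.sort_values("date")

# Group rows by calendar month and keep the earliest row in each month.
month = df["date"].dt.to_period("M").rename("month")
monthly = df.groupby(month).first()
print(monthly)
```

For February 2018 this keeps the 01/02/2018 row and drops 02/02/2018, which matches the manual selection in the list above.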

If I turn these into data points, they look like this, which is what I need to create a graph. I want to display a graph of the original data and the "smoothed data" using simple exponential smoothing (sometimes called single exponential smoothing). This is more about time series forecasting, using python-ggplot.

 - x | y
 - 01/02/2018 | 349.25
 - 02/01/2018 | 320.53
 - 01/12/2017 | 306.53
 - 01/11/2017 | 321.08
 - 02/10/2017 | 341.53
 - 01/09/2017 | 355.40
 - 01/08/2017 | 319.57
 - 03/07/2017 | 352.62
 - 01/06/2017 | 340.37
 - 01/05/2017 | 322.83
 - 03/04/2017 | 298.52
 - 01/03/2017 | 250.02
 - 02/02/2017 | 251.55

The Python program I wrote is:

# -*- coding: utf-8 -*-

"""
Created on Sat Feb  3 13:20:28 2018

@author: johannesbambang
"""

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

my_data = pd.read_csv('C:\TESLA Exponential Smoothing\TSLA.csv',dayfirst=True,index_col=0)
my_data.plot()

plt.show()

My question is: what should I improve in my Python program? Any help would be great. Thank you in advance.

Solution

Use Simple Exponential Smoothing in Python.

Forecasts are calculated using weighted averages in which the weights decrease exponentially as observations come from further in the past; the smallest weights are associated with the oldest observations:
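
To make the weighting concrete, here is a small sketch that lists the weights a * (1 - a)^i for the most recent observations. The helper name `ses_weights` and the choice of five lags are illustrative assumptions, not part of the original answer.

```python
def ses_weights(alpha, n):
    """Weights alpha * (1 - alpha)**i for lag i = 0 (newest) to n - 1 (oldest)."""
    return [alpha * (1 - alpha) ** i for i in range(n)]


weights = ses_weights(0.6, 5)
for lag, weight in enumerate(weights):
    print(f"lag {lag}: weight {weight:.4f}")

# The weights of a finite window sum to 1 - (1 - alpha)**n, which approaches 1 as n grows.
print(f"sum of weights: {sum(weights):.4f}")
```

A larger alpha puts more weight on the newest observation and makes the weights fall off faster; a smaller alpha spreads the weight over more of the history.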

'''Simple exponential smoothing over the last n values:
 forecast = a * y_t + a * (1-a)^1 * y_t-1 + a * (1-a)^2 * y_t-2 + ... + a * (1-a)^n * y_t-n'''

from math import sqrt

from sklearn.metrics import mean_squared_error  # assumed source of mean_squared_error


def exponential_smoothing(panda_series, alpha_value):
    # Expects values ordered oldest to newest, so the newest value gets weight alpha.
    newest_first = list(panda_series)[::-1]
    output = sum(alpha_value * (1 - alpha_value) ** i * x
                 for i, x in enumerate(newest_first))
    return output


panda_series = my_data.y  # my_data is the DataFrame from the question; assumes a column named y
smoothing_number = exponential_smoothing(panda_series, 0.6)  # try a=0.6 or 0.5 and keep whichever gives the lower RMS error
estimated_values = testdata.copy()  # replace testdata with your test dataset
estimated_values['SES'] = smoothing_number
error = sqrt(mean_squared_error(testdata.y, estimated_values.SES))
print(error)
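
The function above returns a single forecast value. To plot a "smoothed" line next to the original data, a whole smoothed series is needed. One common way to get it is the recursive form s_t = a * y_t + (1 - a) * s_(t-1). The sketch below is not the code from the answer: it assumes the series is started at the first observation (s_0 = y_0), uses the monthly closes from the question reordered oldest to newest, and the names `smooth` and `one_step_rmse` are illustrative.

```python
from math import sqrt

# Monthly closing prices from the question, reordered oldest to newest.
closes = [251.55, 250.02, 298.52, 322.83, 340.37, 352.62, 319.57,
          355.40, 341.53, 321.08, 306.53, 320.53, 349.25]


def smooth(values, alpha):
    """Return s_t = alpha * y_t + (1 - alpha) * s_(t-1), starting from s_0 = y_0."""
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def one_step_rmse(values, alpha):
    """RMS error when the previous smoothed value is used as the forecast for y_t."""
    smoothed = smooth(values, alpha)
    errors = [y - s for y, s in zip(values[1:], smoothed[:-1])]
    return sqrt(sum(e * e for e in errors) / len(errors))


# Compare the two candidate alpha values by their one-step-ahead RMS error.
for alpha in (0.5, 0.6):
    print(alpha, round(one_step_rmse(closes, alpha), 2))
```

The list returned by `smooth(closes, 0.6)` has one value per month, so it can be drawn on the same axes as `closes` (for example with two `plt.plot` calls) to compare the original and smoothed data.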
