Identifying Extrema in Financial Data using Pandas
Problem Description
I have daily S&P 500 prices and Treasury yields. The end goal is to determine how Treasuries perform, graphically and mathematically, during corrections in the S&P. A correction is a decline of some % off the last peak, with the % being a mutable parameter.
import urllib2, pandas as pd, numpy as np, matplotlib.pyplot as plt, scipy as sp
correction = 0.1 # define % decline from peak to constitute market correction
sp_data = urllib2.urlopen('http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&a=00&b=3&c=1950&d=00&e=14&f=2016&g=d&ignore=.csv')
df1 = pd.read_csv(sp_data)
df1 = df1[['Date','Close']]
df1 = df1.rename(columns = {'Close':'S&P_500'})
t_bill_data = urllib2.urlopen('http://real-chart.finance.yahoo.com/table.csv?s=%5ETNX&a=00&b=2&c=1962&d=00&e=14&f=2016&g=d&ignore=.csv')
df2 = pd.read_csv(t_bill_data)
df2 = df2[['Date','Close']]
df2 = df2.rename(columns = {'Close':'T_Bill'})
df3 = pd.merge(df1, df2, on='Date', how='outer')
df3['Date'] = pd.to_datetime(df3['Date'], format='%Y-%m-%d')
df3 = df3.set_index('Date')
df3.describe()
df3.plot(kind='line',title='S&P 500 vs. 10 yr T-Bill',subplots=True)
How can I identify and subset the df into distinct periods of S&P corrections? (Allowing the graph plot and summary statistics to focus on unique time periods. So I can determine a correlation between S&P corrections and Treasuries.) Scipy has tools for identifying global or local minima and maxima -- is there a pythonic method to tailor these to identify periods of correction?
I will answer your question from a purely pandas standpoint (rather than using urllib or NumPy), as pandas was specifically designed to address almost any practical question that arises when retrieving and munging financial data.
1. How to identify distinct periods of S&P corrections?
Let's define a correction as a decline of 20% or more from the recent (say, 90-day) peak:
import pandas as pd
from pandas_datareader import data
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (15,5)
spx = data.get_data_yahoo('^GSPC', start = '1970-01-01')
tnx = data.get_data_yahoo('^TNX', start = '1970-01-01')
WINDOW = 90
CORRECTION = .2
# raw=True passes NumPy arrays to the lambda, so x[-1] works in current pandas
spx_bear = spx['Close'].rolling(WINDOW).apply(lambda x: x[-1]/x.max() < (1-CORRECTION), raw=True)
data_df = pd.DataFrame({'SP500': spx['Close'],
'Bonds': tnx['Close'],
'Bear market': spx_bear})
data_df.tail()
Bear market Bonds SP500
Date
2016-01-11 0 2.158 1923.670044
2016-01-12 0 2.102 1938.680054
2016-01-13 0 2.066 1890.280029
2016-01-14 0 2.098 1921.839966
2016-01-15 0 2.033 1880.329956
You may play with the WINDOW and CORRECTION parameters to obtain different "versions" of corrections.
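As an aside, the question defines a correction as a decline off the *last peak*, which a running maximum (cummax) expresses more literally than a rolling window. A minimal sketch on simulated prices (the Yahoo CSV endpoints in the question have since been retired, so the series here is synthetic rather than real S&P data):

```python
import numpy as np
import pandas as pd

# Synthetic daily closes standing in for spx['Close'].
rng = np.random.default_rng(0)
idx = pd.bdate_range('2000-01-03', periods=1000)
close = pd.Series(1000 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx)))), index=idx)

CORRECTION = 0.2

# Drawdown measured against the running (all-time-so-far) peak.
running_peak = close.cummax()
drawdown = close / running_peak - 1            # 0 at a new peak, negative below it
bear = (drawdown <= -CORRECTION).astype(int)   # 1 while price sits 20%+ below the peak

print(drawdown.min(), bear.sum())
```

The rolling-window version "forgets" peaks older than WINDOW days; the cummax version never does, so choose whichever matches your notion of "the last peak".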
2. Plotting
plot_df = data_df['2008':'2009']
_, ax = plt.subplots()
ax2 = ax.twinx()
plot_df['Bonds'].plot(ax=ax)
plot_df['Bear market'].plot(ax=ax2, style='r--', ylim=[-.1, 1.1])
ax.set_title('Treasuries Performance during SP500 Corrections');
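Instead of plotting the flag on a second axis, the correction periods can also be shaded directly onto the price chart with axvspan. A sketch, assuming a data_df shaped like the one above (again simulated, since the original data source is gone):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simulated stand-in for the real data_df.
rng = np.random.default_rng(1)
idx = pd.bdate_range('2008-01-01', periods=500)
data_df = pd.DataFrame({
    'SP500': 1400 * np.exp(np.cumsum(rng.normal(-0.0005, 0.02, len(idx)))),
    'Bonds': 4 + np.cumsum(rng.normal(0, 0.02, len(idx))),
}, index=idx)
data_df['Bear market'] = (data_df['SP500'] / data_df['SP500'].cummax() < 0.8).astype(int)

fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(data_df.index, data_df['SP500'], label='SP500')

# Shade every contiguous run where the bear flag is on.
flag = data_df['Bear market']
runs = (flag != flag.shift()).cumsum()          # label each contiguous run of the flag
for _, run in data_df[flag == 1].groupby(runs):
    ax.axvspan(run.index[0], run.index[-1], color='red', alpha=0.2)

ax.set_title('SP500 with correction periods shaded')
fig.savefig('corrections.png')
```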
3. Subsetting and summary statistics
Finally, there are two ways to explore the resulting dataset: with pandas .groupby() or with straightforward subsetting. In both cases we'll need returns, not prices:
ret_df = pd.DataFrame({'SP500': spx['Close'].pct_change(),
'Bonds': tnx['Close'].pct_change(),
'Bear market': spx_bear})
ret_df.groupby('Bear market').agg('mean')
Bonds SP500
Bear market
0 0.000042 0.000430
1 -0.002679 -0.003261
ret_df[ret_df['Bear market'] == 1][['Bonds','SP500']].corr()
Bonds SP500
Bonds 1.000000 0.253068
SP500 0.253068 1.000000
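To subset the data into *distinct* correction episodes, as the question asks, each contiguous run of the bear flag can be given its own label with a shift/cumsum trick. A sketch on a toy flag series; the spx_bear series from above would drop in the same way:

```python
import pandas as pd

# Toy bear-market flag standing in for spx_bear: two separate correction episodes.
idx = pd.date_range('2020-01-01', periods=10)
bear = pd.Series([0, 1, 1, 0, 0, 1, 1, 1, 0, 0], index=idx)

# A new label starts whenever the flag changes value; keeping only flag==1 rows
# leaves one distinct label per correction episode.
episode = (bear != bear.shift()).cumsum()
episodes = bear[bear == 1].groupby(episode).apply(lambda s: (s.index[0], s.index[-1]))
print(episodes.tolist())
```

Each label can then be used to plot or summarize one correction at a time, e.g. ret_df grouped by this episode label instead of the raw 0/1 flag.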
Edit:
You'll see "bear" several times in the code. The reason is that I borrowed this code from a small project of mine for identifying periods of "bear markets", but the code applies to any correction if you disregard the word "bear" and the value -20%, which together are the definition of a bear market.
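Since both the window and the threshold are meant to be mutable parameters, the whole detection step can be wrapped in a small function. A sketch (the function name and defaults are my own, not from the answer; the price series is a toy ramp, not real data):

```python
import numpy as np
import pandas as pd

def correction_flag(close, window=90, threshold=0.2):
    """Return a 0/1 series flagging days that sit `threshold` or more below
    the rolling `window`-day peak. raw=True passes NumPy arrays to the
    lambda, which is what makes x[-1] indexing work in current pandas."""
    return close.rolling(window).apply(
        lambda x: float(x[-1] / x.max() < (1 - threshold)), raw=True
    ).fillna(0).astype(int)

# Toy series: a steady rise followed by a sharp fall.
idx = pd.bdate_range('2021-01-04', periods=120)
prices = pd.Series(np.r_[np.linspace(100, 150, 60), np.linspace(150, 90, 60)], index=idx)

flag10 = correction_flag(prices, window=30, threshold=0.10)
flag20 = correction_flag(prices, window=30, threshold=0.20)
print(flag10.sum(), flag20.sum())
```

By construction the 10% flag is on everywhere the 20% flag is, so lowering the threshold can only widen the detected correction periods.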