Identifying Extrema in Financial Data using Pandas


Problem Description


I have daily S&P 500 prices and Treasury yields. The end goal is to determine how Treasuries perform, graphically and mathematically, during corrections in the S&P. A correction is a decline of some % off the last peak, with the % being a mutable parameter.

import urllib2, pandas as pd, numpy as np, matplotlib.pyplot as plt, scipy as sp

correction = 0.1    # define % decline from peak to constitute market correction

sp_data = urllib2.urlopen('http://real-chart.finance.yahoo.com/table.csv?s=%5EGSPC&a=00&b=3&c=1950&d=00&e=14&f=2016&g=d&ignore=.csv')
df1 = pd.read_csv(sp_data)
df1 = df1[['Date','Close']]
df1 = df1.rename(columns = {'Close':'S&P_500'})

t_bill_data = urllib2.urlopen('http://real-chart.finance.yahoo.com/table.csv?s=%5ETNX&a=00&b=2&c=1962&d=00&e=14&f=2016&g=d&ignore=.csv')
df2 = pd.read_csv(t_bill_data)
df2 = df2[['Date','Close']]
df2 = df2.rename(columns = {'Close':'T_Bill'})

df3 = pd.merge(df1, df2, on='Date', how='outer')

df3['Date'] = pd.to_datetime(df3['Date'], format='%Y-%m-%d')
df3 = df3.set_index('Date')

df3.describe()
df3.plot(kind='line',title='S&P 500 vs. 10 yr T-Bill',subplots=True)

How can I identify and subset the df into distinct periods of S&P corrections? (Allowing the graph plot and summary statistics to focus on unique time periods. So I can determine a correlation between S&P corrections and Treasuries.) Scipy has tools for identifying global or local minima and maxima -- is there a pythonic method to tailor these to identify periods of correction?
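For reference, the scipy route mentioned above could be sketched roughly as follows, using `argrelextrema` from `scipy.signal` on a synthetic price array (assumption: in practice you would pass your actual closing-price column, e.g. `df3['S&P_500'].values`):

```python
import numpy as np
from scipy.signal import argrelextrema

# Synthetic closing prices standing in for the real S&P 500 series.
prices = np.array([100, 105, 110, 104, 98, 96, 101, 107, 112, 103, 95, 99],
                  dtype=float)

# A point is a local peak (trough) if it is strictly greater (less) than
# every neighbor within `order` points on each side.
peaks = argrelextrema(prices, np.greater, order=2)[0]
troughs = argrelextrema(prices, np.less, order=2)[0]

print(peaks)    # indices of local peaks   -> [2 8]
print(troughs)  # indices of local troughs -> [ 5 10]
```

From pairs of consecutive peaks and troughs you could then measure the percentage drop and keep only those exceeding your `correction` threshold.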

Solution

I will answer your question from a purely Pandas standpoint (rather than using urllib or numpy), as Pandas was specifically built to address almost any practical question that arises when retrieving and munging financial data.

1. How to identify distinct periods of S&P corrections?

Let's define a correction as a 20% or greater market decline from a recent (say, 90-day) peak:

import pandas as pd
from pandas_datareader import data
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (15,5)

spx = data.get_data_yahoo('^GSPC', start = '1970-01-01')
tnx = data.get_data_yahoo('^TNX', start = '1970-01-01')

WINDOW = 90
CORRECTION = .2
# raw=True passes a NumPy array, so x[-1] indexes positionally
spx_bear = spx['Close'].rolling(WINDOW).apply(lambda x: x[-1]/x.max() < (1-CORRECTION), raw=True)

data_df = pd.DataFrame({'SP500': spx['Close'],
                        'Bonds': tnx['Close'],
                        'Bear market': spx_bear})

data_df.tail()


    Bear market Bonds   SP500
Date            
2016-01-11  0   2.158   1923.670044
2016-01-12  0   2.102   1938.680054
2016-01-13  0   2.066   1890.280029
2016-01-14  0   2.098   1921.839966
2016-01-15  0   2.033   1880.329956

You may play with the window and correction parameters to obtain different "versions" of corrections.
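An alternative to the fixed rolling window, closer to the questioner's "decline off the last peak" definition, is a drawdown from the running maximum. A minimal sketch with synthetic prices (assumption: in practice you would use `spx['Close']` instead of the toy series):

```python
import pandas as pd

CORRECTION = 0.1  # flag a correction at 10% or more below the running peak

# Synthetic closing prices standing in for spx['Close'].
close = pd.Series([100.0, 110.0, 108.0, 97.0, 95.0, 103.0, 112.0, 99.0])

running_peak = close.cummax()          # highest close seen so far
drawdown = close / running_peak - 1.0  # fraction below that peak (<= 0)
in_correction = (drawdown <= -CORRECTION).astype(int)

print(in_correction.tolist())  # -> [0, 0, 0, 1, 1, 0, 0, 1]
```

Unlike the 90-day window, `cummax()` never "forgets" an old peak, so the flag only resets once the price makes a new all-time high.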

2. Plotting

plot_df = data_df['2008':'2009']

_, ax = plt.subplots()
ax2 = ax.twinx()

plot_df['Bonds'].plot(ax=ax)
plot_df['Bear market'].plot(ax=ax2, style='r--', ylim=[-.1, 1.1])
ax.set_title('Treasuries Performance during SP500 Corrections');

3. Subsetting and summary statistics

Finally, there are two ways to explore the resulting dataset: with pandas .groupby() or with straightforward subsetting. In both cases we'll need returns, not prices:

ret_df = pd.DataFrame({'SP500': spx['Close'].pct_change(),
                       'Bonds': tnx['Close'].pct_change(),
                       'Bear market': spx_bear})

ret_df.groupby('Bear market').agg('mean')

    Bonds   SP500
Bear market     
0   0.000042    0.000430
1   -0.002679   -0.003261


ret_df[ret_df['Bear market'] == 1][['Bonds','SP500']].corr()
    Bonds   SP500
Bonds   1.000000    0.253068
SP500   0.253068    1.000000

Edit:

You'll see the word "bear" several times in the code. The reason is that I borrowed this code from a small project of mine that identifies periods of "bear markets", but the code is applicable to any correction if you disregard the word "bear" and the value -20%, which are the definition of a bear market.
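To make that threshold explicit, the rolling check from step 1 can be wrapped in a small helper that takes the window and drawdown as parameters (a sketch; the function name is my own):

```python
import pandas as pd

def flag_corrections(close, window=90, threshold=0.2):
    """Return 1 where close is `threshold` or more below its rolling
    `window`-period peak, else 0 (windows that are not yet full give 0)."""
    rolling_peak = close.rolling(window).max()  # peak within the window
    return (close / rolling_peak < 1 - threshold).astype(int)

# Toy usage with a short window (assumption: real data would use window=90).
close = pd.Series([100.0, 104.0, 101.0, 79.0, 82.0, 105.0])
print(flag_corrections(close, window=3, threshold=0.2).tolist())
# -> [0, 0, 0, 1, 0, 0]
```

Passing `threshold=0.1` would flag 10% corrections instead of bear markets, with no other change to the pipeline.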
