Efficiently fitting cubic splines in SAS to a specific grid of objects


Question

I have a dataset mydat with the following variables:

 MNES    IV
 0.84  0.40
 0.89  0.34
 0.91  0.31
 0.93  0.29
 0.95  0.26
 0.98  0.23
 0.99  0.22
 1.00  0.22
 1.02  0.20
 1.04  0.18
 1.07  0.18

And I need to fit cubic splines to these elements, where MNES is the object (X) and IV is the image (Y).

I have successfully accomplished what I need through PROC IML but I am afraid this is not the most efficient solution.

Specifically, my intended output dataset is:

 mnes    iv
 0.333  0.40
 0.332  0.40  <- for mnes out of sample MNES range, copy first IV;
 0.336  0.40
 ...    ...
 0.834  0.40
 0.837  0.40
 0.840  0.40
 0.842  INTERPOLATION
 0.845  INTERPOLATION
 0.848  INTERPOLATION
 ...
 1.066  INTERPOLATION
 1.069  INTERPOLATION 
 1.072  INTERPOLATION
 1.074  0.18
 1.077  0.18  <- for mnes out of sample MNES range, copy last IV;
 1.080  0.18
 ...    ...
 3.000  0.18

The necessary details are as follows:

  • I always have 1001 points for MNES, ranging from 0.(3) (that is, 1/3) to 3, so each step is (3-1/3)/1000.
  • The interpolation for IV should only be used for the points between the minimum and maximum MNES.
  • For the points where MNES is greater than the maximum MNES in the sample, IV should be equal to the IV of the maximum MNES and likewise for the minimum MNES (it is always sorted by MNES).
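
As a rough illustration of the grid described in the bullets above, the following DATA step builds 1001 equally spaced MNES points from 1/3 to 3. The dataset name `grid` and the helper variables `i` and `step` are placeholders chosen for this sketch. Note that the PROC IML code later in the question builds its grid geometrically (`u##mi`) instead; this sketch follows the equal-step description only.

```sas
/* Sketch only: 1001 equally spaced MNES points from 1/3 to 3.
   "grid", "i" and "step" are illustrative names, not from the original post. */
data grid(keep=mnes);
    step = (3 - 1/3) / 1000;      /* spacing implied by 1001 points */
    do i = 0 to 1000;
        mnes = 1/3 + i * step;    /* i = 0 gives 1/3, i = 1000 gives 3 */
        output;
    end;
run;
```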

My worry for efficiency is due to the fact that I have to solve this problem roughly 2 million times and right now it (the code below, using PROC IML) takes roughly 5 hours for 100k different input datasets.

My question is: What alternatives do I have if I wish to fit cubic splines given an input data set such as the one above and output it to a specific grid of objects? And what solution would be the most efficient?

  • With PROC IML I can do exactly this with the splinev function, but I am concerned that using PROC IML is not the most efficient way;
  • With PROC EXPAND, given that this is not a time series, it does not seem adequate. Additionally, I do not know how to specify the grid of objects which I need through PROC EXPAND;
  • With PROC TRANSREG, I do not understand how to input a dataset into the knots and I do not understand whether it will output a dataset with the corresponding interpolation;
  • With the MSPLINT function, it seems doable but I do not know how to input a data set to its arguments.
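
On the last bullet, one possible way to feed a dataset into MSPLINT is to transpose the knots into a single row and pass them as variable lists. This is a sketch under several assumptions: `mydat` is sorted by MNES and has exactly 11 rows (the count is hard-coded), the names `xk`, `yk`, `want`, `xs`, and `ys` are made up for the example, and the grid is the equal-step one described above. MSPLINT fits a monotonicity-preserving spline, so its values need not match those of `splinec`/`splinev` in PROC IML.

```sas
/* Turn the 11 knots of mydat into one-row tables x1-x11 and y1-y11 */
proc transpose data=mydat out=xk(drop=_name_) prefix=x;
    var mnes;
run;
proc transpose data=mydat out=yk(drop=_name_) prefix=y;
    var iv;
run;

data want(keep=mnes iv);
    set xk;                        /* one row: x1-x11 */
    set yk;                        /* one row: y1-y11 */
    array xs[11] x1-x11;
    array ys[11] y1-y11;
    /* Target grid: 1001 equally spaced points from 1/3 to 3 */
    do i = 0 to 1000;
        mnes = 1/3 + i * (3 - 1/3) / 1000;
        if mnes <= xs[1] then iv = ys[1];            /* flat below the sample */
        else if mnes >= xs[11] then iv = ys[11];     /* flat above the sample */
        else iv = msplint(mnes, 11, of xs[*], of ys[*]);
        output;
    end;
run;
```

With a varying number of knots per input, the hard-coded 11 would have to be replaced, for example by a macro variable holding the row count.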

I have attached the code I am using below for this purpose and an explanation of what I am doing. Reading what is below is not necessary for answering the question but it could be useful for someone solving this sort of problem with PROC IML or wanting a better understanding of what I am saying.

I am replicating a methodology (Buss and Vilkov (2012)) which, among other things, applies cubic splines to these elements, where MNES is the object (X) and IV is the image (Y).

The following code is heavily based on the Model Free Implied Volatility (MFIV) MATLAB code by Vilkov for Buss and Vilkov (2012), available on his website.

The interpolation is a means to calculate a figure for stock return volatility under the risk-neutral measure, by computing OTM put and call prices. I am using this for my master's thesis. Additionally, since my version of PROC IML does not have functions for Black-Scholes option pricing, I defined my own.
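
For reference, the pricing formulas that the two user-defined modules implement can be written as follows. This is read directly off the `blackcall` and `blackput` code, using its argument names: strike x, maturity t, spot s, rate r, volatility v, dividend yield d, with Φ standing for `probnorm`.

```latex
d_1 = \frac{\ln(s/x) + \left(r - d + \tfrac{1}{2}v^2\right)t}{v\sqrt{t}}, \qquad
d_2 = d_1 - v\sqrt{t}

C = s\,e^{-dt}\,\Phi(d_1) - x\,e^{-rt}\,\Phi(d_2)

P = -s\,e^{-dt}\,\Phi(-d_1) + x\,e^{-rt}\,\Phi(-d_2)
```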

proc iml;
    * Define BlackScholes call and put function;
    * Built-in not available in SAS/IML 9.3;
    * Reference http://www.lexjansen.com/wuss/1999/WUSS99039.pdf ;

    start blackcall(x,t,s,r,v,d);
        d1 = (log(s/x) + ((r-d) + 0.5#(v##2)) # t) / (v # sqrt(t));
        d2 = d1 - v # sqrt(t);
        bcall = s # exp(-d*t) # probnorm(d1) - x # exp(-r*t) # probnorm(d2);
        return (bcall);
    finish blackcall;

    start blackput(x,t,s,r,v,d);
        d1 = (log(s/x) + ((r-d) + 0.5#(v##2)) # t) / (v # sqrt(t));
        d2 = d1 - v # sqrt(t);
        bput = -s # exp(-d*t) # probnorm(-d1) + x # exp(-r*t) # probnorm(-d2);
        return (bput);
    finish blackput;

    store module=(blackcall blackput);
quit;

proc iml;
    * Specify necessary input parameters;
    currdate = "&currdate"d;
    currpermno = &currpermno;
    currsecid = &currsecid;
    rate = &currrate / 100;
    mat = &currdays / 365;
    * Use the input dataset and convert it to a matrix;
    use optday;
    read all var{mnes impl_volatility};
    mydata = mnes || impl_volatility;

    * Load BlackScholes call and Put function;
    load module=(blackcall blackput);

    * Define parameters;
    k = 2;
    m = 500;

    * Define auxiliary variables according to Buss and Vilkov;
    u = (1+k)##(1/m);
    a = 2 * (u-1);

    * Define moneyness (ki) and implied volatility (vi) grids;
    mi = (-m:m);
    mi = mi`;
    ki = u##mi;

    * Preallocation of vi with 2*m+1 ones (1001 in the base case);
    vi = J(2*m+1,1,1);

    * Define IV below minimum MNES equal to the IV of the minimum MNES;
    h = loc(ki<=mydata[1,1]);
    vi[h,1] = mydata[1,2];

    * Define IV above maximum MNES equal to the IV of the maximum MNES;
    h = loc(ki>=mydata[nrow(mydata),1]);
    vi[h,1] = mydata[nrow(mydata),2];

    * Define MNES grid where there are IV from data;
    * (equal to where ki still has ones resulting from the preallocation);
    grid = ki[loc(vi=1),];

    * Call splinec to interpolate based on available data and obtain coefficients;
    * Use coefficients to create spline on grid and save on smoothFit;
    * Save smoothFit in correct vi elements;
    call splinec(fitted,coeff,endSlopes,mydata);
    smoothFit = splinev(coeff,grid);
    vi[loc(vi=1),1] = smoothFit[,2];

    * Define elements of mi corresponding to OTM calls (MNES >=1) and OTM puts (MNES <1); 
    ic = mi[loc(ki>=1)];
    ip = mi[loc(ki<1)];

    * Calculate call and put prices based on call and put module;
    calls = blackcall(ki[loc(ki>=1),1],mat,1,rate,vi[loc(ki>=1),1],0);
    puts = blackput(ki[loc(ki<1),1],mat,1,rate,vi[loc(ki<1),1],0);

    * Complete volatility calculation based on Buss and Vilkov;
    b1 = sum((1-(log(1+k)/m)#ic)#calls/u##ic);
    b2 = sum((1-(log(1+k)/m)#ip)#puts/u##ip);
    stddev = sqrt(a*(b1+b2)/mat);

    * Append to voldata dataset;
    edit voldata;
    append var{currdate currsecid currpermno mat stddev};
    close voldata;
quit;
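
Written out, the final lines of the PROC IML step appear to compute the quantity below. This is a transcription of the code (with T for `mat`, and C and P for the `calls` and `puts` vectors), not a formula checked against Buss and Vilkov (2012).

```latex
u = (1+k)^{1/m}, \qquad a = 2(u-1), \qquad K_i = u^{i}, \quad i = -m, \dots, m

\sigma = \sqrt{\frac{a}{T}\left[
  \sum_{i:\,K_i \ge 1} \left(1 - \frac{\ln(1+k)}{m}\, i\right) \frac{C(K_i)}{u^{i}}
  + \sum_{i:\,K_i < 1} \left(1 - \frac{\ln(1+k)}{m}\, i\right) \frac{P(K_i)}{u^{i}}
\right]}
```

With k = 2 and m = 500 this gives the 2m+1 = 1001 grid points running from 1/3 to 3 that the question mentions.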

Recommended answer

Ok. I'm going to do this for 2 data sets to help you with the fact you have a bunch. You will have to modify for your inputs, but this should give you better performance.

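  1. Create some inputs.
  2. Get the first and last values from each input data set.
  3. Create a list of all the MNES values.
  4. Merge each input onto the MNES list and set the lower and upper values.
  5. Append the inputs together.
  6. Run PROC EXPAND with a BY statement to process all the inputs in a single pass and create the splines.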

The trick is to "trick" EXPAND into thinking MNES is a Daily timeseries. I do this by making it an integer -- date values are integers behind the scenes in SAS. With no gaps, ETS Procedures will assume a "daily" frequency.

After this is done, run a Data Step to call the Black-Scholes (BLKSHPTPRC, BLKSHCLPRC) functions and complete your analysis.
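
A minimal sketch of such a DATA step, assuming the `outSpline` dataset produced by the PROC EXPAND step in this answer (variables `Set`, `MNES`, `IV`, with MNES still scaled by 1000). The values of `mat` and `rate` are placeholders, the names `k`, `price`, and `prices` are made up for the example, and the spot price is normalised to 1 as in the question's PROC IML code.

```sas
/* Sketch: price OTM options on the interpolated grid.
   mat and rate are hypothetical values, not taken from the original post. */
data prices;
    set outSpline;
    k    = mnes / 1000;    /* undo the integer scaling to recover moneyness */
    mat  = 30 / 365;       /* hypothetical time to maturity, in years */
    rate = 0.02;           /* hypothetical risk-free rate */
    /* Arguments: strike, time to maturity, share price, rate, volatility */
    if k >= 1 then price = blkshclprc(k, mat, 1, rate, iv);   /* OTM call */
    else           price = blkshptprc(k, mat, 1, rate, iv);   /* OTM put  */
run;
```

The integer grid used in this answer (333 to 3000) is not the same as the 1001-point grid in the question, so the summation from the question would still need to be restricted or adapted to the points it actually uses.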

/*Sample Data*/
data input1;
input MNES    IV;
/*Make MNES an integer; ROUND guards against floating-point error*/
MNES = round(MNES * 1000);
datalines;
 0.84  0.40
 0.89  0.34
 0.91  0.31
 0.93  0.29
 0.95  0.26
 0.98  0.23
 0.99  0.22
 1.00  0.22
 1.02  0.20
 1.04  0.18
 1.07  0.18
 ;
run;

data input2;
input MNES    IV;
MNES = round(MNES * 1000);
datalines;
 0.80  0.40
 0.9  0.34
 0.91  0.31
 0.93  0.29
 0.95  0.26
 0.98  0.23
 1.02  0.19
 1.04  0.18
 1.07  0.16
 ;
run;

/*Get the first and last values from the input data*/
data _null_;
set input1 end=last;
if _n_ = 1 then do;
    call symput("first1",mnes);
    call symput("first1_v",iv);
end;
if last then do;
    call symput("last1",mnes);
    call symput("last1_v",iv);
end;
run;

data _null_;
set input2 end=last;
if _n_ = 1 then do;
    call symput("first2",mnes);
    call symput("first2_v",iv);
end;
if last then do;
    call symput("last2",mnes);
    call symput("last2_v",iv);
end;
run;

/*A list of the MNES values*/
data points;
do mnes=333 to 3000;
    output;
end;
run;

/*Join Inputs to the values and set the lower and upper values*/
data input1;
merge points input1;
by mnes;
if mnes < &first1 then
    iv = &first1_v;
if mnes > &last1 then
    iv = &last1_v;

run;
data input2;
merge points input2;
by mnes;
if mnes < &first2 then
    iv = &first2_v;
if mnes > &last2 then
    iv = &last2_v;

run;

/*Append the data sets together, keep a value 
  so you can tell them apart*/
data toSpline;
set input1(in=ds1)
    input2(in=ds2);
if ds1 then
    Set=1;
else if ds2 then
    Set=2;
run;

/*PROC Expand for the spline.  The integer values
  for MNES makes it think these are "daily" data*/
proc expand data=toSpline out=outSpline method=spline;
by set;
id mnes;
run;
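
The code above writes one pair of DATA steps per input, which does not scale to many inputs. One possible generalisation, sketched under the assumption that all inputs are already stacked in a hypothetical dataset `allInputs` (variables `Set`, `MNES`, `IV`, sorted by `Set` and `MNES`, with MNES already scaled to an integer), uses BY-group processing instead of macro variables. The names `bounds`, `grid`, `lo`, `hi`, `lo_iv`, and `hi_iv` are illustrative.

```sas
/* First and last knot of every input, one row per Set */
data bounds(keep=set lo lo_iv hi hi_iv);
    set allInputs;
    by set mnes;
    retain lo lo_iv;
    if first.set then do;
        lo = mnes;
        lo_iv = iv;
    end;
    if last.set then do;
        hi = mnes;
        hi_iv = iv;
        output;
    end;
run;

/* Full integer grid for every Set, carrying that Set's bounds */
data grid;
    set bounds;
    do mnes = 333 to 3000;
        output;
    end;
run;

/* Attach the knots and fill the flat regions outside the sample range */
data toSpline(keep=set mnes iv);
    merge grid allInputs;
    by set mnes;
    if mnes < lo then iv = lo_iv;
    else if mnes > hi then iv = hi_iv;
run;

proc expand data=toSpline out=outSpline method=spline;
    by set;
    id mnes;
run;
```

Since every input expands to 2668 grid rows, a very large number of inputs would probably have to be processed in batches.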
