在 SAS 中计算滚动相关性 [英] Calculating rolling correlations in SAS

查看:89
本文介绍了在 SAS 中计算滚动相关性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据集

我已将我的数据文件加载到名为 rolling 的 SAS 中.目前,我的代码很简单:

proc corr data = mm.rolling;跑步;

这只是使用整个系列值计算相关矩阵.我对 SAS 很陌生,任何帮助将不胜感激.

解决方案

想想如果你有极大的耐心,你会怎么做.

proc corr data = mm.rolling out =correlation_as_of_01jan2008;其中日期介于 '01jan2007'd 和 '01jan2008'd 之间;跑步;

同样,

proc corr data = mm.rolling out =correlation_as_of_02jan2008;其中日期介于 '02jan2007'd 和 '02jan2008'd 之间;跑步;

幸运的是,您可以使用 SAS 宏编程来实现与此宏所示类似的效果:

%macro scrollingCorrelations(inputDataset=, refDate=);/*首先获取参考日期当天或之后的唯一日期列表*/proc freq 数据 = &inputDataset.无印;where date >="&refDate."d;表日期/出=日期(保持=日期);跑步;/*对于每个日期计算窗口范围是多少,这里使用一年的长度*/数据日期范围(下降=日期);设置日期 end = endOfFilenobs = numDates;格式 toDate fromDate date9.;toDate=日期;fromDate = intnx('year', toDate, -1, 's');调用 symputx(compress("toDate"!!_n_), put(toDate,date9.));调用 symputx(compress("fromDate"!!_n_), put(fromDate, date9.) );/*找到我们需要迭代多少次(numberOfWindows)*/如果 endOfFile 然后做;调用 symputx("numberOfWindows", numDates);结尾;跑步;%do i = 1 %to &numberOfWindows.;/*创建一个临时视图,其中包含传递给PROC CORR的过滤数据*/数据 windowedDataview/view = windowedDataview;设置和输入数据集."&&fromDate&i."d 和 "&&toDate&i."d 之间的日期;下车日期;跑步;/*每次PROC CORR运行的输出数据集将是相关性_DDMMMYYY<开始日期>_DDMMMYY<开始日期>*/proc corr data = windowedDataviewoutp = 相关性_&&fromDate&i.._&&toDate&i.(where=(_type_ = 'CORR'))无印;跑步;%结尾;/*将所有数据集附加到一个表中*/数据 all_correlations;格式从日期9.设置相关性_:indsname = 数据集名称;from = input(substr(datasetname,19,9),date9.);to = input(substr(datasetname,29,9), date9.);跑步;%修正滚动相关性;%rollingCorrelations(inputDataset=rolling, refDate=01JAN2008)

上述宏的最终输出将包含 from &to 标识符,用于标识每个相关矩阵所指的日期范围.运行它并检查结果.

我不认为 excel 可以容纳超过 1500 个标签,所以最好将它保存在一个表中.决赛桌有 81K 行,整个过程在 2.5 分钟内运行.

update: 按 from & 对它们进行排序

proc 排序数据 = ALL_CORRELATIONS;从到;跑步;

I have a data set here.

An excerpt of my data looks like this: (For an enlarged version: http://puu.sh/79NCK.jpg)

(Note: there are no missing values in my dataset)

I wish to calculate the correlation matrix using a rolling window of 1 year. My period starts from 01 Jan 2008. So for example, the correlation between AUT and BEL on 01 Jan 2008 is calculated using the series of values from 01 Jan 2007 to 01 Jan 2008, and likewise for all other pairs. Similarly the correlation between AUT and BEL on 02 Jan 2008 is calculated using the series of values from 02 Jan 2007 to 02 Jan 2008.

Since there will be a different correlation matrix for each day, I want to output each day's correlation matrix into a sheet in excel and name that sheet COV1 (for 01 Jan 2008), COV2 (for 02 Jan 2008), COV3 (for 03 Jan 2008), and so on until COV1566 (for 31 Dec 2013). An excerpt of the output for each sheet is like this: (Note: with the titles included on the top row and first column)

http://puu.sh/79NAy.jpg

I have loaded my datafile into SAS named rolling. For the moment, my code is simply:

proc corr data = mm.rolling;
run;

Which simply calculates the correlation matrix using the entire series of values. I am very new to SAS, any help would be appreciated.

解决方案

Think about how you might do if you had immense amount of patience.

proc corr data = mm.rolling out = correlation_as_of_01jan2008;
where date between '01jan2007'd and '01jan2008'd;
run;

Similarly,

proc corr data = mm.rolling out = correlation_as_of_02jan2008;
where date between '02jan2007'd and '02jan2008'd;
run;

Thankfully you can use SAS macro programming to achieve a similar effect as shown in this macro:

%macro rollingCorrelations(inputDataset=, refDate=);
/*first get a list of unique dates on or after the reference date*/
proc freq data = &inputDataset. noprint;
where date >="&refDate."d;
table date/out = dates(keep = date);
run;


/*for each date calculate what the window range is, here using a year's length*/
data dateRanges(drop = date);
set dates end = endOfFile 
                nobs= numDates;
format toDate fromDate date9.;

toDate=date;
fromDate = intnx('year', toDate, -1, 's');

call symputx(compress("toDate"!!_n_), put(toDate,date9.));
call symputx(compress("fromDate"!!_n_), put(fromDate, date9.) );

/*find how many times(numberOfWindows) we need to iterate through*/
if endOfFile then do;
call symputx("numberOfWindows", numDates);
end;

run;
%do i = 1 %to &numberOfWindows.;
/*create a temporary view which has the filtered data that is passed to PROC CORR*/
data windowedDataview / view = windowedDataview;
set  &inputDataset.;
where date between "&&fromDate&i."d and "&&toDate&i."d;
drop date;
run;
    /*the output dataset from each PROC CORR run will be 
correlation_DDMMMYYY<from date>_DDMMMYY<start date>*/
proc corr data = windowedDataview 
outp = correlations_&&fromDate&i.._&&toDate&i. (where=(_type_ = 'CORR'))

        noprint;
run;

%end;

/*append all datasets into a single table*/
data all_correlations;
format from to date9.;
set correlations_:
     indsname = datasetname
;
from = input(substr(datasetname,19,9),date9.);
to = input(substr(datasetname,29,9), date9.);
run;


%mend rollingCorrelations;
%rollingCorrelations(inputDataset=rolling, refDate=01JAN2008)

The final output from the above macro will have from & to identifier to identify which date range each correlation matrix refers. Run it and examine the results.

I dont think excel can accomodate over 1500 tabs anyway, so best to keep it in a single table. The final table had 81K rows and the whole process ran in 2.5 mins.

update: to sort them by from & to

proc sort data = ALL_CORRELATIONS;
by from to;
run;

这篇关于在 SAS 中计算滚动相关性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆