R special data frame
Question
I'm asking a question following on from the one I asked yesterday in this post: Random Forests for Variables selection.
I managed to find the most significant technical trading rules (TTRs) for each quarter. I've built a data frame to hold the names of these TTRs. Here it is, with one column per quarter:
   1            2       3     4      5       6     7     8       9       10      11
 1 RSI2         RSI3    RSI2  RSI10  RSI2    RSI2  RSI2  RSI2    RSI2    RSI2    RSI2
 2 RSI3         RSI4    RSI3  RSI20  RSI3    RSI3  RSI3  RSI4    RSI4    RSI3    RSI3
 3 RSI4         RSI5    RSI4  EMA5   RSI4    RSI4  RSI5  RSI5    RSI5    RSI4    RSI4
 4 RSI5         RSI10   RSI5  EMA20  RSI5    RSI5  RSI10 EMA5    RSI10   RSI5    RSI5
 5 RSI10        RSI20   RSI10 EMA60  SMA5    RSI10 RSI20 EMA20   RSI20   RSI10   RSI10
 6 SMA20        SMA60   RSI20 SMI    atr     RSI20 SMA60 EMA60   SMA5    RSI20   SMA5
 7 SMA60        pctB    SMA20 ADX    pctB    EMA5  atr   atr     SMA60   atr     SMA20
 8 atr          calcs.1 pctB  pctB   macd    EMA20 pctB  ADX     pctB    ADX     EMA20
 9 pctB         <NA>    <NA>  macd   myVolat EMA60 <NA>  pctB    macd    pctB    EMA60
10 myChaikinVol <NA>    <NA>  signal calcs.1 pctB  <NA>  macd    signal  myVolat ADX
11 myVolat      <NA>    <NA>  calcs  <NA>    macd  <NA>  signal  mySAR   calcs.1 pctB
12 calcs        <NA>    <NA>  <NA>   <NA>    <NA>  <NA>  myVolat myVolat <NA>    myChaikinVol
13 <NA>         <NA>    <NA>  <NA>   <NA>    <NA>  <NA>  calcs.1 <NA>    <NA>    myVolat
14 <NA>         <NA>    <NA>  <NA>   <NA>    <NA>  <NA>  <NA>    <NA>    <NA>    calcs
I've padded the shorter columns with NA to cope with their differing lengths.
Now, I would like to go back to my dataset, which looks like this:
daily.returns RSI2 RSI3 RSI4 RSI5 RSI10 RSI20 SMA5 SMA20 SMA60 EMA5 EMA20 EMA60 atr SMI ADX oscillator pctB macd signal myChaikinVol mySAR myVolat calcs calcs.1
2009-01-07 -0.015587635 97.964071 92.62210 87.21605 82.40040 66.95642 55.19221 19720.64 18655.29 17758.68 2556.777 2556.777 2556.777 82.06602 27.52145 17.31637 85 0.87092366 0.5930649 -0.220581024 -0.3211637 2369.876 0.2325009 0.3169638 0.2801128
2009-01-08 -0.008700162 43.766573 58.62387 62.97794 64.03382 60.23197 52.99739 19756.44 18666.60 17754.07 2566.499 2566.499 2566.499 80.33416 29.12141 16.86914 85 0.72197937 0.8929854 0.002132269 -0.3183377 2385.210 0.2201065 0.3169831 0.2654092
2009-01-09 -0.011980596 27.182247 44.97072 52.29336 55.50633 56.74068 51.80171 19776.92 18674.31 17750.34 2523.372 2523.372 2523.372 78.65886 29.37878 15.90677 85 0.67025741 0.9349831 0.188702427 -0.2613410 2403.582 0.2245705 0.3119865 0.2608195
2009-01-12 -0.014061295 13.371347 30.46561 39.97055 45.24210 52.16207 50.17764 19788.02 18683.05 17748.76 2524.466 2524.466 2524.466 78.58966 28.17871 14.80066 85 0.49082443 0.9958785 0.350137644 -0.2065359 2420.117 0.2217528 0.3128203 0.2615878
2009-01-13 -0.016693272 6.141462 19.52298 29.30404 35.68593 47.25383 48.32987 19772.25 18693.01 17749.35 2488.165 2488.165 2488.165 76.08326 25.34705 13.96936 80 0.26923307 0.8855971 0.457229531 -0.1845331 2434.998 0.2223591 0.3103439 0.2609330
2009-01-14 -0.047918393 2.712386 11.97834 20.69541 27.26891 42.10718 46.23469 19747.87 18694.16 17742.88 2449.353 2449.353 2449.353 75.42231 20.65686 13.99099 60 -0.01023467 0.6624063 0.498264880 -0.1131268 2445.040 0.2290943 0.3094655 0.2644883
What I would like to do is put NA in the periods when a TTR is not significant. For example, if the RSI2 TTR turns out not to be significant during the first quarter, I would like to replace its numerical values with NAs, but if RSI2 is significant during the 5th quarter, I would like to keep the numerical values.
In the end, I should get a data frame whose dimensions are the same as those of the initial data frame.
Any ideas? Thanks!
Recommended answer
First of all, you should store your rules in a list rather than a data.frame. This saves you from having to pad each "rule list" with NAs to make them the same length, and also lets you process your data with lapply.
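As a minimal sketch of that conversion, assuming the padded data frame from the question is called rules_df (a hypothetical name, filled here with a few illustrative values rather than the full table), each column can be turned into a character vector with its NA padding dropped:

```r
# Hypothetical padded data frame: one column per quarter, NA-padded.
# The name rules_df and the values are assumptions for illustration.
rules_df <- data.frame(
  `1` = c("RSI2", "RSI3", "atr"),
  `2` = c("RSI3", "RSI4", NA),
  `3` = c("RSI2", NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Convert each column to a character vector and drop the NA padding.
rules_list <- lapply(rules_df, function(col) {
  col <- as.character(col)
  col[!is.na(col)]
})

str(rules_list)
```

The list keeps the column names ("1", "2", "3") as element names, so a quarter's rules can be looked up with rules_list[["2"]].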
Since you didn't provide any data, I made some up:
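# Load data (getSymbols downloads SPY prices, so this needs an internet connection)
set.seed(42)
library(quantmod)
getSymbols('SPY')
SPY <- adjustOHLC(SPY)
dat <- dailyReturn(Cl(SPY))

# Add some TTRs: RSI and SMA over several window lengths
for (rule in c('RSI', 'SMA')) {
  for (n in c(5, 10, 15, 20, 25)) {
    newvar <- paste(rule, n, sep = '_')
    FUN <- get(rule)  # look up the indicator function by name
    dat <- cbind(dat, FUN(dat[, 1], n = n))
    names(dat)[length(names(dat))] <- newvar
  }
}
dat <- na.omit(dat)  # drop the leading rows where the indicators are still NA
rulenames <- names(dat)[-1]  # every column except daily.returns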
Note that this is an xts object, not a data.frame. This is important, as it keeps the index in Date format rather than as a character vector:
> dat[1:5, 1:5]
daily.returns RSI_5 RSI_10 RSI_15 RSI_20
2007-02-08 -0.001308450 40.06379 46.99824 48.59484 49.11738
2007-02-09 -0.007447249 26.65296 40.34267 44.35689 46.10753
2007-02-12 -0.003404196 42.49883 45.94447 47.58264 48.30373
2007-02-13 0.008434995 67.89045 58.59450 55.64932 54.07276
2007-02-14 0.006567123 62.45177 56.28547 54.23836 53.08886
I also made up some TTRs to use for each year:
#Make a list of rules for each year
library(lubridate)
dat$Year <- year(index(dat))
uniqueYear <- sort(unique(dat$Year))
rulesList <- lapply(uniqueYear, function(x) rulenames[runif(length(rulenames))>.5])
names(rulesList) <- uniqueYear
Note that rulesList is literally a list:
> rulesList
$`2007`
[1] "RSI_5" "RSI_10" "RSI_20" "RSI_25" "SMA_5" "SMA_10" "SMA_20" "SMA_25"
$`2008`
[1] "RSI_10" "RSI_15" "SMA_5" "SMA_10" "SMA_25"
$`2009`
[1] "RSI_5" "RSI_15" "RSI_20" "SMA_5" "SMA_15" "SMA_25"
$`2010`
[1] "RSI_5" "RSI_10" "RSI_20" "SMA_5" "SMA_20" "SMA_25"
$`2011`
[1] "RSI_20" "SMA_5" "SMA_10" "SMA_15" "SMA_20" "SMA_25"
$`2012`
[1] "RSI_20" "SMA_5" "SMA_10" "SMA_25"
Now it's simply a matter of looping through each year and subsetting the dat object to the proper rows (year) and columns (TTRs):
# Subset the data to each year's rows and that year's selected TTR columns
data.by.year <- lapply(uniqueYear, function(year) {
  rule_subset <- rulesList[[as.character(year)]]
  dat[dat$Year == year, rule_subset]
})
names(data.by.year) <- uniqueYear
data.by.year is a list (of length 6), where each element represents one year's worth of data, with the selected TTRs.
> str(data.by.year[[1]])
An ‘xts’ object from 2007-02-08 to 2007-12-31 containing:
Data: num [1:226, 1:8] 40.1 26.7 42.5 67.9 62.5 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:8] "RSI_5" "RSI_10" "RSI_20" "RSI_25" ...
Indexed by objects of class: [Date] TZ:
xts Attributes:
List of 3
$ tclass : chr "Date"
$ tzone : chr ""
$ na.action:Class 'omit' atomic [1:25] 1 2 3 4 5 6 7 8 9 10 ...
.. ..- attr(*, "index")= num [1:25] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 ...
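The list above drops the unselected columns, so its elements do not all have the same dimensions. If the result has to keep the dimensions of the original data, as the question asks, one option is to overwrite the unselected columns with NA period by period instead of subsetting. A minimal base R sketch on a made-up data frame (toy_dat, toy_rules, and all values are illustrative assumptions, not the data above):

```r
# Toy data: a Date column, returns, and two indicator columns (made-up values).
toy_dat <- data.frame(
  Date          = as.Date(c("2007-06-01", "2007-06-04", "2008-06-02", "2008-06-03")),
  daily.returns = c(0.01, -0.02, 0.005, 0.003),
  RSI_5         = c(40, 27, 42, 68),
  SMA_5         = c(0.001, 0.002, -0.001, 0.004)
)

# Hypothetical rule list: the indicators considered significant in each year.
toy_rules <- list(`2007` = "RSI_5", `2008` = c("RSI_5", "SMA_5"))

rule_cols <- c("RSI_5", "SMA_5")        # columns that are eligible for masking
years <- format(toy_dat$Date, "%Y")     # period label for every row

masked <- toy_dat
for (yr in names(toy_rules)) {
  # Indicators that were NOT selected for this year
  drop_cols <- setdiff(rule_cols, toy_rules[[yr]])
  if (length(drop_cols) > 0) {
    masked[years == yr, drop_cols] <- NA
  }
}

masked
```

Here masked has the same rows and columns as toy_dat, with SMA_5 set to NA in 2007 only. For quarterly rules, the same loop should work with a quarter label per row in place of years, provided the list elements are named with matching labels.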