R special data frame


Question

This is a follow-up to the question I asked yesterday in this post: Random Forests for Variables selection.

I managed to find the most significant technical trading rules (TTRs) for each quarter. I've built a data frame holding the names of these TTRs, with one column per quarter:

              1       2     3      4       5     6     7       8       9      10           11
1          RSI2    RSI3  RSI2  RSI10    RSI2  RSI2  RSI2    RSI2    RSI2    RSI2         RSI2
2          RSI3    RSI4  RSI3  RSI20    RSI3  RSI3  RSI3    RSI4    RSI4    RSI3         RSI3
3          RSI4    RSI5  RSI4   EMA5    RSI4  RSI4  RSI5    RSI5    RSI5    RSI4         RSI4
4          RSI5   RSI10  RSI5  EMA20    RSI5  RSI5 RSI10    EMA5   RSI10    RSI5         RSI5
5         RSI10   RSI20 RSI10  EMA60    SMA5 RSI10 RSI20   EMA20   RSI20   RSI10        RSI10
6         SMA20   SMA60 RSI20    SMI     atr RSI20 SMA60   EMA60    SMA5   RSI20         SMA5
7         SMA60    pctB SMA20    ADX    pctB  EMA5   atr     atr   SMA60     atr        SMA20
8           atr calcs.1  pctB   pctB    macd EMA20  pctB     ADX    pctB     ADX        EMA20
9          pctB    <NA>  <NA>   macd myVolat EMA60  <NA>    pctB    macd    pctB        EMA60
10 myChaikinVol    <NA>  <NA> signal calcs.1  pctB  <NA>    macd  signal myVolat          ADX
11      myVolat    <NA>  <NA>  calcs    <NA>  macd  <NA>  signal   mySAR calcs.1         pctB
12        calcs    <NA>  <NA>   <NA>    <NA>  <NA>  <NA> myVolat myVolat    <NA> myChaikinVol
13         <NA>    <NA>  <NA>   <NA>    <NA>  <NA>  <NA> calcs.1    <NA>    <NA>      myVolat
14         <NA>    <NA>  <NA>   <NA>    <NA>  <NA>  <NA>    <NA>    <NA>    <NA>        calcs

I've padded the columns with NA to cope with their differing lengths.

Now I would like to go back to my dataset, which looks like this:

           daily.returns      RSI2     RSI3     RSI4     RSI5    RSI10    RSI20     SMA5    SMA20    SMA60     EMA5    EMA20    EMA60      atr      SMI      ADX oscillator        pctB      macd       signal myChaikinVol    mySAR   myVolat     calcs   calcs.1
2009-01-07  -0.015587635 97.964071 92.62210 87.21605 82.40040 66.95642 55.19221 19720.64 18655.29 17758.68 2556.777 2556.777 2556.777 82.06602 27.52145 17.31637         85  0.87092366 0.5930649 -0.220581024   -0.3211637 2369.876 0.2325009 0.3169638 0.2801128
2009-01-08  -0.008700162 43.766573 58.62387 62.97794 64.03382 60.23197 52.99739 19756.44 18666.60 17754.07 2566.499 2566.499 2566.499 80.33416 29.12141 16.86914         85  0.72197937 0.8929854  0.002132269   -0.3183377 2385.210 0.2201065 0.3169831 0.2654092
2009-01-09  -0.011980596 27.182247 44.97072 52.29336 55.50633 56.74068 51.80171 19776.92 18674.31 17750.34 2523.372 2523.372 2523.372 78.65886 29.37878 15.90677         85  0.67025741 0.9349831  0.188702427   -0.2613410 2403.582 0.2245705 0.3119865 0.2608195
2009-01-12  -0.014061295 13.371347 30.46561 39.97055 45.24210 52.16207 50.17764 19788.02 18683.05 17748.76 2524.466 2524.466 2524.466 78.58966 28.17871 14.80066         85  0.49082443 0.9958785  0.350137644   -0.2065359 2420.117 0.2217528 0.3128203 0.2615878
2009-01-13  -0.016693272  6.141462 19.52298 29.30404 35.68593 47.25383 48.32987 19772.25 18693.01 17749.35 2488.165 2488.165 2488.165 76.08326 25.34705 13.96936         80  0.26923307 0.8855971  0.457229531   -0.1845331 2434.998 0.2223591 0.3103439 0.2609330
2009-01-14  -0.047918393  2.712386 11.97834 20.69541 27.26891 42.10718 46.23469 19747.87 18694.16 17742.88 2449.353 2449.353 2449.353 75.42231 20.65686 13.99099         60 -0.01023467 0.6624063  0.498264880   -0.1131268 2445.040 0.2290943 0.3094655 0.2644883

What I would like to do is put NA in the periods when a TTR is not significant. For example, if the RSI2 TTR turns out not to be significant during the first quarter, I would like to replace its numerical values with NA, but if RSI2 is significant during the 5th quarter, I would like to keep the numerical values.

In the end, I should get a data frame whose dimensions are the same as those of the initial data frame.

Any ideas? Thanks!

Recommended answer

First of all, you should store your rules in a list rather than a data.frame. This saves you from having to pad each "rule list" with NAs to make them the same length, and it also lets you process your data with lapply.
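If the rules already live in an NA-padded data frame like the one in the question, they can be turned into a list without retyping them. The following is a minimal sketch, assuming a small excerpt of that table; the name rules_df is hypothetical and only two quarters with a few rules each are shown.

```r
# Hypothetical excerpt of the padded rules table: one column per quarter,
# shorter columns padded with NA (entries taken from the question's table)
rules_df <- data.frame(
  `1` = c("RSI2", "RSI3", "atr"),
  `2` = c("RSI3", "pctB", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so lapply visits each quarter in turn;
# dropping the NA padding leaves one character vector of rule names per quarter
rules_list <- lapply(rules_df, function(col) as.character(col[!is.na(col)]))
```

The resulting list keeps the column names ("1", "2") as element names, so a quarter's rules can be looked up with rules_list[["2"]].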

Since you didn't provide any data, I made some up:

#Load data
set.seed(42)
library(quantmod)
getSymbols('SPY')
SPY <- adjustOHLC(SPY)
dat <- dailyReturn(Cl(SPY))

#Add some TTRs
for (rule in c('RSI', 'SMA')){
  for (n in c(5, 10, 15, 20, 25)){
    newvar <- paste(rule, n, sep='_')
    FUN <- get(rule)
    dat <- cbind(dat, FUN(dat[,1], n=n))
    names(dat)[length(names(dat))] <- newvar
  }
}
dat <- na.omit(dat)
rulenames <- names(dat)[-1]

Note that this is an xts object, not a data.frame. This is important, as it keeps the index in Date format rather than as a character vector:

> dat[1:5, 1:5]
           daily.returns    RSI_5   RSI_10   RSI_15   RSI_20
2007-02-08  -0.001308450 40.06379 46.99824 48.59484 49.11738
2007-02-09  -0.007447249 26.65296 40.34267 44.35689 46.10753
2007-02-12  -0.003404196 42.49883 45.94447 47.58264 48.30373
2007-02-13   0.008434995 67.89045 58.59450 55.64932 54.07276
2007-02-14   0.006567123 62.45177 56.28547 54.23836 53.08886
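One practical benefit of a Date index is that period labels can be computed from it directly. The answer groups by year, while the question groups by quarter; the sketch below shows both labels using base R only. The vector idx is a hypothetical stand-in for index(dat).

```r
# Hypothetical stand-in for the Date index of the xts object
idx <- as.Date(c("2009-01-07", "2009-04-01", "2009-12-31", "2010-02-15"))

# Year label, equivalent in spirit to lubridate::year() used in the answer
yr <- format(idx, "%Y")

# Quarter label: base R's quarters() returns "Q1" to "Q4" for Date input
qtr <- paste(yr, quarters(idx))
```

Either vector can serve as the grouping key when rules are stored per year or per quarter.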

I also made up some TTRs to use for each year:

#Make a list of rules for each year
library(lubridate)
dat$Year <- year(index(dat))
uniqueYear <- sort(unique(dat$Year))
rulesList <- lapply(uniqueYear, function(x) rulenames[runif(length(rulenames))>.5])
names(rulesList) <- uniqueYear

Note that my rulesList is literally a list:

> rulesList
$`2007`
[1] "RSI_5"  "RSI_10" "RSI_20" "RSI_25" "SMA_5"  "SMA_10" "SMA_20" "SMA_25"

$`2008`
[1] "RSI_10" "RSI_15" "SMA_5"  "SMA_10" "SMA_25"

$`2009`
[1] "RSI_5"  "RSI_15" "RSI_20" "SMA_5"  "SMA_15" "SMA_25"

$`2010`
[1] "RSI_5"  "RSI_10" "RSI_20" "SMA_5"  "SMA_20" "SMA_25"

$`2011`
[1] "RSI_20" "SMA_5"  "SMA_10" "SMA_15" "SMA_20" "SMA_25"

$`2012`
[1] "RSI_20" "SMA_5"  "SMA_10" "SMA_25"

Now it's simply a matter of looping through each year and subsetting the dat object to the proper rows (year) and columns (TTRs):

#Apply the rules to each data.frame
data.by.year <- lapply(uniqueYear, function(year){
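  # Rules selected for this year (the list is named by year as character)
  rule_subset <- rulesList[[as.character(year)]]
  # Keep only this year's rows and the selected TTR columns
  dat[dat$Year == year, rule_subset]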
})
names(data.by.year) <- uniqueYear

data.by.year is a list (of length 6) in which each element holds one year's worth of data, restricted to the TTRs selected for that year:

> str(data.by.year[[1]])
An ‘xts’ object from 2007-02-08 to 2007-12-31 containing:
  Data: num [1:226, 1:8] 40.1 26.7 42.5 67.9 62.5 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:8] "RSI_5" "RSI_10" "RSI_20" "RSI_25" ...
  Indexed by objects of class: [Date] TZ: 
  xts Attributes:  
List of 3
 $ tclass   : chr "Date"
 $ tzone    : chr ""
 $ na.action:Class 'omit'  atomic [1:25] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..- attr(*, "index")= num [1:25] 1.17e+09 1.17e+09 1.17e+09 1.17e+09 1.17e+09 ...
> 
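The list above splits the data by year, whereas the question asks for a single object with the original dimensions and NA wherever a rule is not significant. One possible way to get that is to copy the data and blank out the unselected columns period by period. This is a sketch under stated assumptions, not part of the original answer: it uses a plain data.frame with a date column and made-up values, and the names masked, yr, and drop_cols are hypothetical.

```r
# Toy data with made-up values: two years, three rule columns
dat <- data.frame(
  date   = as.Date(c("2007-03-01", "2007-09-03", "2008-03-03", "2008-09-01")),
  RSI_5  = c(40.1, 26.7, 42.5, 67.9),
  RSI_10 = c(47.0, 40.3, 45.9, 58.6),
  SMA_5  = c(0.001, 0.002, -0.001, 0.003)
)
rulenames <- c("RSI_5", "RSI_10", "SMA_5")

# Significant rules per year, stored as a named list as the answer recommends
rulesList <- list(`2007` = c("RSI_5", "SMA_5"), `2008` = "RSI_10")

masked <- dat                   # same dimensions as the original data
yr <- format(dat$date, "%Y")    # period label for every row

for (y in names(rulesList)) {
  # Columns that are NOT significant in this period
  drop_cols <- setdiff(rulenames, rulesList[[y]])
  if (length(drop_cols) > 0) {
    masked[yr == y, drop_cols] <- NA
  }
}
```

Here masked keeps every row and column of dat; RSI_10 is NA throughout 2007, and RSI_5 and SMA_5 are NA throughout 2008. For quarterly rules, the same loop applies with a quarter label in place of yr.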
