Filling in values without a loop
Question
I have a large data frame x with stock prices on specific dates. I want to merge this data set with a date variable and fill in the last known observation of x until the next specific date, so that I get data frame z. The example below shows this for one stock.
I am using a loop, but the process is very slow because I have five to ten years of daily data and thousands of stocks.
Is there an alternative way? In Matlab, the same code runs much faster.
It is important that I can also use conditions other than the simple is.na(z[t,2]==TRUE) condition.
Here is the example:
> x=data.frame(c("2015-05-31","2015-06-30","2015-07-31"),c(100,200,150))
> colnames(x)=c("Date","AAPL")
> x[,1]=as.Date(x[,1],origin="1970-01-01")
>
> x
Date AAPL
1 2015-05-31 100
2 2015-06-30 200
3 2015-07-31 150
>
> date=data.frame(c("2015-05-31","2015-06-01","2015-06-02","2015-06-03","2015-06-04","2015-06-05","2015-06-06","2015-06-07","2015-06-08","2015-06-09","2015-06-10","2015-06-11","2015-06-12","2015-06-13","2015-06-14","2015-06-15","2015-06-16","2015-06-17","2015-06-18","2015-06-19","2015-06-20","2015-06-21","2015-06-22","2015-06-23","2015-06-24","2015-06-25","2015-06-26","2015-06-27","2015-06-28","2015-06-29","2015-06-30","2015-07-01","2015-07-02","2015-07-03","2015-07-04","2015-07-05","2015-07-06","2015-07-07","2015-07-08","2015-07-09","2015-07-10","2015-07-11","2015-07-12","2015-07-13","2015-07-14","2015-07-15","2015-07-16","2015-07-17","2015-07-18","2015-07-19","2015-07-20","2015-07-21","2015-07-22","2015-07-23","2015-07-24","2015-07-25","2015-07-26","2015-07-27","2015-07-28","2015-07-29","2015-07-30","2015-07-31"))
> colnames(date)=c("Date")
> date[,1]=as.Date(date[,1],origin="1970-01-01")
>
> date
Date
1 2015-05-31
2 2015-06-01
3 2015-06-02
29 ...
30 2015-06-29
31 2015-06-30
32 2015-07-01
33 2015-07-02
>
> z=merge(x=x, y=date, by.x="Date", by.y="Date",all.y=TRUE)
>
>
> #Converting z to a data matrix speeds up the loop
> z=data.matrix(z)
>
> for (t in 1:nrow(z)) {
+   if (is.na(z[t,2]==TRUE)){
+     z[t,2]=z[t-1,2]
+   } else if (!is.na(z[t,2]==TRUE)){
+     z[t,2]=z[t,2]
+   }
+ }
>
> z=as.data.frame(z)
> z[,1]=as.Date(z[,1],origin="1970-01-01")
>
> z
Date AAPL
1 2015-05-31 100
2 2015-06-01 100
3 2015-06-02 100
29 ...
30 2015-06-29 100
31 2015-06-30 200
32 2015-07-01 200
33 2015-07-02 200
Recommended answer
We could use base R to do this. We get the logical index of the non-NA 'AAPL' elements ('i1'), take the cumsum of 'i1' to convert it to a numeric index, and use that index to replace the NA elements with the preceding non-NA elements. Because cumsum(i1) counts how many non-NA values have been seen so far, each row picks up the most recent non-NA value.
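To see why the indexing works, here is a small sketch on a toy vector. The vector `v` and its values are made up for illustration and are not part of the question's data.

```r
# Toy vector: NA marks days without a new observation (illustrative values)
v <- c(100, NA, NA, 200, NA, 150)

i1 <- !is.na(v)      # TRUE where a value was actually observed
i1
# [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE

cumsum(i1)           # for each row, how many observed values have been seen so far
# [1] 1 1 1 2 2 3

v[i1]                # the observed values only
# [1] 100 200 150

filled <- v[i1][cumsum(i1)]   # each row picks up its latest observed value
filled
# [1] 100 100 100 200 200 150
```

This relies on the first element being non-NA, as it is in z. A leading NA would make cumsum(i1) equal to 0 for that row, and R drops zero indices, so the result would come out shorter than the input. Applied to the AAPL column of z from the question: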
i1 <- !is.na(z$AAPL)
z$AAPL <- z$AAPL[i1][cumsum(i1)]
head(z)
# Date AAPL
#1 2015-05-31 100
#2 2015-06-01 100
#3 2015-06-02 100
#4 2015-06-03 100
#5 2015-06-04 100
#6 2015-06-05 100
tail(z)
# Date AAPL
#57 2015-07-26 200
#58 2015-07-27 200
#59 2015-07-28 200
#60 2015-07-29 200
#61 2015-07-30 200
#62 2015-07-31 150
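The question also asks for conditions other than is.na and mentions thousands of stocks. One possible way to generalize the cumsum idea is sketched below. The helper name `fill_last`, the data frame `z2`, and the MSFT column with its values are all hypothetical and exist only for this illustration.

```r
# Hypothetical helper (not from the original answer): carry forward the last
# value for which `keep` is TRUE. `keep` can be any logical vector, so the
# condition is not limited to !is.na().
fill_last <- function(values, keep = !is.na(values)) {
  keep <- keep & !is.na(keep)   # treat an NA condition as "do not keep"
  idx <- cumsum(keep)           # 0 before the first kept value
  idx[idx == 0] <- NA           # leading rows stay NA instead of being dropped
  values[keep][idx]
}

# Illustrative data: two made-up price columns with gaps
z2 <- data.frame(
  Date = as.Date("2015-05-31") + 0:5,
  AAPL = c(100, NA, NA, 200, NA, 150),
  MSFT = c(NA, 40, NA, -1, NA, 45)
)

# Default condition: fill the NA gaps in every price column at once
price_cols <- setdiff(names(z2), "Date")
filled <- z2
filled[price_cols] <- lapply(z2[price_cols], fill_last)

# Alternative condition: also treat non-positive prices as missing
msft <- fill_last(z2$MSFT, keep = !is.na(z2$MSFT) & z2$MSFT > 0)
```

Here `lapply` still visits each column, but the work inside each column is vectorized, so there is no row-by-row loop. If an extra dependency is acceptable, `na.locf()` from the zoo package covers the plain NA case as well.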