How to avoid loops
Problem description
Hi all, I'm new to R.
I have two panel data files, each with the columns "id", "date" and "ret".
File A has a lot more data than file B, but I'm primarily working with the data in file B.
The combination of "id" and "date" is a unique identifier.
Is there an elegant way to look up, for each (id, date) in B, the "ret" values for the past 10 days in file A, and store the result back into B?
My naive approach is to loop over all rows in B,
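B$past10d <- NA_real_  # preallocate the result column
# nrow(B) counts rows; length(B) would count the columns of a data frame
for (i in seq_len(nrow(B))) {
  B$past10d[i] <- prod(1 + A$ret[which(A$id == B$id[i] & A$date > B$date[i] - 10 & A$date < B$date[i])]) - 1
}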
but the loop takes forever.
I'd really appreciate your thoughts.
Thank you very much.
I think the key is to vectorize and use the %in% operator to subset data frame A. And, I know, prices are not random numbers, but I didn't want to code a random walk... I created a stock-date index using paste, but I'm sure you could use the index from pdata.frame in the plm library, which is the best I've found for panel data.
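# Toy panel: 10 stocks x 100 consecutive days, sorted by stock and then date
A <- data.frame(stock=rep(1:10, each=100), date=rep(Sys.Date()-99:0, 10), price=rnorm(1000))
# B holds the last observation of each stock
B <- A[seq(from=100, to=1000, by=100), ]
# Build a "stock-date" key in both data frames
A <- cbind(paste(A$stock, A$date, sep="-"), A)
B <- cbind(paste(B$stock, B$date, sep="-"), B)
colnames(A) <- colnames(B) <- c("index", "stock", "date", "price")
# Rows of A whose key also appears in B
index <- which(A[, 1] %in% B[, 1])
# 10-row return; index-10 assumes the 10 preceding rows belong to the same stock, with no gaps
returns <- (A$price[index] - A$price[index-10]) / A$price[index-10]
B <- cbind(B, returns)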
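For the layout in the question itself (columns id, date, ret), one possible loop-free variant is to join B to A on id, keep only the rows inside the date window, and compound the returns per row of B. The following is only a sketch in base R under a few assumptions: the toy data is made up, and the helper column row and the suffix .A are names introduced here for illustration. It uses the same window as the loop in the question (strictly after date - 10 and strictly before date).

```r
# Toy data in the question's layout; the values are invented for illustration.
set.seed(1)
A <- data.frame(
  id   = rep(1:3, each = 30),
  date = rep(as.Date("2020-01-01") + 0:29, times = 3),
  ret  = rnorm(90, mean = 0, sd = 0.01)
)
B <- A[c(15, 45, 75, 90), c("id", "date")]

# Hypothetical helper column: remembers which row of B each joined row came from.
B$row <- seq_len(nrow(B))

# Pair every row of B with every row of A that has the same id.
# A's date column is renamed to date.A by the suffix.
m <- merge(B, A, by = "id", suffixes = c("", ".A"))

# Keep only the A rows inside the window used by the loop in the question.
m <- m[m$date.A > m$date - 10 & m$date.A < m$date, ]

# Compound the returns within each row of B.
comp <- tapply(m$ret, m$row, function(r) prod(1 + r) - 1)

# Write the result back. Rows with an empty window get 0, which is what the
# loop produces as well, because prod() of an empty vector is 1.
B$past10d <- 0
B$past10d[as.integer(names(comp))] <- as.vector(comp)
B$row <- NULL
```

The intermediate data frame m holds one row per (B row, A row) pair with a matching id before it is filtered, so this trades memory for speed. If A is very large, a non-equi join (for example in the data.table package) may be a better fit.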