R: using ddply in a loop over data frame columns
Problem description
I need to calculate and add to a data frame multiple new columns, each based on the values in one column from a subset of the data frame's columns. These columns all hold time series data (there is a common date column). For example, I need to calculate the change over the same month in the previous year for a dozen columns. I could specify them and calculate them individually, but that becomes onerous with a large number of columns to transform, so I am trying to automate the process with a for loop.
I was doing OK until I tried to use ddply to create a column for the running total of the value for the year so far. What happens is that ddply is adding new rows during each iteration through the loop and including those new rows in the cumsum calculation. I have two questions.
Q. How can I get ddply to calculate the correct cumsum?
Q. How can I specify the name of the column during the ddply call, rather than using a dummy value and renaming afterward?
[Edit: I spoke too soon, the updated code below does NOT work at this point, just FYI]
require(lubridate)
require(plyr)
require(xts)
set.seed(12345)
# create dummy time series data
monthsback <- 24
startdate <- as.Date(paste(year(now()),month(now()),"1",sep = "-")) - months(monthsback)
mydf <- data.frame(mydate = seq(as.Date(startdate), by = "month", length.out = monthsback),
                   myvalue1 = runif(monthsback, min = 600, max = 800),
                   myvalue2 = runif(monthsback, min = 200, max = 300))
mydf$year <- as.numeric(format(as.Date(mydf$mydate), format="%Y"))
mydf$month <- as.numeric(format(as.Date(mydf$mydate), format="%m"))
newcolnames <- c('myvalue1','myvalue2')
for (i in seq_along(newcolnames)) {
  print(newcolnames[i])
  mydf$myxts <- xts(mydf[, newcolnames[i]], order.by = mydf$mydate)
  ## Calculate change over same month in previous year
  mylag <- 12
  mydf[, paste(newcolnames[i], "_yoy", sep = "", collapse = "")] <- as.numeric(diff(mydf$myxts, lag = mylag) / lag(mydf$myxts, mylag))
  ## Calculate change over previous month
  mylag <- 1
  mydf[, paste(newcolnames[i], "_mom", sep = "", collapse = "")] <- as.numeric(diff(mydf$myxts, lag = mylag) / lag(mydf$myxts, mylag))
  ## Calculate cumulative figure
  #mydf$newcol <- as.numeric(mydf$myxts)
  mydf$newcol <- 1
  mydf <- ddply(mydf, .(year), transform, newcol = cumsum(as.numeric(mydf$myxts)))
  colnames(mydf)[colnames(mydf)=="newcol"] <- paste(newcolnames[i], "_cuml", sep = "", collapse = "")
}
mydf
In your loop, since myxts is not part of the data frame, it is not split up in the ddply statement along with everything else. Change it to:
mydf$myxts <- xts(mydf[, newcolnames[i]], order.by = mydf$mydate)
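To illustrate why the column has to live in the data frame, here is a minimal sketch on toy data (the names toy, value, running and bygroup are made up for this example, not from the original post). Inside transform, a bare column name is looked up in the per-year piece that ddply hands over, so cumsum restarts for each year. Writing toy$value instead would refer to the whole, unsplit column.

```r
library(plyr)

# Hypothetical toy data: two years, three observations each
toy <- data.frame(year  = rep(c(2020, 2021), each = 3),
                  value = c(1, 2, 3, 10, 20, 30))

# Bare column name: evaluated inside each per-year piece,
# so the running total restarts at the start of every year
bygroup <- ddply(toy, .(year), transform, running = cumsum(value))

bygroup$running
# 1 3 6 10 30 60
```

The same reasoning would suggest using the bare name myxts, rather than mydf$myxts, inside the ddply call in the loop.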
I don't know of any way to use dynamically generated names with transform.
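One possible workaround for the dynamic name, offered as a sketch rather than a definitive fix: pass ddply an anonymous function instead of transform, and assign the new column with [[ ]], which accepts a name built at run time. The fixed start date and the names valuecols, cumlname and piece are assumptions made for this example. It also skips the xts column and works on the plain numeric columns.

```r
library(plyr)

set.seed(12345)
# Fixed start date (an assumption) so the example is reproducible
mydf <- data.frame(
  mydate   = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  myvalue1 = runif(24, min = 600, max = 800),
  myvalue2 = runif(24, min = 200, max = 300)
)
mydf$year <- as.numeric(format(mydf$mydate, "%Y"))

valuecols <- c("myvalue1", "myvalue2")
for (col in valuecols) {
  cumlname <- paste0(col, "_cuml")
  # ddply calls the function once per year, passing that year's rows
  mydf <- ddply(mydf, .(year), function(piece) {
    piece[[cumlname]] <- cumsum(piece[[col]])  # name chosen at run time
    piece
  })
}

head(mydf)
```

Because each piece already carries its year column, the result has the original rows plus one _cuml column per value column, with no renaming step afterward.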