R: Create a time sequence as xts index based on two columns in data.frame
Problem description
I have a data.frame like below
     soc_sec group_count total_creds group_start  group_end
       (chr)       (int)       (dbl)      (date)     (date)
1  AA2105480           5        14.0  2005-01-09 2005-05-16
2  AA2105480           7        17.0  2004-08-26 2004-12-10
3  AB4378973           1         0.0  2004-01-21 2004-05-07
4  AB4990257           2         1.0  2014-09-01 2014-12-14
5  AB7777777           5        12.0  2004-01-21 2005-03-22
6  AB7777777           6        15.0  2004-08-26 2004-12-10
7  AB7777777           5        15.0  2005-01-09 2005-05-12
8  AC4285291           2         3.0  2014-09-01 2014-12-14
9  AC4285291           1         3.0  2015-01-12 2015-04-15
10 AC6039874           9        17.5  2010-01-06 2010-05-06
11 AC6039874           7        16.0  2011-01-05 2011-04-29
12 AC6039874           8        12.5  2010-08-31 2010-12-21
13 AC6039874           7        13.5  2011-08-31 2011-12-21
14 AC6547645           7        18.0  2005-01-09 2005-05-12
15 AC6547645           6        17.0  2004-08-26 2004-12-10
16 AC6547645           1         2.0  2005-04-20 2005-06-01
17 AD1418577           7        13.0  2013-01-09 2013-05-17
18 AD1418577           8        16.0  2013-08-28 2013-12-13
19 AD1418577           6        15.0  2014-01-08 2014-05-05
20 AD1418577           7        13.0  2015-08-26 2015-12-15
What I'm trying to do is create a column that I can later use as a day-by-day index for an xts object, based on the sequence of days between group_start and group_end. I know I'm able to calculate a vector for a single row using v <- seq(df$group_start[1], df$group_end[1], by="days"). I can even make the requisite repetition of the rows that I could later dplyr::bind_rows(df,v) with:
df$len <- apply(df, 1, function(x){
length(seq(as.Date(x["group_start"]), as.Date(x["group_end"]), by="days"))
})
df <- df[rep(seq_len(nrow(df)), df$len),]
What I have been unable to do is vectorize this to occur for each row in the data.frame.
Things I have tried that didn't work
create_date_vector <- function(x){
flog.debug("id: %s", x["soc_sec"])
seq(as.Date(x["group_start"]), as.Date(x["group_end"]), by = "days")
}
date_vec <- c()
date_vec <- c(date_vec, apply(df, 1, create_date_vector))
Errors with: Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument
date_vec <- c()
for(i in 1:nrow(df)){
date_vec <- c(date_vec, seq(from=as.Date(df$group_start[as.integer(i)]), to=as.Date(df$group_end[as.integer(i)])), by="days")
}
Errors with: Error in seq.Date(from = as.Date(ags_df$group_start[as.integer(i)]), to = as.Date(ags_df$group_end[as.integer(i)])) : exactly two of 'to', 'by' and 'length.out' / 'along.with' must be specified
Any help would be greatly appreciated. Thank you.
dput
structure(list(soc_sec = c("AA2105480", "AA2105480", "AB4378973",
"AB4990257", "AB7777777", "AB7777777", "AB7777777", "AC4285291",
"AC4285291", "AC6039874", "AC6039874", "AC6039874", "AC6039874",
"AC6547645", "AC6547645", "AC6547645", "AD1418577", "AD1418577",
"AD1418577", "AD1418577"), group_count = c(5L, 7L, 1L, 2L, 5L,
6L, 5L, 2L, 1L, 9L, 7L, 8L, 7L, 7L, 6L, 1L, 7L, 8L, 6L, 7L),
total_creds = c(14, 17, 0, 1, 12, 15, 15, 3, 3, 17.5, 16,
12.5, 13.5, 18, 17, 2, 13, 16, 15, 13), group_start = structure(c(12792,
12656, 12438, 16314, 12438, 12656, 12792, 16314, 16447, 14615,
14979, 14852, 15217, 12792, 12656, 12893, 15714, 15945, 16078,
16673), class = "Date"), group_end = structure(c(12919, 12762,
12545, 16418, 12864, 12762, 12915, 16418, 16540, 14735, 15093,
14964, 15329, 12915, 12762, 12935, 15842, 16052, 16195, 16784
), class = "Date")), class = c("tbl_df", "data.frame"), row.names = c(NA,
-20L), .Names = c("soc_sec", "group_count", "total_creds", "group_start",
"group_end"))
Recommended answer
So, I managed to figure it out, and I figure I should put the solution down here just in case. It took multiple steps, so if anyone can think of a better way to do this, please let me know.
First, I created a column to count the number of days between the two dates. I needed this so that I knew how many repetitions of each row to make.
calc_day_nums <- function(x){
if(as.numeric(as.Date(x["group_start"])) < as.numeric(as.Date(x["group_end"]))){
len <- length(seq(as.Date(x["group_start"]), as.Date(x["group_end"]), by="days"))
} else if (as.numeric(as.Date(x["group_start"])) > as.numeric(as.Date(x["group_end"]))){
len <- length(seq(as.Date(x["group_end"]), as.Date(x["group_start"]), by="days"))
} else {
len <- 1 #basically these are records whose start and end are the same
}
return(len)
}
df$reps <- apply(df, 1, calc_day_nums)
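The per-row apply() call above can probably be avoided, because subtracting one Date vector from another is already vectorized. A minimal sketch, assuming group_start and group_end are Date columns as in the dput output; the small df_demo frame and the name reps_vectorized are made up for illustration and are not part of the original answer:

```r
# Hypothetical three-row sample: a normal range, a same-day range,
# and a range whose start and end are swapped.
df_demo <- data.frame(
  group_start = as.Date(c("2005-01-09", "2004-08-26", "2004-05-07")),
  group_end   = as.Date(c("2005-01-12", "2004-08-26", "2004-05-05"))
)

# Date - Date gives a difference in days; abs() covers reversed ranges,
# and + 1 counts both endpoints, like length(seq(start, end, by = "days")).
day_diff <- as.numeric(df_demo$group_end - df_demo$group_start)
reps_vectorized <- as.integer(abs(day_diff)) + 1L

reps_vectorized
```

This should give the same counts as calc_day_nums for the three cases it distinguishes (start before end, start after end, start equal to end), without looping over rows.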
Then, I created a vector of all the days themselves
date_vec <- function(i, x, y){
if(as.Date(x[i]) != as.Date(y[i])){
as.Date(as.Date(x[i]):as.Date(y[i]), origin="1970-01-01")
} else{
as.Date(x[i])
}
}
vec <- lapply(seq_along(df$group_start), date_vec, x=df$group_start, y=df$group_end)
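vec <- unlist(vec) # unlist() drops the Date class and returns day counts since 1970-01-01
vec <- as.Date(vec, origin="1970-01-01") # explicit origin so this does not depend on zoo/xts being loaded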
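A shorter way to build the same vector might be Map() with seq(), combined with do.call(c, ...) so that the Date class is kept instead of being dropped by unlist(). This is a sketch under assumptions, not the original answer's method: the starts and ends vectors are invented sample values, and reversed ranges are normalized with pmin()/pmax(), so their days come out in ascending order rather than the descending order the `:` operator would give.

```r
# Hypothetical sample ranges; the third one has start after end.
starts <- as.Date(c("2005-01-09", "2004-08-26", "2004-05-07"))
ends   <- as.Date(c("2005-01-11", "2004-08-26", "2004-05-05"))

# Normalize each pair so that the lower bound never exceeds the upper bound.
lo <- pmin(starts, ends)
hi <- pmax(starts, ends)

# One daily sequence per row; Map() returns a list of Date vectors.
day_list <- Map(seq, lo, hi, by = "day")

# c() on Date vectors keeps the Date class, unlike unlist().
all_days <- do.call(c, day_list)

all_days
```

The lengths of the elements of day_list (see lengths(day_list)) are also the per-row repetition counts, so they could double as the reps column.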
After that, I repeated each row of the data.frame the correct number of times
df <- df[rep(seq_len(nrow(df)), df$reps),]
Lastly, I bound the vector to the data.frame. At this point I could have also just defined vec as the xts index with xt <- xts(x = df, order.by = vec), but I wanted to add it to the data.frame
df <- bind_cols(df, data.frame(days=vec))
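Putting the steps together, the whole expansion can be sketched in base R as follows. The df_demo frame is a made-up two-row sample using column names from the question, the code assumes group_start is never after group_end, and the xts step only runs if the xts package happens to be installed:

```r
# Hypothetical two-row sample with the question's column names.
df_demo <- data.frame(
  soc_sec     = c("AA2105480", "AB4378973"),
  total_creds = c(14, 0),
  group_start = as.Date(c("2005-01-09", "2004-01-21")),
  group_end   = as.Date(c("2005-01-11", "2004-01-22")),
  stringsAsFactors = FALSE
)

# Days per row, counting both endpoints (assumes group_start <= group_end).
reps <- as.integer(as.numeric(df_demo$group_end - df_demo$group_start)) + 1L

# Repeat each row once per day, then attach the matching dates.
long <- df_demo[rep(seq_len(nrow(df_demo)), reps), ]
long$days <- do.call(c, Map(seq, df_demo$group_start, df_demo$group_end, by = "day"))
rownames(long) <- NULL

# xts keeps its data in a matrix, so only a numeric column is passed here;
# a mixed-type data.frame would be coerced to character.
if (requireNamespace("xts", quietly = TRUE)) {
  xt <- xts::xts(x = long[, "total_creds", drop = FALSE], order.by = long$days)
}

long
```

Note that xts orders its rows by the index, so the resulting object is sorted by date rather than by the original row order.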