填写缺少的日期并填写上面的数据 [英] Fill in missing date and fill with the data above
问题描述
我已经进行了足够的研究,直到在这里问到这个为止,但是您能为我提供一些有关此问题的想法吗?
I've researched enough until i ask this here but can you please help me with some ideas for this issue?
我的数据表(df)如下:
My data table (df) looks like this:
client id value repmonth
123 100 2012-01-31
123 200 2012-02-31
123 300 2012-05-31
因此,我有2个失踪月份。我希望我的数据表看起来像这样:
Therefore I have 2 missing months. And i want my data table to look like this:
client id value repmonth
123 100 2012-01-31
123 200 2012-02-31
123 200 2012-03-31
123 200 2012-04-31
123 300 2012-05-31
代码应填写缺少的repmonth并用最后一个值填充行,在这种情况下为200,并附带客户ID 。
The code should be filling in the missing repmonth and fill the rows with the last value, in this case 200 and the came client id.
我尝试了以下操作:
zoo library
tidyr library
dlpyr library
posixct
至于代码:...大量失败
As for codes: ...plenty of fails
library(tidyr)
df %>%
mutate (repmonth = as.Date(repmonth)) %>%
complete(repmonth = seq.Date(min(repmonth), max(repmonth),by ="month"))
或
library(dplyr)
df$reportingDate.end.month <- as.POSIXct(df$datetime, tz = "GMT")
df <- tbl_df(df)
list_df <- list(df, df) # fake list of data.frames
seq_df <- data_frame(datetime = seq.POSIXt(as.POSIXct("2012-01-31"),
as.POSIXct("2018-12-31"),
by="month"))
lapply(list_df, function(x){full_join(total_loan_portfolios_3$reportingDate.end.month, seq_df, by=reportingDate.end.month)})
total_loan_portfolios_3$reportingmonth_notmissing <- full_join(seq_df,total_loan_portfolios_3$reportingDate.end.month)
或
library(dplyr)
ts <- seq.POSIXt(as.POSIXct("2012-01-01",'%d/%m/%Y'), as.POSIXct("2018/12/01",'%d/%m/%Y'), by="month")
ts <- seq.POSIXt(as.POSIXlt("2012-01-01"), as.POSIXlt("2018-12-01"), by="month")
ts <- format.POSIXct(ts,'%d/%m/%Y')
df <- data.frame(timestamp=ts)
total_loan_portfolios_3 <- full_join(df,total_loan_portfolios_3$Reporting_date)
最后,我有很多错误,例如
Finally, I have plenty of errors like
格式不是日期
the format is not date
或
seq.int(r1 $ mon,12 *(to0 $ year-r1 $ year)中的错误+ to0 $ mon,by):
'from'必须为有限数字
Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) :
'from' must be a finite number
等。
推荐答案
以下解决方案使用lubridate和tidyr软件包。请注意,在OP示例中,日期格式不正确,但是意味着具有月末一天输入的数据,因此请尝试在此处复制。解决方案创建从最小输入日期到最大输入日期的日期序列,以获取所有可能的关注月份。请注意,输入日期已标准化为每月的第一天,以确保正确生成序列。创建序列后,将进行左联接合并以合并我们拥有的数据并识别丢失的数据。然后将fill()应用于列以填充缺少的NA。
The following solution uses lubridate and tidyr packages. Note that in OP example, dates are malformed, but implies having data with last-day-of-month input, so tried to replicate it here. Solution creates a sequence of dates from min input date to max input date to get all possible months of interest. Note that input dates are normalized to first-day-of-month to ensure proper sequence generation. With the sequence created, a left-join merge is done to merge data we have and identify missing data. Then fill() is applied to columns to fill in the missing NAs.
library(lubridate)
library(tidyr)
#Note OP has month of Feb with 31 days... Corrected to 28 but this fails to parse as a date
df <- data.frame(client_id=c(123,123,123),value=c(100,200,300),repmonth=c("2012-01-31","2012-02-29","2012-05-31"),stringsAsFactors = F)
df$repmonth <- ymd(df$repmonth) #convert character dates to Dates
start_month <- min(df$repmonth)
start_month <- start_month - days(day(start_month)-1) #first day of month to so seq.Date sequences properly
all_dates <- seq.Date(from=start_month,to=max(df$repmonth),by="1 month")
all_dates <- (all_dates %m+% months(1)) - days(1) #all end-of-month-day since OP suggests having last-day-of-month input?
all_dates <- data.frame(repmonth=all_dates)
df<-merge(x=all_dates,y=df,by="repmonth",all.x=T)
df <- fill(df,c("client_id","value"))
Solution yields:
> df
repmonth client_id value
1 2012-01-31 123 100
2 2012-02-29 123 200
3 2012-03-31 123 200
4 2012-04-30 123 200
5 2012-05-31 123 300
这篇关于填写缺少的日期并填写上面的数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!