How do I detect and re-insert missing data?
Problem description
I have a missing row in a data table which describes a function from time, sid, and s.c to count:
> dates.dt[1001:1011]
        sid   s.c  count                time
 1: missing CLICK 104192 2013-05-25 10:00:00
 2: missing SHARE   7694 2013-05-25 10:00:00
 3: present CLICK  99573 2013-05-25 10:00:00
 4: present SHARE  89302 2013-05-25 10:00:00
 5: missing CLICK     28 2013-05-25 11:00:00
 6: present CLICK     25 2013-05-25 11:00:00
 7: present SHARE     15 2013-05-25 11:00:00
 8: missing CLICK 104544 2013-05-25 12:00:00
 9: missing SHARE   7253 2013-05-25 12:00:00
10: present CLICK 105891 2013-05-25 12:00:00
11: present SHARE  88709 2013-05-25 12:00:00
The missing row is (I expect a row for each of the two values of the 1st and 2nd columns and each time slice):
missing SHARE 0 2013-05-25 11:00:00
How do I detect and restore such missing rows?
I discovered this as follows:
library(data.table)
total <- dates.dt[, list(sum(count)) , keyby="time"]
setnames(total,"V1","total")
ts <- dates.dt[s.c=="SHARE" & sid=="missing", list(sum(count)) , keyby="time"]
cat("SHARE/missing:",nrow(ts),"rows\n")
stopifnot(identical(total$time,ts$time)) # --> ERROR!
total$shares.missing <- ts$V1
Now, I guess I can find the first place where ts$time and total$time differ and insert a 0 row there, but this seems like a rather tedious process.
Thanks!
Recommended answer
Following @Frank's suggestion you can do:
setkey(dt, time, sid, s.c)
dt[J(expand.grid(unique(time),unique(sid),unique(s.c)))][order(time, sid, s.c)]
#                    time     sid   s.c  count
#  1: 2013-05-25 10:00:00 missing CLICK 104192
#  2: 2013-05-25 10:00:00 missing SHARE   7694
#  3: 2013-05-25 10:00:00 present CLICK  99573
#  4: 2013-05-25 10:00:00 present SHARE  89302
#  5: 2013-05-25 11:00:00 missing CLICK     28
#  6: 2013-05-25 11:00:00 missing SHARE     NA
#  7: 2013-05-25 11:00:00 present CLICK     25
#  8: 2013-05-25 11:00:00 present SHARE     15
#  9: 2013-05-25 12:00:00 missing CLICK 104544
# 10: 2013-05-25 12:00:00 missing SHARE   7253
# 11: 2013-05-25 12:00:00 present CLICK 105891
# 12: 2013-05-25 12:00:00 present SHARE  88709
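Note that the join above leaves the restored row with count NA, while the question asked for 0. The following is a minimal sketch of one way to both list the absent combinations and fill them with 0. It assumes a reasonably recent data.table that supports on= joins. The sample table dt is a hypothetical reconstruction of the first rows shown in the question, and the names full, absent and filled are illustrative only.

```r
library(data.table)

# Hypothetical sample mirroring the question's table; the
# (11:00, missing, SHARE) combination is deliberately absent.
dt <- data.table(
  time  = as.POSIXct(rep(c("2013-05-25 10:00:00", "2013-05-25 11:00:00"),
                         c(4, 3)), tz = "UTC"),
  sid   = c("missing", "missing", "present", "present",
            "missing", "present", "present"),
  s.c   = c("CLICK", "SHARE", "CLICK", "SHARE", "CLICK", "CLICK", "SHARE"),
  count = c(104192L, 7694L, 99573L, 89302L, 28L, 25L, 15L)
)

# Every (time, sid, s.c) combination that is expected to exist.
full <- CJ(time = unique(dt$time), sid = unique(dt$sid), s.c = unique(dt$s.c))

# Detect: the not-join keeps combinations that have no matching row in dt.
absent <- full[!dt, on = c("time", "sid", "s.c")]

# Restore: join dt onto the full grid, then turn the NA counts into 0.
filled <- dt[full, on = c("time", "sid", "s.c")]
filled[is.na(count), count := 0L]
```

CJ() is used here instead of J(expand.grid(...)) because it builds the cross product directly as a sorted data.table, so no separate order() step is needed. Printing absent shows which rows had to be re-inserted.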