Fill missing group data with 0s in a data.table
Question
This isn't a dupe of this. That question deals with rows that already have NAs in them; my question deals with missing rows for which there should be a data point of 0.
Let's say I have this data.table:
library(data.table)
dt <- data.table(id = c(1, 2, 4, 5, 6, 1, 3, 4, 5, 6),
                 varname = c(rep('banana', 5), rep('apple', 5)),
                 thedata = runif(10, 1, 10))
What's the best way to add, for each varname, the missing ids with a 0 for thedata?
At the moment I dcast with fill=0 and then melt again, but this doesn't seem very efficient.
melt(dcast.data.table(dt,id~varname,value.var='thedata',fill=0),id.vars='id',variable.factor=FALSE,variable.name='varname',value.name='thedata')
I also just thought of doing it this way, but it gets a little clunky to fill in the NAs at the end:
merge(dt[,CJ(id=unique(id),varname=unique(varname))],dt,by=c('varname','id'),all=TRUE)[,.(varname,id,thedata=ifelse(!is.na(thedata),thedata,0))]
In this example I only used one id column, but any additional suggestion should be extensible to having more than one id column.
EDIT
I ran system.time on each approach with a largish data set: the melt/cast approach took 2-3 seconds, while the merge/CJ approach took 12-13 seconds.
EDIT2
Roland's CJ approach is much better than mine, as it took only 4-5 seconds with my dataset.
Is there a better way?
Recommended answer
setkey(dt, varname, id)
dt[CJ(unique(varname), unique(id))]
# id varname thedata
# 1: 1 apple 9.083738
# 2: 2 apple NA
# 3: 3 apple 7.332652
# 4: 4 apple 3.610315
# 5: 5 apple 7.113414
# 6: 6 apple 9.046398
# 7: 1 banana 3.973751
# 8: 2 banana 9.907012
# 9: 3 banana NA
#10: 4 banana 9.308346
#11: 5 banana 1.572314
#12: 6 banana 7.753611
Then substitute NA with 0 if you must (usually not appropriate).
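For readers who do want the zeros, here is a minimal sketch of the whole sequence. It makes a few assumptions that are not part of the answer above: it uses the on= argument (available from data.table 1.9.6) instead of setkey, it builds the grid of combinations in a separate, hypothetically named variable grid, and it fixes a seed only so the example is reproducible.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only for reproducibility
dt <- data.table(id = c(1, 2, 4, 5, 6, 1, 3, 4, 5, 6),
                 varname = c(rep("banana", 5), rep("apple", 5)),
                 thedata = runif(10, 1, 10))

# Every (varname, id) combination that should exist in the result.
# 'grid' is a hypothetical name chosen for this sketch.
grid <- CJ(varname = unique(dt$varname), id = unique(dt$id))

# Join dt onto the grid; combinations missing from dt get NA in thedata.
full <- dt[grid, on = .(varname, id)]

# Replace the NAs introduced by the join with 0, by reference.
full[is.na(thedata), thedata := 0]
```

With more than one id column, the same pattern should carry over by adding each extra column to both the CJ call and the on= list. Note that CJ builds every combination of the unique values, which may be more rows than intended if only some combinations are meaningful.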