Fill missing group data with 0s in a data.table


Question

This isn't a duplicate of this. That question deals with rows that already have NAs in them; my question deals with missing rows for which there should be a data point of 0.

Let's say I have this data.table:

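library(data.table)

# Each varname is missing one id: banana has no id 3, apple has no id 2
dt <- data.table(id      = c(1, 2, 4, 5, 6, 1, 3, 4, 5, 6),
                 varname = c(rep('banana', 5), rep('apple', 5)),
                 thedata = runif(10, 1, 10))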

What's the best way to add, for each varname, the missing ids with a 0 for thedata?

At the moment I dcast with fill = 0 and then melt again, but this doesn't seem very efficient.

# Cast to wide (missing cells become 0), then melt back to long
melt(dcast.data.table(dt, id ~ varname, value.var = 'thedata', fill = 0),
     id.vars = 'id', variable.factor = FALSE,
     variable.name = 'varname', value.name = 'thedata')

I also just thought of doing it this way, but filling in the NAs at the end gets a little clunky:

merge(dt[, CJ(id = unique(id), varname = unique(varname))], dt,
      by = c('varname', 'id'), all = TRUE)[
  , .(varname, id, thedata = ifelse(!is.na(thedata), thedata, 0))]

In this example I only used one id column, but any suggestion should be extensible to more than one id column.

EDIT

I ran system.time on each approach with a largish data set: the melt/cast approach took 2-3 seconds, while the merge/CJ approach took 12-13 seconds.

EDIT2

Roland's CJ approach is much better than my merge/CJ version, as it took only 4-5 seconds with my dataset.

Is there a better way?

Recommended answer

setkey(dt, varname, id)
dt[CJ(unique(varname), unique(id))]
#    id varname  thedata
# 1:  1   apple 9.083738
# 2:  2   apple       NA
# 3:  3   apple 7.332652
# 4:  4   apple 3.610315
# 5:  5   apple 7.113414
# 6:  6   apple 9.046398
# 7:  1  banana 3.973751
# 8:  2  banana 9.907012
# 9:  3  banana       NA
#10:  4  banana 9.308346
#11:  5  banana 1.572314
#12:  6  banana 7.753611

Then substitute NA with 0 if you must (usually not appropriate).
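
The final substitution step is not spelled out above, so here is a minimal sketch of one way to do it, assuming the same dt as in the question. The fixed seed and the name filled are illustrative additions and not part of the original post. The update by reference with := is one common data.table idiom for this, not necessarily what the answerer had in mind.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dt <- data.table(id      = c(1, 2, 4, 5, 6, 1, 3, 4, 5, 6),
                 varname = c(rep("banana", 5), rep("apple", 5)),
                 thedata = runif(10, 1, 10))

# Keyed join against every varname/id combination, as in the answer;
# combinations absent from dt come back with thedata = NA
setkey(dt, varname, id)
filled <- dt[CJ(unique(varname), unique(id))]

# Replace the NAs introduced by the join with 0, updating by reference
filled[is.na(thedata), thedata := 0]

print(filled)
```

With more than one id column, the same pattern should carry over by adding the extra columns to both setkey() and CJ(), at the cost of a larger cross join.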
