Fill missing group data with 0s in a data.table

Views: 72 Published: 2020/10/15 21:07:50 Tags: r, data.table


Question

This isn't a dupe of this. That question deals with rows that already have NAs in them; my question deals with missing rows for which there should be a data point of 0.

Let's say I have this data.table:

library(data.table)

dt <- data.table(id = c(1, 2, 4, 5, 6, 1, 3, 4, 5, 6),
                 varname = c(rep('banana', 5), rep('apple', 5)),
                 thedata = runif(10, 1, 10))

What's the best way to add, for each varname, the missing ids with a 0 for thedata?

At the moment I dcast with fill=0 and then melt again, but this doesn't seem very efficient.

melt(dcast.data.table(dt, id ~ varname, value.var = 'thedata', fill = 0),
     id.var = 'id', variable.factor = FALSE,
     variable.name = 'varname', value.name = 'thedata')

I also just thought of doing it this way, but filling in the NAs at the end gets a little clunky:

merge(dt[, CJ(id = unique(id), varname = unique(varname))], dt,
      by = c('varname', 'id'), all = TRUE)[
  , .(varname, id, thedata = ifelse(!is.na(thedata), thedata, 0))]

In this example I only used one id column, but any suggestion should be extensible to more than one id column.

EDIT

I ran system.time on each approach with a largish data set: the melt/cast approach took 2-3 seconds, while the merge/CJ approach took 12-13 seconds.

EDIT2

Roland's CJ approach is much better than mine, as it only took 4-5 seconds with my dataset.

Is there a better way?

Answer

setkey(dt, varname, id)
dt[CJ(unique(varname), unique(id))]
#    id varname  thedata
# 1:  1   apple 9.083738
# 2:  2   apple       NA
# 3:  3   apple 7.332652
# 4:  4   apple 3.610315
# 5:  5   apple 7.113414
# 6:  6   apple 9.046398
# 7:  1  banana 3.973751
# 8:  2  banana 9.907012
# 9:  3  banana       NA
#10:  4  banana 9.308346
#11:  5  banana 1.572314
#12:  6  banana 7.753611

Then substitute NA with 0 if you must (usually not appropriate).
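
For completeness, here is a minimal sketch of that last step, plus one way the same cross join might be extended to more than one id column, as the question asks. The seed, the `filled`/`filled2` names, and the second table `dt2` with its `id2` column are assumptions made up for illustration; they are not part of the original question or answer.

```r
library(data.table)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
dt <- data.table(id = c(1, 2, 4, 5, 6, 1, 3, 4, 5, 6),
                 varname = c(rep("banana", 5), rep("apple", 5)),
                 thedata = runif(10, 1, 10))

# Keyed join against every (varname, id) combination;
# pairs that are absent from dt come back with thedata = NA.
setkey(dt, varname, id)
filled <- dt[CJ(unique(dt$varname), unique(dt$id))]

# Overwrite the NAs by reference instead of rebuilding the column with ifelse().
filled[is.na(thedata), thedata := 0]

# Hypothetical table with a second id column (id2): the same pattern,
# with one more vector passed to CJ() and an explicit on= instead of a key.
dt2 <- data.table(id = c(1, 1, 2),
                  id2 = c("a", "b", "a"),
                  varname = "apple",
                  thedata = c(5, 6, 7))
filled2 <- dt2[CJ(id = unique(dt2$id),
                  id2 = unique(dt2$id2),
                  varname = unique(dt2$varname)),
               on = .(id, id2, varname)]
filled2[is.na(thedata), thedata := 0]
```

Whether 0 is the right replacement depends on the data: as the answer notes, an NA often carries the more honest meaning of "no observation" for that group.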
