Balancing (creating the same number of rows for each individual) data

Problem description

Given a data.table as follows, id1 is a subject-level ID, id2 is a within-subject repeated-measure ID, and X1, X2, ... are data variables (there are many of them). I want to balance the data so that every individual has the same number of rows (repeated measures), namely max(DT[,.N,by=id1][,N]). In the new rows, id1 and id2 should be filled in as appropriate and the X values set to NA.

For example, this table:

library(data.table)

DT = data.table(
id1 = c(1,1,2,2,2,3,3,3,3),
id2 = c(1,2,1,2,3,1,2,3,4),
X1 = letters[1:9],
X2 = LETTERS[1:9]
)
setkey(DT,id1)

should become:

DT = data.table(
id1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
id2 = c(1,2,3,4,1,2,3,4,1,2,3,4),
X1 = c(letters[1:2],NA,NA,letters[3:5],NA,letters[6:9]),
X2 = c(LETTERS[1:2],NA,NA,LETTERS[3:5],NA,LETTERS[6:9])
)
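As a quick check of the target row count named in the description, the sketch below (assuming the data.table package is installed) counts rows per id1 with .N and takes the maximum. For the sample data the counts are 2, 3 and 4, so the target is 4, which is why every subject in the desired output has id2 running from 1 to 4. The names counts and n_max are illustrative only.

```r
library(data.table)

# Same sample data as in the question
DT <- data.table(
  id1 = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  id2 = c(1, 2, 1, 2, 3, 1, 2, 3, 4),
  X1  = letters[1:9],
  X2  = LETTERS[1:9]
)

# .N is the number of rows in each id1 group
counts <- DT[, .N, by = id1]
print(counts)

# Target number of rows per subject after balancing
n_max <- max(counts$N)
print(n_max)
```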

How can I do this with data.table? I want to avoid for loops because the data set is huge. Is this a job for reshape2?

Recommended answer

You could try:

 DT2 <- CJ(id1=1:3, id2=1:4)
 merge(DT,DT2, by=c('id1', 'id2'), all=TRUE)
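The answer hard-codes the grid (subjects 1:3, measures 1:4). A data-driven variant is sketched below, assuming id2 is a 1-based counter within each subject. It builds the grid from the data with CJ() and uses a join on id1 and id2 (DT[grid, on = ...]) in place of merge(..., all = TRUE). The names n_max, grid and balanced are illustrative only, and id2 is built as numeric so that its type matches DT$id2 in the join.

```r
library(data.table)

DT <- data.table(
  id1 = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  id2 = c(1, 2, 1, 2, 3, 1, 2, 3, 4),
  X1  = letters[1:9],
  X2  = LETTERS[1:9]
)

# Target number of rows per subject
n_max <- DT[, .N, by = id1][, max(N)]

# Full grid: every subject crossed with id2 = 1..n_max
# (assumption: id2 is a 1-based counter within each subject)
grid <- CJ(id1 = unique(DT$id1), id2 = as.numeric(seq_len(n_max)))

# Join DT to the grid: grid rows with no match in DT get NA in X1, X2
balanced <- DT[grid, on = c("id1", "id2")]
print(balanced)
```

The result has 12 rows, with NA in X1 and X2 where subject 1 lacks id2 3 and 4 and subject 2 lacks id2 4. DT[grid, on = ...] keeps exactly the rows of grid, so a row of DT whose id2 fell outside 1..n_max would be dropped; merge(..., all = TRUE), as in the answer, keeps such rows instead. If id2 is not already a counter, one option is to renumber it first with DT[, id2 := seq_len(.N), by = id1].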
