Add missing rows to data.table according to multiple keyed columns


Question

I have a data.table object that contains multiple columns that specify unique cases. In the small example below, the variables "name", "job", and "sex" specify the unique IDs. I would like to add missing rows so that each case has a row for each possible instance of another variable, "from" (similar to expand.grid).

library(data.table)
set.seed(1)
mydata <- data.table(name = c("john","john","john","john","mary","chris","chris","chris"),
                 job = c("teacher","teacher","teacher","teacher","police","lawyer","lawyer","doctor"),
                 sex = c("male","male","male","male","female","female","male","male"),
                 from = c("NYT","USAT","BG","TIME","USAT","BG","NYT","NYT"),
                 score = rnorm(8))


Here's the current data.table object:

> mydata
    name     job    sex from      score
1:  john teacher   male  NYT -0.6264538
2:  john teacher   male USAT  0.1836433
3:  john teacher   male   BG -0.8356286
4:  john teacher   male TIME  1.5952808
5:  mary  police female USAT  0.3295078
6: chris  lawyer female   BG -0.8204684
7: chris  lawyer   male  NYT  0.4874291
8: chris  doctor   male  NYT  0.7383247

Here's my desired result:

> mydata
     name     job    sex from      score
1:   john teacher   male  NYT -0.6264538
2:   john teacher   male USAT  0.1836433
3:   john teacher   male   BG -0.8356286
4:   john teacher   male TIME  1.5952808
5:   mary  police female  NYT  NA
6:   mary  police female USAT  0.3295078
7:   mary  police female   BG  NA
8:   mary  police female TIME  NA
9:  chris  lawyer female  NYT  NA
10: chris  lawyer female USAT  NA
11: chris  lawyer female   BG -0.8204684
12: chris  lawyer female TIME  NA
13: chris  lawyer   male  NYT  0.4874291
14: chris  lawyer   male USAT  NA
15: chris  lawyer   male   BG  NA
16: chris  lawyer   male TIME  NA
17: chris  doctor   male  NYT  0.7383247
18: chris  doctor   male USAT  NA
19: chris  doctor   male   BG  NA
20: chris  doctor   male TIME  NA

Here's what I've tried:

setkeyv(mydata, cols=c("name","job","sex"))
mydata[CJ(unique(name, job, sex), unique(from))]

But I receive the following error, and adding fromLast=TRUE (or FALSE) does not give me the right solution:

Error in unique.default(name, job, sex) : 
  'fromLast' must be TRUE or FALSE

Here are the relevant answers I've come across (but none appears to deal with multiple keyed columns):

add missing rows to a data table

Efficiently inserting default missing rows in a data.table

Fastest way to add rows for missing values in a data.frame?

Recommended answer

Here are two possibilities -
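The text of the answer is cut off on this page, so what follows is only a minimal sketch of one way to get the desired result, not necessarily either of the two possibilities the answer had in mind. The error in the question arises because unique() deduplicates a single object: in unique(name, job, sex), the second and third arguments are matched to its incomparables and fromLast parameters. The sketch instead takes the distinct (name, job, sex) rows from the table itself, repeats each one for every distinct value of "from", and joins that grid back to the data. The helper names id_cols, ids, grid and result are made up for illustration, and the `..id_cols` and `on =` syntax assumes a reasonably recent data.table release.

```r
library(data.table)

# Same example data as in the question.
set.seed(1)
mydata <- data.table(
  name  = c("john","john","john","john","mary","chris","chris","chris"),
  job   = c("teacher","teacher","teacher","teacher","police","lawyer","lawyer","doctor"),
  sex   = c("male","male","male","male","female","female","male","male"),
  from  = c("NYT","USAT","BG","TIME","USAT","BG","NYT","NYT"),
  score = rnorm(8)
)

# Columns that together identify one case (hypothetical helper name).
id_cols <- c("name", "job", "sex")

# One row per (name, job, sex) combination that actually occurs in the data.
ids <- unique(mydata[, ..id_cols])

# Repeat every case once for each distinct value of `from`.
grid <- ids[, .(from = unique(mydata$from)), by = id_cols]

# Join the data onto the grid; combinations absent from mydata get score = NA.
result <- mydata[grid, on = c(id_cols, "from")]
print(result)
```

Unlike CJ() or expand.grid() applied to the three ID columns separately, this only expands combinations of name, job and sex that already exist, so no cases such as "mary / teacher / male" are invented.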
