Add missing rows to data.table according to multiple keyed columns
Problem description
I have a data.table object that contains multiple columns that specify unique cases. In the small example below, the variables "name", "job", and "sex" specify the unique IDs. I would like to add missing rows so that each case has a row for each possible instance of another variable, "from" (similar to expand.grid).
library(data.table)
set.seed(1)
mydata <- data.table(name  = c("john","john","john","john","mary","chris","chris","chris"),
                     job   = c("teacher","teacher","teacher","teacher","police","lawyer","lawyer","doctor"),
                     sex   = c("male","male","male","male","female","female","male","male"),
                     from  = c("NYT","USAT","BG","TIME","USAT","BG","NYT","NYT"),
                     score = rnorm(8))
Here's the current data.table object:
> mydata
    name     job    sex from      score
1:  john teacher   male  NYT -0.6264538
2:  john teacher   male USAT  0.1836433
3:  john teacher   male   BG -0.8356286
4:  john teacher   male TIME  1.5952808
5:  mary  police female USAT  0.3295078
6: chris  lawyer female   BG -0.8204684
7: chris  lawyer   male  NYT  0.4874291
8: chris  doctor   male  NYT  0.7383247
Here's the result I want:
> mydata
     name     job    sex from      score
 1:  john teacher   male  NYT -0.6264538
 2:  john teacher   male USAT  0.1836433
 3:  john teacher   male   BG -0.8356286
 4:  john teacher   male TIME  1.5952808
 5:  mary  police female  NYT         NA
 6:  mary  police female USAT  0.3295078
 7:  mary  police female   BG         NA
 8:  mary  police female TIME         NA
 9: chris  lawyer female  NYT         NA
10: chris  lawyer female USAT         NA
11: chris  lawyer female   BG -0.8204684
12: chris  lawyer female TIME         NA
13: chris  lawyer   male  NYT  0.4874291
14: chris  lawyer   male USAT         NA
15: chris  lawyer   male   BG         NA
16: chris  lawyer   male TIME         NA
17: chris  doctor   male  NYT  0.7383247
18: chris  doctor   male USAT         NA
19: chris  doctor   male   BG         NA
20: chris  doctor   male TIME         NA
Here's what I've tried:
setkeyv(mydata, cols=c("name","job","sex"))
mydata[CJ(unique(name, job, sex), unique(from))]
But I receive the following error, and adding fromLast=TRUE (or FALSE) does not give me the right solution:
Error in unique.default(name, job, sex) :
'fromLast' must be TRUE or FALSE
Here are the relevant answers I've come across (but none appears to deal with multiple keyed columns): add missing rows to a data table
Recommended answer
Here are two possibilities...
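As a minimal sketch of one way to get the desired result (not necessarily either of the possibilities the answer goes on to give): the error in the question arises because base R's unique() takes a single vector as its first argument, so in unique(name, job, sex) the second and third arguments are matched positionally to incomparables and fromLast. One option is to build the grid of cases per group and then join the scores back on. The names all_from, grid, and res are illustrative and not from the original post.

```r
library(data.table)

set.seed(1)
mydata <- data.table(name  = c("john","john","john","john","mary","chris","chris","chris"),
                     job   = c("teacher","teacher","teacher","teacher","police","lawyer","lawyer","doctor"),
                     sex   = c("male","male","male","male","female","female","male","male"),
                     from  = c("NYT","USAT","BG","TIME","USAT","BG","NYT","NYT"),
                     score = rnorm(8))

# Every distinct value of `from`, in order of first appearance
all_from <- unique(mydata$from)

# For each existing (name, job, sex) case, emit one row per value of `from`.
# Only combinations of name/job/sex that occur in the data are expanded,
# unlike a full cross join over all four columns.
grid <- mydata[, .(from = all_from), by = .(name, job, sex)]

# Join the scores onto the grid; combinations absent from mydata get NA
res <- mydata[grid, on = c("name", "job", "sex", "from")]

print(res)
```

Because the grid is built per existing case rather than with CJ() over each column separately, combinations such as "john"/"police" that never occur in the data are not created. The result has 5 cases times 4 sources, that is 20 rows.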