Efficiently inserting default missing rows in a data.table
Problem description
Suppose I've got the following data.table:
library(data.table)
dt <- data.table(id=c(1,1,1,1,1,1,2,2,2,2),
                 wday=c("mon","tue","wed","thu","fri","sat","mon","tue","thu","fri"),
                 val=c(2,3,5,8,6,2,3,4,2,6))
id wday val
1: 1 mon 2
2: 1 tue 3
3: 1 wed 5
4: 1 thu 8
5: 1 fri 6
6: 1 sat 2
7: 2 mon 3
8: 2 tue 4
9: 2 thu 2
10: 2 fri 6
This is the result of an aggregation of another data.table. It represents the count (val) of a variable depending on the week day (wday) for different individuals (id). The problem is that during my operations I've lost the week days where the count is 0.
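To illustrate how the zero counts can disappear, here is a minimal sketch with a small hypothetical raw table (`events` is invented for illustration; it is not the asker's data). Grouping with `by` only creates groups that actually occur in the data, so a day with no events produces no row at all rather than a row with a count of 0.

```r
library(data.table)

# Hypothetical raw data (not from the question): one row per observed event
events <- data.table(
  id   = c(1, 1, 1, 2, 2),
  wday = c("mon", "mon", "tue", "mon", "thu")
)

# .N counts rows per group, but only groups present in the data are created,
# so e.g. (id = 2, wday = "tue") is simply absent instead of having val = 0
agg <- events[, .(val = .N), by = .(id, wday)]
print(agg)
```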
So the question is: how could I update my data.table object efficiently by inserting, for each id, as many rows as there are missing week days, with val = 0?
The result would be the following:
id wday val
1: 1 mon 2
2: 1 tue 3
3: 1 wed 5
4: 1 thu 8
5: 1 fri 6
6: 1 sat 2
7: 1 sun 0
8: 2 mon 3
9: 2 tue 4
10: 2 wed 0
11: 2 thu 2
12: 2 fri 6
13: 2 sat 0
14: 2 sun 0
Thanks a lot for your help.
One straightforward way I could think of right now is to use expand.grid to get all combinations and then use that to subset (i.e. join) with allow.cartesian = TRUE:
setkey(dt, "id", "wday")
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
idx <- expand.grid(vals, unique(dt$id))[, 2:1]
dt[J(idx), allow.cartesian=TRUE]
# id wday val
# 1: 1 mon 2
# 2: 1 tue 3
# 3: 1 wed 5
# 4: 1 thu 8
# 5: 1 fri 6
# 6: 1 sat 2
# 7: 1 sun NA
# 8: 2 mon 3
# 9: 2 tue 4
# 10: 2 wed NA
# 11: 2 thu 2
# 12: 2 fri 6
# 13: 2 sat NA
# 14: 2 sun NA
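The join above leaves NA where the question asks for 0. A minimal sketch of the remaining step, assuming the same `dt` and `vals` as above (repeated so the block runs on its own): build the grid with `stringsAsFactors = FALSE` so `wday` stays character, join on the key, then overwrite the NAs by reference with `:=`. `allow.cartesian = TRUE` is left out here because each (id, wday) pair matches at most one row of `dt`, so the result never exceeds `nrow(dt) + nrow(idx)` rows.

```r
library(data.table)

# Same data as in the question
dt <- data.table(id   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 wday = c("mon", "tue", "wed", "thu", "fri", "sat",
                          "mon", "tue", "thu", "fri"),
                 val  = c(2, 3, 5, 8, 6, 2, 3, 4, 2, 6))
setkey(dt, id, wday)

vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# Every (id, wday) combination; stringsAsFactors = FALSE keeps wday character.
# Columns are reordered to (id, wday) so they line up with the key of dt.
idx <- as.data.table(expand.grid(wday = vals, id = unique(dt$id),
                                 stringsAsFactors = FALSE))[, .(id, wday)]

res <- dt[idx]              # keyed join: pairs absent from dt get val = NA
res[is.na(val), val := 0]   # replace those NAs with explicit zero counts
print(res)
```

Because the rows of `res` follow the order of `idx`, the days come out Monday to Sunday within each id, matching the expected result in the question.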
Alternatively, it is possible to build the idx data.table directly with CJ:
dt[CJ(unique(dt$id),vals), allow.cartesian=TRUE]
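One detail of the CJ variant: `CJ()` sorts each of its inputs by default, so `wday` comes back as fri, mon, sat, sun, thu, tue, wed rather than Monday to Sunday. A sketch using `CJ()`'s `sorted = FALSE` argument to keep the order of `vals`, and an `on =` join so no `setkey` is required, again filling the gaps with 0:

```r
library(data.table)

dt <- data.table(id   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 wday = c("mon", "tue", "wed", "thu", "fri", "sat",
                          "mon", "tue", "thu", "fri"),
                 val  = c(2, 3, 5, 8, 6, 2, 3, 4, 2, 6))
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# Cross join of ids and days; sorted = FALSE keeps the inputs' own order,
# with id varying slowest, so each id gets mon ... sun in sequence
grid <- CJ(id = unique(dt$id), wday = vals, sorted = FALSE)

# Join on the named columns (no key needed); unmatched rows get val = NA
res <- dt[grid, on = .(id, wday)]
res[is.na(val), val := 0]   # turn the missing counts into zeros
print(res)
```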