Efficiently inserting default missing rows in a data.table


Question

Suppose I've got the following data.table:

library(data.table)

dt <- data.table(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 wday = c("mon", "tue", "wed", "thu", "fri", "sat", "mon", "tue", "thu", "fri"),
                 val = c(2, 3, 5, 8, 6, 2, 3, 4, 2, 6))

    id wday val
 1:  1  mon   2
 2:  1  tue   3
 3:  1  wed   5
 4:  1  thu   8
 5:  1  fri   6
 6:  1  sat   2
 7:  2  mon   3
 8:  2  tue   4
 9:  2  thu   2
10:  2  fri   6

This is the result of an aggregation of another data.table. It holds the count (val) of a variable per weekday (wday) for different individuals (id). The problem is that during my operations I've lost the weekdays where the count is 0.

So the question is: how can I efficiently update my data.table by inserting, for each id, one row with val = 0 for every missing weekday?

The result would be the following:

    id wday val
 1:  1  mon   2
 2:  1  tue   3
 3:  1  wed   5
 4:  1  thu   8
 5:  1  fri   6
 6:  1  sat   2
 7:  1  sun   0
 8:  2  mon   3
 9:  2  tue   4
10:  2  wed   0
11:  2  thu   2
12:  2  fri   6
13:  2  sat   0
14:  2  sun   0

Thanks a lot for your help.

Solution

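One straightforward way I can think of is to use expand.grid to build all id/weekday combinations and then join dt against them with allow.cartesian = TRUE. Combinations that are missing from dt come back with val = NA:

# Key dt by id and wday so that the join uses these two columns
setkey(dt, "id", "wday")
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")
# All (weekday, id) combinations; [, 2:1] reorders the columns to (id, wday)
idx <- expand.grid(vals, unique(dt$id))[, 2:1]
# One row per combination; val is NA where dt has no matching row
dt[J(idx), allow.cartesian=TRUE]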

#     id wday val
#  1:  1  mon   2
#  2:  1  tue   3
#  3:  1  wed   5
#  4:  1  thu   8
#  5:  1  fri   6
#  6:  1  sat   2
#  7:  1  sun  NA
#  8:  2  mon   3
#  9:  2  tue   4
# 10:  2  wed  NA
# 11:  2  thu   2
# 12:  2  fri   6
# 13:  2  sat  NA
# 14:  2  sun  NA
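
The join leaves NA where a weekday had no rows, whereas the question asks for val = 0. Below is a minimal sketch of that extra step, assuming the only NAs in val are the ones introduced by the join. The name `res` is introduced here for illustration, and the join uses `on=` (available in data.table 1.9.6 and later, as far as I know) instead of a key, which is a swapped-in variant rather than the code above:

```r
library(data.table)

dt <- data.table(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 wday = c("mon", "tue", "wed", "thu", "fri", "sat", "mon", "tue", "thu", "fri"),
                 val = c(2, 3, 5, 8, 6, 2, 3, 4, 2, 6))
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# All (id, wday) combinations; stringsAsFactors = FALSE keeps wday as
# character so that it has the same type as dt$wday
idx <- expand.grid(wday = vals, id = unique(dt$id),
                   stringsAsFactors = FALSE)[, 2:1]
idx <- as.data.table(idx)

# One row per combination in idx; val is NA where dt has no match
res <- dt[idx, on = c("id", "wday")]

# Replace the NAs introduced by the join with 0, by reference
res[is.na(val), val := 0]

print(res)
```

Because `:=` modifies `res` in place, no copy of the joined table is made for the fill step.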

Alternatively, the idx table can be built directly with CJ:

dt[CJ(unique(dt$id), vals), allow.cartesian=TRUE]
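
One thing to watch with CJ: by default it sorts each input, so the weekdays come out in alphabetical order (fri, mon, sat, ...) rather than mon to sun. The sketch below passes `sorted = FALSE` to keep the order of `vals`. The names `grid` and `res` are introduced here for illustration, and the `on=` join is an assumption about the data.table version in use (1.9.6 or later, as far as I know):

```r
library(data.table)

dt <- data.table(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 wday = c("mon", "tue", "wed", "thu", "fri", "sat", "mon", "tue", "thu", "fri"),
                 val = c(2, 3, 5, 8, 6, 2, 3, 4, 2, 6))
vals <- c("mon", "tue", "wed", "thu", "fri", "sat", "sun")

# Default behaviour: each column is sorted, so wday is alphabetical
sorted_grid <- CJ(id = unique(dt$id), wday = vals)

# sorted = FALSE retains the input order, so wday runs mon..sun per id
grid <- CJ(id = unique(dt$id), wday = vals, sorted = FALSE)

# Join on named columns, so dt does not need a key;
# val is NA for combinations that are missing from dt
res <- dt[grid, on = c("id", "wday")]

print(res)
```

With the sorted grid the rows are still all present, only their order differs, so which variant to use depends on whether the weekday order matters downstream.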
