Lagged lists in data.table R

Problem description

shift in R's data.table is great for time series and time-window work. But list columns don't lag the way columns of atomic values do. In the code below, carbLag lags carb correctly, but carbListLag doesn't shift carbList across rows; instead, shift operates inside each list element, shifting the vector against itself within the same row.

library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]
### Make col of lists
dt[, carbList := list(list(unique(carb))), by = .(cyl, gear)]
### Now I want to lag/lead col of lists
dt[, .(carb, carbLag = shift(carb),
       carbList, carbListLag = shift(carbList, type = "lead")), by = cyl]

    cyl carb carbLag carbList carbListLag
 1:   6    4      NA         4           NA
 2:   6    4       4         4           NA
 3:   6    1       4         1           NA <-- should be 4 here, not NA
 4:   6    1       1         1           NA
 5:   6    4       1         4           NA
 6:   6    4       4         4           NA
 7:   6    6       4         6           NA
 8:   4    1      NA       1,2         2,NA
 9:   4    2       1       1,2         2,NA
10:   4    2       2       1,2         2,NA
11:   4    1       2       1,2         2,NA
12:   4    2       1       1,2         2,NA
13:   4    1       2       1,2         2,NA
14:   4    1       1         1           NA <-- should be (1,2) here, not NA
15:   4    1       1       1,2         2,NA
16:   4    2       1         2           NA
17:   4    2       2         2           NA
18:   4    2       2       1,2         2,NA
19:   8    2      NA     2,4,3      4, 3,NA
20:   8    4       2     2,4,3      4, 3,NA
21:   8    3       4     2,4,3      4, 3,NA
22:   8    3       3     2,4,3      4, 3,NA
23:   8    3       3     2,4,3      4, 3,NA

Any suggestions to lag on lists the same way I lag on other elements?

Solution

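This is documented behavior: when given a list, shift works on each element separately rather than moving the elements themselves. Here's part of the example at ?shift: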

# on lists
ll = list(1:3, letters[4:1], runif(2))
shift(ll, 1, type="lead")

# [[1]]
# [1]  2  3 NA
# 
# [[2]]
# [1] "c" "b" "a" NA 
# 
# [[3]]
# [1] 0.1190792        NA

To get around this, you can make a unique ID for each value of the list:

dt[, carbList_id := match(carbList, unique(carbList))]

carbList_map = dt[, .(carbList = list(carbList[[1]])), by=carbList_id]

#    carbList_id carbList
# 1:           1        4
# 2:           2      1,2
# 3:           3        1
# 4:           4    2,4,3
# 5:           5        2
# 6:           6      4,8
# 7:           7        6

# or stick with long-form:
carbList_map = dt[, .(carb = carbList[[1]]), by=carbList_id]

#     carbList_id carb
#  1:           1    4
#  2:           2    1
#  3:           2    2
#  4:           3    1
#  5:           4    2
#  6:           4    4
#  7:           4    3
#  8:           5    2
#  9:           6    4
# 10:           6    8
# 11:           7    6

Then apply shift (or any other row-wise operation) to the new ID column. When you need the carbList values again, merge with the mapping table.
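The ID approach can be sketched end to end as follows. This is only an illustration under stated assumptions: the column names carbList_id_lead and carbListLead are made up for this example, and the lookup uses base R's match instead of a full merge. IDs that are NA after shifting (the last row of each group) come back as NULL list entries.

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]
dt[, carbList := list(list(unique(carb))), by = .(cyl, gear)]

# One integer ID per distinct list value, plus a lookup table (as in the answer)
dt[, carbList_id := match(carbList, unique(carbList))]
carbList_map <- dt[, .(carbList = list(carbList[[1]])), by = carbList_id]

# The ID column is atomic, so shift() moves whole values between rows
# (carbList_id_lead is a hypothetical name chosen for this sketch)
dt[, carbList_id_lead := shift(carbList_id, type = "lead"), by = cyl]

# Look the list values back up by ID; an NA ID yields a NULL entry
# (carbListLead is likewise a hypothetical name)
idx <- match(dt$carbList_id_lead, carbList_map$carbList_id)
dt[, carbListLead := carbList_map$carbList[idx]]
```

A keyed join against carbList_map would work as well; match is used here only to keep the sketch short.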

Alternatively, if you don't really need to work with the values and only want to browse them, consider storing them as a string instead, like carbList:=toString(sort(unique(carb))) or with paste0.
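A minimal sketch of the string variant, assuming display-only use. The column names carbStr and carbStrLead are hypothetical, chosen here to avoid clashing with the list column from the question.

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]

# Store each group's distinct carb values as one sorted, comma-separated string
dt[, carbStr := toString(sort(unique(carb))), by = .(cyl, gear)]

# A character column is atomic, so shift() leads it row by row within each cyl
dt[, carbStrLead := shift(carbStr, type = "lead"), by = cyl]
```

The trade-off is that the individual values are no longer directly usable; they would have to be parsed back out of the string.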

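Side note: sort the values before passing them to toString, paste0 or list, so that the same set of values always produces the same result regardless of the order in which the rows appear.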
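A small base R illustration of why sorting matters here. The vectors a and b are made-up examples holding the same values in different orders.

```r
a <- c(2, 4, 3)
b <- c(4, 3, 2)

# Without sorting, the same set of values gives two different strings
same_unsorted <- toString(a) == toString(b)

# After sorting, the strings and the list entries both agree
same_sorted_str  <- toString(sort(a)) == toString(sort(b))
same_sorted_list <- identical(list(sort(a)), list(sort(b)))
```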
