Lagged lists in data.table R
Problem description
shift in R's data.table is great for time series and time-window work. But columns of lists don't lag the same way that columns of other types do. In the code below, carbLag correctly lags carb, but carbListLag isn't lagging carbList. Instead, shift is operating inside each carbList element and shifting that element's own values within the same row.
library(data.table)
dt <- data.table(mtcars)[,.(gear, carb, cyl)]
### Make col of lists
dt[,carbList:=list(list(unique(carb))), by=.(cyl, gear)]
### Now I want to lag/lead col of lists
dt[,.(carb, carbLag=shift(carb)
, carbList, carbListLag=shift(carbList, type="lead")), by=cyl]
cyl carb carbLag carbList carbListLag
1: 6 4 NA 4 NA
2: 6 4 4 4 NA
3: 6 1 4 1 NA <-- should be 4 here, not NA
4: 6 1 1 1 NA
5: 6 4 1 4 NA
6: 6 4 4 4 NA
7: 6 6 4 6 NA
8: 4 1 NA 1,2 2,NA
9: 4 2 1 1,2 2,NA
10: 4 2 2 1,2 2,NA
11: 4 1 2 1,2 2,NA
12: 4 2 1 1,2 2,NA
13: 4 1 2 1,2 2,NA
14: 4 1 1 1 NA <-- should be (1,2) here, not NA
15: 4 1 1 1,2 2,NA
16: 4 2 1 2 NA
17: 4 2 2 2 NA
18: 4 2 2 1,2 2,NA
19: 8 2 NA 2,4,3 4, 3,NA
20: 8 4 2 2,4,3 4, 3,NA
21: 8 3 4 2,4,3 4, 3,NA
22: 8 3 3 2,4,3 4, 3,NA
23: 8 3 3 2,4,3 4, 3,NA
Any suggestions to lag on lists the same way I lag on other elements?
Solution
This is documented behavior. Here's part of the example at ?shift:
# on lists
ll = list(1:3, letters[4:1], runif(2))
shift(ll, 1, type="lead")
# [[1]]
# [1] 2 3 NA
#
# [[2]]
# [1] "c" "b" "a" NA
#
# [[3]]
# [1] 0.1190792 NA
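The example above shows shift() moving values within each list element. The question asks for something different: whole elements should move between rows. The original answer does not use the following approach, but one option is to shift the row indices instead of the list and then subset the list column with them. This is a minimal sketch that assumes a reasonably recent data.table. It reuses the column names from the question.

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]
# List column: the unique carb values for each (cyl, gear) combination
dt[, carbList := list(list(unique(carb))), by = .(cyl, gear)]

# shift() on an integer index gives c(NA, 1, 2, ..., .N - 1) within each group.
# Subsetting the list column with it moves whole elements down one row.
# The NA index produces a NULL element in the first row of each cyl group.
res <- dt[, .(carb, carbList,
              carbListLag = carbList[shift(seq_len(.N))]),
          by = cyl]
res
```

Because only the index is shifted, the list elements themselves are never touched. Row 3 (cyl 6) gets 4 and row 14 (cyl 4) gets (1, 2), which are the values the question expected.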
To get around this, you can make a unique ID for each value of the list:
dt[, carbList_id := match(carbList, unique(carbList))]
carbList_map = dt[, .(carbList = list(carbList[[1]])), by=carbList_id]
# carbList_id carbList
# 1: 1 4
# 2: 2 1,2
# 3: 3 1
# 4: 4 2,4,3
# 5: 5 2
# 6: 6 4,8
# 7: 7 6
# or stick with long-form:
carbList_map = dt[, .(carb = carbList[[1]]), by=carbList_id]
# carbList_id carb
# 1: 1 4
# 2: 2 1
# 3: 2 2
# 4: 3 1
# 5: 4 2
# 6: 4 4
# 7: 4 3
# 8: 5 2
# 9: 6 4
# 10: 6 8
# 11: 7 6
Then just shift (or do whatever else you need) on the new ID column. When you need the value of carbList again, you'll have to merge with the new table.
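Here is a sketch of the full round trip for the ID approach: build the IDs, lag them, and then join back to the lookup table to recover the lagged lists. The names carbList_id_lag, lag_map and res are hypothetical and are introduced only for this illustration. The sketch uses a data.table join with `on =` in place of merge().

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]
dt[, carbList := list(list(unique(carb))), by = .(cyl, gear)]

# One integer ID per distinct list value, as in the answer
dt[, carbList_id := match(carbList, unique(carbList))]
carbList_map <- dt[, .(carbList = list(carbList[[1]])), by = carbList_id]

# The ID is an ordinary integer column, so shift() lags it row by row
dt[, carbList_id_lag := shift(carbList_id), by = cyl]

# Rename the lookup columns so the join key matches and names don't clash
lag_map <- carbList_map[, .(carbList_id_lag = carbList_id,
                            carbListLag = carbList)]

# x[i] join: one result row per row of dt, in dt's order. Rows whose lagged
# ID is NA (the first row of each cyl group) find no match in lag_map.
res <- lag_map[dt, on = "carbList_id_lag"]
```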
Alternatively, if you don't really need to work with the values and only want to browse them, consider storing them as a string instead, e.g. carbList:=toString(sort(unique(carb))), or build the string with paste0.
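A minimal sketch of the string alternative follows. It uses a hypothetical column name, carbStr, so the result is not confused with the list column. It groups by (cyl, gear) in the same way the question builds carbList. A character column is atomic, so shift() lags it row by row.

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]

# Sorted unique carb values for each (cyl, gear) group, stored as one string
dt[, carbStr := toString(sort(unique(carb))), by = .(cyl, gear)]
# A paste0 variant would be: paste0(sort(unique(carb)), collapse = ",")

# Character columns lag like any other atomic column
dt[, carbStrLag := shift(carbStr), by = cyl]
dt[cyl == 6]
```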
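Side note: sort the values before using toString, paste0 or list, so that the same set of values always produces the same string or list element, whatever order the values appeared in.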
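A small base-R illustration of why sorting matters follows. The vectors are made up for the example. match() compares list elements through their character form, so c(2, 4) and c(4, 2) receive different IDs, and toString() gives them different strings, unless they are sorted first.

```r
# Two groups holding the same values in a different order (made-up data)
a <- list(c(2, 4), c(4, 2))

# Unsorted: treated as two distinct values
ids_unsorted  <- match(a, unique(a))
strs_unsorted <- vapply(a, toString, character(1))

# Sorted first: both become c(2, 4), so they share one ID and one string
s <- lapply(a, sort)
ids_sorted  <- match(s, unique(s))
strs_sorted <- vapply(s, toString, character(1))
```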