Set operations within a list column
Problem description
I am trying to do set operations between the vectors stored in a list column like this.
DT <- data.table(exp = c("exp1", "exp2", "exp2"),
sample = c(1L, 1L, 2L),
listdata = list(c(2L,5L), c(2L,3L,5L,7L), c(1L,2L,6L)))
> DT
    exp sample listdata
1: exp1      1      2,5
2: exp2      1  2,3,5,7
3: exp2      2    1,2,6
While very cumbersome, I can do
DT$inc = list(setdiff(unlist(DT$listdata[2]), unlist(DT$listdata[1])))
and obtain a new list column with the value c(3,7). But if I try to calculate the difference between the current row and the first row using
DT$inc = list(list(setdiff(unlist(DT$listdata, recursive = FALSE), unlist(DT$listdata[1]))))
expecting the new column inc to be
0
c(3,7)
c(1,6)
I get c(3,7,1,6). Apparently unlist flattened the whole list column together. Any idea what's going on?
I am also learning dplyr and data.table. So it would really help if you can provide solutions using one of them.
Recommended answer
[...] I try to calculate the difference between the current row and the first row
Well, you could...
DT[, inc := .(Map(setdiff, listdata, listdata[1L]))]
#     exp sample listdata inc
# 1: exp1      1      2,5
# 2: exp2      1  2,3,5,7 3,7
# 3: exp2      2    1,2,6 1,6
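To see why the attempt in the question collapsed everything into one vector, it may help to compare the two calls on a plain list. This is only a minimal sketch in base R; the names `x`, `flat`, and `per_row` are made up for illustration and do not appear in the question.

```r
# The list column from the question, as a plain list
x <- list(c(2L, 5L), c(2L, 3L, 5L, 7L), c(1L, 2L, 6L))

# unlist() concatenates every element into one vector, so setdiff() sees a
# single set and the row boundaries are lost
flat <- setdiff(unlist(x), x[[1L]])
flat
# [1] 3 7 1 6

# Map() calls setdiff() once per element; the length-one list x[1L] is
# recycled, so each element is compared against the first one
per_row <- Map(setdiff, x, x[1L])
per_row
# [[1]]
# integer(0)
#
# [[2]]
# [1] 3 7
#
# [[3]]
# [1] 1 6
```

The first row comes back as an empty integer vector, which is why its `inc` cell prints as blank in the output above.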
But I think it's far better to just not work with list columns.
Not working with list columns might look like...
DT[, r := .I]
DT2 = DT[,c(.SD[rep(.I, lengths(listdata))], .(v = unlist(listdata))), .SDcols=!"listdata"]
#     exp sample r v
# 1: exp1      1 1 2
# 2: exp1      1 1 5
# 3: exp2      1 2 2
# 4: exp2      1 2 3
# 5: exp2      1 2 5
# 6: exp2      1 2 7
# 7: exp2      2 3 1
# 8: exp2      2 3 2
# 9: exp2      2 3 6
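The same long table can probably be built a little more directly by grouping on the row id. This is a sketch rather than the answer's own method: it assumes every row has a unique `r`, rebuilds `DT` so that the block runs on its own, and uses `long` as an illustrative name.

```r
library(data.table)

DT <- data.table(exp = c("exp1", "exp2", "exp2"),
                 sample = c(1L, 1L, 2L),
                 listdata = list(c(2L, 5L), c(2L, 3L, 5L, 7L), c(1L, 2L, 6L)))
DT[, r := .I]

# One group per original row, so unlist() only flattens that row's vector
long <- DT[, .(v = unlist(listdata)), by = .(exp, sample, r)]
long
```

With either construction, a row whose vector is empty contributes no rows to the long table.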
Then we just work with this data set, and can do
DT2[!DT2[r==1L], on="v"]
#     exp sample r v
# 1: exp2      1 2 3
# 2: exp2      1 2 7
# 3: exp2      2 3 1
# 4: exp2      2 3 6
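If a list column is still wanted at the end, the anti-join result can be collapsed back to one row per original row. A minimal sketch, assuming a long table shaped like `DT2` above (rebuilt here with only the `r` and `v` columns so the block is self-contained); `res` and `back` are illustrative names.

```r
library(data.table)

# Long-format data: r is the original row id, v is one value per line
DT2 <- data.table(r = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
                  v = c(2L, 5L, 2L, 3L, 5L, 7L, 1L, 2L, 6L))

# Anti-join: keep the values that do not occur in the first row
res <- DT2[!DT2[r == 1L], on = "v"]

# Wrap each group's values in list() to get a list column again
back <- res[, .(inc = list(v)), by = r]
back
```

Note that row 1 does not appear in `back` at all, because none of its values survive the anti-join.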