Modify list-column by reference in nested data.table
Question
When a nested data.table has a list column of data.tables, it is easy to apply a function over that column. Example:
library(data.table)
dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]
We can use:
dt[ ,list(length = nrow(dt.mtcars[[1]])), by = gear]
gear length
1: 4 12
2: 3 15
3: 5 5
or
dt[, list( length = lapply(dt.mtcars, nrow)), by = gear]
gear length
1: 4 12
2: 3 15
3: 5 5
I would like to do the same thing, but apply a modification by reference, using the := operator, to each data.table in the column.
Example:
modify_by_ref <- function(d) {
  d[, max_hp := max(hp)]
}
dt[, modify_by_ref(dt.mtcars[[1]]), by = gear]
This returns an error:
Error in `[.data.table`(d, , `:=`(max_hp, max(hp))) :
.SD is locked. Using := in .SD's j is reserved for possible future use; a tortuously flexible way to modify by group. Use := in j directly to modify by group by reference.
Following the tip in the error message does not work for me; it seems to target a different case, but maybe I am missing something. Is there a recommended way, or a flexible workaround, to modify list columns by reference?
Recommended answer
This can be done in the following two steps, or in a single step.
The given table is:
dt<- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]
Step 1 - Add a list column holding each group's hp vector to every row of dt:
dt[, hp_vector := .(list(dt.mtcars[[1]][, hp])), by = list(gear)]
Step 2 - Now compute the maximum of hp from that vector:
dt[, max_hp := max(hp_vector[[1]]), by = list(gear)]
The given table is:
dt<- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]
Single step - this is simply the combination of the two steps above:
dt[, max_hp := .(list(max(dt.mtcars[[1]][, hp])[[1]])), by = list(gear)]
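Because the single-step call wraps the result in list(), max_hp ends up as a list column there. If a plain numeric column is wanted instead, a minimal variant (my own sketch, not part of the original answer) is to drop the wrapper and let := assign one value per gear:

```r
library(data.table)

dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

# Variant (not from the original answer): no list() wrapper, so max_hp
# is stored as an ordinary numeric column with one value per gear.
dt[, max_hp := max(dt.mtcars[[1]]$hp), by = gear]

# Check the column type.
is.numeric(dt$max_hp)
```

This leaves the nested tables untouched and only adds a column to the outer table, just like the two-step version.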
To populate values inside the nested tables by reference, the answer linked below describes an approach; it works, but a warning message has to be ignored. I would be glad if anyone could point out how to fix the warning, or whether there is any pitfall. For more detail, see:
https://stackoverflow.com/questions/48306010/how-can-i-do-fast-advance-data-manipulation-in-nested-data-table-data-table-wi/48412406#48412406
Taking inspiration from that answer, here is how to do it for the given data set.
Let's first clean everything:
rm(list = ls())
Let's redefine the given table in a different way:
dt<- data.table(mtcars)[, list(dt.mtcars = list(data.table(.SD))), by = list(gear)]
Note that I have defined the table slightly differently: in the definition above, .SD is wrapped in data.table() in addition to list().
Next, populate the max by reference within each nested table:
dt[, dt.mtcars := .(list(dt.mtcars[[1]][, max_hp := max(hp)])), by = list(gear)]
And, as one would hope, we can then perform further manipulation within the nested tables:
dt[, dt.mtcars := .(list(dt.mtcars[[1]][, weighted_hp_carb := max_hp*carb])), by = list(gear)]
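If the warning mentioned earlier is a concern, one possible alternative (my own sketch, not the method from the linked answer) is to give up strict by-reference semantics for the nested tables: copy each one, apply := to the copy, and write the list column back. The helper name add_max_hp is hypothetical.

```r
library(data.table)

dt <- data.table(mtcars)[, list(dt.mtcars = list(data.table(.SD))), by = gear]

# Hypothetical helper (not from the original answer): copy() gives := a
# freshly allocated table to modify, and the trailing [] returns it visibly.
add_max_hp <- function(d) copy(d)[, max_hp := max(hp)][]

# Replace the whole list column with the modified copies.
dt[, dt.mtcars := lapply(dt.mtcars, add_max_hp)]

# Column names of the first nested table, now including max_hp.
names(dt$dt.mtcars[[1]])
```

The trade-off is an extra copy of every nested table, which may matter for large data; only the outer list column is updated by reference.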