Modify list-column by reference in nested data.table

Problem description

When using a list column of data.tables in a nested data.table, it is easy to apply a function over the column. For example:

library(data.table)
dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

We can use:

dt[, list(length = nrow(dt.mtcars[[1]])), by = gear]

   gear length
1:    4     12
2:    3     15
3:    5      5

dt[, list(length = lapply(dt.mtcars, nrow)), by = gear]

   gear length
1:    4     12
2:    3     15
3:    5      5
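
Both calls above give one value per gear. As a small sketch (not part of the original question), the same summary can also be computed without `by`, by applying a function across the whole list column; `vapply()` with `integer(1)` guarantees a plain integer column rather than a list column:

```r
library(data.table)

# Nested table: one row per gear, each holding that gear's rows of mtcars
dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

# Apply nrow() to every nested table at once; vapply() forces an integer result
res <- dt[, list(gear, length = vapply(dt.mtcars, nrow, integer(1)))]
print(res)
```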

I would like to do the same thing, but apply a modification by reference with the := operator to each data.table in the column.

Example:

modify_by_ref <- function(d) {
  # Intended to add a max_hp column to d by reference
  d[, max_hp := max(hp)]
}

dt[, modify_by_ref(dt.mtcars[[1]]), by = gear]

This returns the error:

 Error in `[.data.table`(d, , `:=`(max_hp, max(hp))) : 
  .SD is locked. Using := in .SD's j is reserved for possible future use; a tortuously flexible way to modify by group. Use := in j directly to modify by group by reference. 

Following the tip in the error message does not work for me in any way; it seems to target a different case, but maybe I am missing something. Is there a recommended way, or a flexible workaround, to modify list columns by reference?

Recommended answer

This can be done either in the following two steps or in a single step:

The given table is:

dt<- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

Step 1: add a list column holding the hp vector of each nested table, one per row of dt:

dt[, hp_vector := .(list(dt.mtcars[[1]][, hp])), by = list(gear)]

Step 2: now compute the maximum of hp:

dt[, max_hp := max(hp_vector[[1]]), by = list(gear)]
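
To check the two steps, a minimal sketch can compare max_hp with a grouped `max()` computed directly on the flat mtcars table. The cross-check is an addition to the answer, and `list(list(...))` is written in place of the `.(list(...))` alias:

```r
library(data.table)

dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

# Step 1: store each group's hp vector as a list-column element
dt[, hp_vector := list(list(dt.mtcars[[1]][, hp])), by = list(gear)]
# Step 2: reduce each stored vector to its maximum
dt[, max_hp := max(hp_vector[[1]]), by = list(gear)]

# Reference result computed directly on the un-nested data
flat <- data.table(mtcars)[, list(max_hp = max(hp)), by = gear]
stopifnot(identical(dt$gear, flat$gear), identical(dt$max_hp, flat$max_hp))
print(dt[, list(gear, max_hp)])
```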

The given table is:

dt<- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

Single step: this simply combines the two steps above into one call:

dt[, max_hp := max(dt.mtcars[[1]][, hp]), by = list(gear)]
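
Since each group sees exactly one nested table, a single grouped := can also fill several summary columns at once. This sketch extends the single step with a second column, n_cars, which is a hypothetical name used only for illustration:

```r
library(data.table)

dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

# One grouped call, two summaries per nested table (n_cars is illustrative)
dt[, c("max_hp", "n_cars") := list(max(dt.mtcars[[1]]$hp), nrow(dt.mtcars[[1]])),
   by = list(gear)]
print(dt[, list(gear, max_hp, n_cars)])
```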

If we want to populate values within the nested table by reference, the following link explains how to do it; the only catch is that a warning message has to be ignored. I would be glad if anyone could point out how to fix the warning, or whether there is any pitfall. For more detail, please refer to the link:

https://stackoverflow.com/questions/48306010/how-can-i-do-fast-advance-data-manipulation-in-nested-data-table-data-table-wi/48412406#48412406
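
If the warning is a concern, one possible alternative (a different technique from the linked answer, and not strictly in-place) is to modify a copy of each nested table and write the whole list column back. The helper add_max_hp is hypothetical, and the sketch assumes that data.table's `copy()` returns an unlocked, over-allocated table, so that := is allowed on it:

```r
library(data.table)

dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

# Hypothetical helper: modify a private copy of one nested table
add_max_hp <- function(d) {
  d <- copy(d)            # deep copy (assumed to come back unlocked)
  d[, max_hp := max(hp)]  # := on the copy, not on the stored table
  d                       # := returns invisibly, so return d explicitly
}

# Replace the whole list column; list() wraps the list of tables as one column
dt[, dt.mtcars := list(lapply(dt.mtcars, add_max_hp))]
print(dt$dt.mtcars[[3]][, list(hp, max_hp)])
```

The trade-off is an extra copy of every nested table in exchange for not relying on in-place updates of objects stored inside a list column.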

Taking inspiration from that answer, I will show how to do it here for the given data set.

First, let's clear the workspace:

rm(list = ls())

Let's redefine the given table in a different way:

dt<- data.table(mtcars)[, list(dt.mtcars = list(data.table(.SD))), by = list(gear)]

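Note that I have defined the table slightly differently: in addition to list(), I wrapped .SD in data.table(). Presumably this is what matters, since each nested table is then a freshly built data.table rather than a copy of the locked .SD.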

Next, populate the maximum hp by reference within each nested table:

dt[, dt.mtcars := .(list(dt.mtcars[[1]][, max_hp := max(hp)])), by = list(gear)]
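
To confirm that the update reached every nested table, the list column can be inspected afterwards. This sketch re-runs the two definitions above (with `list(list(...))` in place of `.(list(...))`) and checks the new column; depending on the data.table version, it may emit the warning mentioned earlier:

```r
library(data.table)

dt <- data.table(mtcars)[, list(dt.mtcars = list(data.table(.SD))), by = list(gear)]
dt[, dt.mtcars := list(list(dt.mtcars[[1]][, max_hp := max(hp)])), by = list(gear)]

# Every nested table should now carry max_hp, constant within its gear group
has_col <- vapply(dt$dt.mtcars, function(d) "max_hp" %in% names(d), logical(1))
group_max <- vapply(dt$dt.mtcars, function(d) d$max_hp[1], numeric(1))
print(data.table(gear = dt$gear, has_col, group_max))
```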

As one would hope, further manipulation within the nested tables works the same way:

dt[, dt.mtcars := .(list(dt.mtcars[[1]][, weighted_hp_carb := max_hp*carb])), by = list(gear)]
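
To look at the result as one flat table again, `rbindlist()` with its `idcol` argument can stack the nested tables; naming the list by gear (an illustrative choice) carries the group key into the id column, which comes back as character. The sketch repeats the preceding steps so it runs on its own:

```r
library(data.table)

dt <- data.table(mtcars)[, list(dt.mtcars = list(data.table(.SD))), by = list(gear)]
dt[, dt.mtcars := list(list(dt.mtcars[[1]][, max_hp := max(hp)])), by = list(gear)]
dt[, dt.mtcars := list(list(dt.mtcars[[1]][, weighted_hp_carb := max_hp * carb])),
   by = list(gear)]

# Stack the nested tables back into one; list names become the "gear" id column
flat <- rbindlist(setNames(dt$dt.mtcars, dt$gear), idcol = "gear")
print(flat[, list(gear, hp, carb, max_hp, weighted_hp_carb)][1:5])
```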
