Assign value to specific data.table columns and rows
Question
I'm still getting to know this great package... Could anyone please explain the reason for this error? Thanks!
library(data.table)
DT <- data.table(id = LETTERS,
var1 = rnorm(26),
var2 = rnorm(26))
> DT[2, list(var1, var2)]
var1 var2
1: -0.8628479332 -0.2367492928
> DT[2, c(var1, var2)]
[1] -0.8628479332 -0.2367492928
>
> DT[2, list(var1, var2)] <- DT[8, list(var1, var2)]
Error in `[<-.data.table`(`*tmp*`, 2, list(var1, var2), value = list(var1 = -0.394006912428776, :
object 'var1' not found
> DT[2, c(var1, var2)] <- DT[8, c(var1, var2)]
Error in `[<-.data.table`(`*tmp*`, 2, c(var1, var2), value = c(-0.394006912428776, :
object 'var1' not found
Recommended answer
First, it is recommended to use := instead of [<- for efficiency. [<- is mostly provided for backward compatibility, so I'll first illustrate how to use := efficiently to get what you're after. := assigns by reference: it updates a data.table without copying the data, and is therefore extremely fast.
require(data.table)
DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
Suppose you want to set the 2nd row of "y" to the value in the 5th row of "y":
DT[2, y := DT[5, y]]
or equivalently:
DT[2, `:=`(y = DT[5, y])]
Suppose you want to set the 2nd row of both "y" and "z" to the corresponding entries in row 5:
DT[2, c("y", "z") := as.list(DT[5, c(y, z)])]
or equivalently:
DT[2, `:=`(y = DT[5, y], z = DT[5, z])]
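Applied to the data from the question, the same pattern might look like the sketch below. The seed is an arbitrary choice made only so the example is reproducible; it is not part of the original question.

```r
library(data.table)

set.seed(1)  # arbitrary seed, assumed here only for reproducibility
DT <- data.table(id = LETTERS,
                 var1 = rnorm(26),
                 var2 = rnorm(26))

# Copy var1 and var2 from row 8 into row 2, by reference.
# The target columns are given as a quoted character vector on the left;
# the right-hand side is a list with one element per target column.
DT[2, c("var1", "var2") := as.list(DT[8, list(var1, var2)])]

# Rows 2 and 8 now hold the same var1 and var2 values
DT[c(2, 8)]
```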
Now, just to show how to assign using [<- (although it is clearly not recommended), it can be done as follows:
DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
DT[1, c("y", "z")] <- as.list(DT[5, c(y, z)])
or equivalently, you can also pass the column numbers:
DT[1, 2:3] <- as.list(DT[5, c(y, z)])
As for why your original code fails: first, the RHS has to be a list for [<-.data.table if more than one column is being assigned.
Second, the j argument on the left of <- is not evaluated within the environment of your data.table, so it needs to know what the values for j are. Since you provide var1 and var2 unquoted (rather than as a character vector), they are treated as variables. It doesn't "see" the columns of your data.table as variables (as it normally does for expressions on the RHS of <-), so it looks for var1 and var2 in the calling environment, here the global environment, doesn't find them there, and you get the error. For example, do this:
y <- "y"
z <- "z"
# And now try your second case:
DT[2, c(y, z)] <- as.list(DT[5, c(y, z)])
# the left side takes values from the assignments you made above
# the right side y and z are evaluated within the environment of your data.table
# and so it sees the columns y and z as variables and their values are picked accordingly
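When the column names live in a variable like this, the same update can also be written with :=. The following is a minimal sketch, assuming a hypothetical variable cols that holds the target names; wrapping it in parentheses on the left makes := use its value rather than create a column called "cols".

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
cols <- c("y", "z")  # hypothetical variable holding the target column names

# (cols) on the left: use the names stored in cols.
# with = FALSE on the right: treat cols as a vector of column names to select.
DT[2, (cols) := as.list(DT[5, cols, with = FALSE])]

# Row 2 now carries the y and z values of row 5
DT[c(2, 5)]
```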
Third, the [<-.data.table function accepts only atomic (vector) types for the j argument. So your first assignment, DT[2, list(var1, var2)] <- DT[8, list(var1, var2)], will still give an error even if you supply the variables the right way, that is:
y <- "y"
z <- "z"
DT[2, list(y, z)] <- as.list(DT[5, c(y, z)])
# Error in `[<-.data.table`(`*tmp*`, 2, list(y, z), value = list(10L, 15L)) :
# j must be atomic vector, see ?is.atomic
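To see the contrast side by side, the sketch below catches the error from a list-valued j and then repeats the assignment with an atomic character vector. The tryCatch wrapper and the name res are illustrative additions, and the exact wording of the error message may differ between data.table versions.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10, z = 11:15)

# A list in j is rejected by [<-.data.table; capture the message
# instead of stopping the script.
res <- tryCatch({
  DT[2, list("y", "z")] <- list(10L, 15L)
  "no error"
}, error = function(e) conditionMessage(e))
res

# An atomic character vector in j is accepted.
DT[2, c("y", "z")] <- list(10L, 15L)
DT[2]
```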
Hope this helps.
# Illustration with tracemem(): := modifies DT in place, whereas [<- triggers a copy
DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
tracemem(DT)
# [1] "<0x7fbefb89b580>"
DT[1, c("y", "z") := list(100L, 110L)]
tracemem(DT)
# [1] "<0x7fbefb89b580>"
DT[2, c("y", "z")] <- list(200L, 201L)
# tracemem[0x7fbefacc4fa0 -> 0x7fbefd297838]: # copied, inefficient