Assigning a subset of data.table rows and columns by join
I'm trying to do something similar to, but different enough from, what's described here: Update subset of data.table based on join
Specifically, for matching key values (person_id is a key in both tables), I'd like to assign column values from the table control. ci is the column index. The statement below produces a message saying 'with=F' was not used. When I delete those parts, it also doesn't work as expected. Any suggestions?
To rephrase: I'd like to overwrite the subset of flatData that matches control with the corresponding values from control.
flatData[J(eval(control$person_id)), ci, with=F] = control[, ci, with=F]
To give a reproducible example using classic R:
x = data.frame(a = 1:3, b = 1:3, key = c('a', 'b', 'c'))
y = data.frame(a = c(2, 5), b = c(11, 2), key = c('a', 'b'))
colidx = match(c('a', 'b'), colnames(y))
x[x$key %in% y$key, colidx] = y[, colidx]
As an aside, someone please explain how to easily assign SETS of columns without using indices! Indices and data.table are a marriage made in hell.
You can use the := operator together with the join, as follows:
First prepare data:
require(data.table) ## >= 1.9.0
setDT(x) ## converts DF to DT by reference
setDT(y)
setkey(x, key) ## set key column
setkey(y, key)
Now the one-liner:
x[y, c("a", "b") := list(i.a, i.b)]
:= modifies by reference (in-place). The rows to modify are provided by the indices computed from the join in i. i.a and i.b are the column names data.table internally generates for easy access to i's columns when both x and i have identical column names, when performing a join of the form x[i].
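On the aside about assigning sets of columns without indices: the same join can take the column names from a character vector. This is a minimal sketch rather than part of the original answer. It assumes a data.table version that supports the on= argument (1.9.6 or later), and the cols variable is introduced here purely for illustration. x is built with numeric columns so that no type coercion is involved.

```r
library(data.table)

# Same shape as the example data, but with numeric columns in x
x <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3),
                key = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(a = c(2, 5), b = c(11, 2),
                key = c("a", "b"), stringsAsFactors = FALSE)
setDT(x)
setDT(y)

cols <- c("a", "b")  # illustrative: names of the columns to copy from y into x

# Join on the "key" column; for matching rows, overwrite x's columns
# with the i.-prefixed columns of y, looked up by name via mget()
x[y, (cols) := mget(paste0("i.", cols)), on = "key"]

# x now has a = 2, 5, 3 and b = 11, 2, 3; the row with key "c" is untouched
print(x)
```

Wrapping cols in parentheses on the left of := tells data.table to use the values in the vector as column names rather than creating a column literally named cols.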
HTH
PS: In your example, y's columns a and b are of type numeric and x's are of type integer, so when you run this on your data you'll get a warning that the types don't match and that a coercion had to take place.
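One way to sidestep the coercion is to align the column types before the join. The sketch below is an addition, not part of the original answer, and assumes it is acceptable to store x's a and b as numeric. The cols variable is illustrative.

```r
library(data.table)

x <- data.frame(a = 1:3, b = 1:3,
                key = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(a = c(2, 5), b = c(11, 2),
                key = c("a", "b"), stringsAsFactors = FALSE)
setDT(x)
setDT(y)

cols <- c("a", "b")  # illustrative: columns whose types should match y's

# Replace x's integer columns with numeric versions, by reference
x[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

setkey(x, key)
setkey(y, key)

# Both sides are now numeric, so the assignment needs no coercion
x[y, c("a", "b") := list(i.a, i.b)]
print(x)
```

Alternatively, if the values really are whole numbers, y could be built with integer literals such as 2L and 5L so that x keeps its integer columns.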