Assigning a subset of data.table rows and columns by join


Problem description

I'm trying to do something similar to, but different enough from, what's described here: Update subset of data.table based on join

Specifically, I'd like to assign column values from table control to the rows with matching key values (person_id is a key in both tables). ci is the column index. The statement below complains that 'with=F' was not used. When I delete those parts, it still doesn't work as expected. Any suggestions?

To rephrase: I'd like to set the subset of flatData that corresponds to control FROM control.

flatData[J(eval(control$person_id)), ci, with=F] = control[, ci, with=F]

To give a reproducible example using classic R:

x = data.frame(a = 1:3, b = 1:3, key = c('a', 'b', 'c'))
y = data.frame(a = c(2, 5), b = c(11, 2), key = c('a', 'b'))

colidx = match(c('a', 'b'), colnames(y))

x[x$key %in% y$key, colidx] = y[, colidx]

As an aside, someone please explain how to easily assign SETS of columns without using indices! Indices and data.table are a marriage made in hell.

Solution

You can combine the := operator with a join in a single call, as follows:

First prepare data:

require(data.table) ## >= 1.9.0
setDT(x)            ## converts DF to DT by reference
setDT(y)
setkey(x, key)      ## set key column
setkey(y, key)

Now the one-liner:

x[y, c("a", "b") := list(i.a, i.b)]

:= modifies by reference (in-place). The rows to modify are provided by the indices computed from the join in i.
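For reference, here is a minimal end-to-end sketch of the steps above on the question's toy data. Two things in it are my own assumptions rather than part of the original example: y's columns are created as integers so that they match x's column types, and address() (a data.table helper) is used only to illustrate that x is not copied.

```r
library(data.table)

# Toy data from the question; y's columns are integers here (an assumption,
# made so that the column types of x and y match).
x <- data.frame(a = 1:3, b = 1:3, key = c("a", "b", "c"))
y <- data.frame(a = c(2L, 5L), b = c(11L, 2L), key = c("a", "b"))
setDT(x)
setDT(y)
setkey(x, key)
setkey(y, key)

addr_before <- address(x)   # memory address of x before the update

# Update join: rows of x whose key matches a row of y take y's a and b.
x[y, c("a", "b") := list(i.a, i.b)]

print(x)                    # row "c" has no match in y and keeps a = 3, b = 3

# x was updated in place, so its address should be unchanged.
print(identical(addr_before, address(x)))
```

Because := works by reference, there is no need to write x <- x[...]; the table named x is already updated after the call.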

When performing a join of the form x[i], data.table makes i's columns available under the prefix i., so i.a and i.b refer to columns a and b of i (here, y). The prefix is what lets you tell the two tables' columns apart when x and i have identical column names.
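On the aside about assigning sets of columns without indices: one possible approach is to keep the column names in a character vector and build the i.-prefixed names from it. The sketch below assumes the same toy data (with integer columns in y), and cols is a hypothetical helper variable, not something from the original post.

```r
library(data.table)

x <- data.frame(a = 1:3, b = 1:3, key = c("a", "b", "c"))
y <- data.frame(a = c(2L, 5L), b = c(11L, 2L), key = c("a", "b"))
setDT(x); setDT(y)
setkey(x, key); setkey(y, key)

cols <- c("a", "b")   # hypothetical: names of the columns to copy from y

# (cols) on the left-hand side means "the columns named in cols".
# mget() looks up i.a and i.b, i.e. the matching columns of y.
x[y, (cols) := mget(paste0("i.", cols))]

print(x)
```

Adding or removing a column then only requires changing cols, with no positional indices involved.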

HTH

PS: In your example, y's columns a and b are of type numeric while x's are of type integer, so when you run this on your data you'll get a warning that the types don't match and that a coercion had to take place.
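One way to sidestep the coercion is to align the column types before the join. This is only a sketch under the assumption that converting y's values to integer is acceptable for your data (as.integer() drops any fractional part); cols is again a hypothetical helper variable.

```r
library(data.table)

x <- data.frame(a = 1:3, b = 1:3, key = c("a", "b", "c"))
y <- data.frame(a = c(2, 5), b = c(11, 2), key = c("a", "b"))  # numeric, as in the question
setDT(x); setDT(y)

cols <- c("a", "b")   # hypothetical: columns whose types should match

# Convert y's numeric columns to integer so they match x's column types.
y[, (cols) := lapply(.SD, as.integer), .SDcols = cols]

setkey(x, key); setkey(y, key)
x[y, (cols) := list(i.a, i.b)]

print(sapply(x, class))
```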
