Join of two data.tables fails


Problem description

I am trying to use a data table as a lookup table:

> (dt <- data.table(myid=rep(11:12,3),zz=1:6,key=c("myid","zz")))
   myid zz
1:   11  1
2:   11  3
3:   11  5
4:   12  2
5:   12  4
6:   12  6
> (id2name <- data.table(id=11:14,name=letters[1:4],key="id"))
   id name
1: 11    a
2: 12    b
3: 13    c
4: 14    d

What I want is:

> (res <- data.table(myid=rep(11:12,3),zz=1:6,name=rep(letters[1:2],3),key=c("myid","zz")))
   myid zz name
1:   11  1    a
2:   11  3    a
3:   11  5    a
4:   12  2    b
5:   12  4    b
6:   12  6    b

The join fails:

> dt[id2name]
Starting binary search ...done in 0 secs
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x),  : 
  Join results in 8 rows; more than 6 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
Calls: [ -> [.data.table -> vecseq

What am I doing wrong?

PS. I am amenable to any alternative way to get the results; what is the most idiomatic way to do what I want (dt must still be a data.table, but id2name can be anything mapping int to something else - as long as the int is not assumed to be a vector index).

Recommended answer

> dt[id2name, allow.cartesian=TRUE, nomatch=0]
   myid zz name
1:   11  1    a
2:   11  3    a
3:   11  5    a
4:   12  2    b
5:   12  4    b
6:   12  6    b


data.table is trying to save you from yourself in case you had an unintentional join on keys with duplicate values. Note that the error message (eventually) tells you what to do if you're sure you know what you're doing.
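To see where the 8 rows in the error message come from, here is a minimal sketch built on the tables from the question. The names `full` and `inner` are mine, chosen only for illustration. Without `nomatch=0` every row of `id2name` is kept, so ids 13 and 14 each contribute one row filled with NA on top of the six matches. Note that the cartesian check may be less strict in newer data.table releases, so plain `dt[id2name]` might not raise the error there at all.

```r
library(data.table)

dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4], key = "id")

# Keep every row of id2name: ids 11 and 12 match 3 rows of dt each,
# ids 13 and 14 match nothing and each add one row with zz = NA.
full <- dt[id2name, allow.cartesian = TRUE]
nrow(full)        # 3 + 3 + 1 + 1 = 8, more than max(nrow(dt), nrow(id2name)) = 6
full[is.na(zz)]   # the two unmatched ids

# nomatch = 0 drops the unmatched ids, leaving only the six wanted rows.
inner <- dt[id2name, allow.cartesian = TRUE, nomatch = 0]
nrow(inner)       # 6
```

So the error here is triggered by the unmatched lookup entries rather than by duplicate keys in `id2name`.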

Alternatively:

> id2name[dt]
   id name zz
1: 11    a  1
2: 11    a  3
3: 11    a  5
4: 12    b  2
5: 12    b  4
6: 12    b  6
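Since the question also asks for the most idiomatic approach, here is a hedged sketch of two variants that use the `on=` argument instead of relying on keys. This assumes a data.table release recent enough to support `on=`. The name `res` and the unkeyed `id2name` are my choices for illustration, not part of the original answer.

```r
library(data.table)

dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4])  # no key needed with on=

# Variant 1: look up each row of dt in id2name.
# on = c(id = "myid") pairs id2name's column `id` with dt's column `myid`.
# The result has exactly nrow(dt) rows, so no cartesian check is triggered.
res <- id2name[dt, on = c(id = "myid")]

# Variant 2: add the name column to dt by reference.
# Inside j, i.name refers to the `name` column of id2name (the i table).
dt[id2name, on = c(myid = "id"), name := i.name]
```

The second variant keeps `dt` as a data.table with its original key and column names, which is closest to the `res` table shown in the question.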
