Self Joining in R

Problem description

Here is a sample tibble:

library(tibble)
test <- tibble(a = c("dd1","dd2","dd3","dd4","dd5"),
               name = c("a", "b", "c", "d", "e"),
               b = c("dd3","dd4","dd1","dd5","dd2"))

I want to add a new column, b_name, by self-joining test, using:

dplyr::inner_join(test, test, by = c("a" = "b"))

My table is way too large (2.7M rows with 4 columns), and I get the following error:

Error: std::bad_alloc

Please advise on the right way to do this, or on best practice.

My final goal is to get the following structure:

   a     name  b     b_name
   dd1   a     dd3   c
   dd2   b     dd4   d
   dd3   c     dd1   a
   dd4   d     dd5   e
   dd5   e     dd2   b 

Recommended answer

Another option is fastmatch:

library(fastmatch)
test$b_name <- with(test, name[fmatch(b, a)])
test$b_name
#[1] "c" "d" "a" "e" "b"


According to ?fmatch:

fmatch is a faster version of the built-in match() function.
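
Since fmatch is described as a faster version of match(), the same lookup can be sketched with base R alone. This is a minimal illustration, not part of the original answer; it assumes the toy data from the question, rebuilt as a plain data.frame so that no packages are needed.

```r
# Toy data from the question, as a base data.frame (an assumption made
# here so the example runs without tibble or fastmatch installed).
test <- data.frame(
  a    = c("dd1", "dd2", "dd3", "dd4", "dd5"),
  name = c("a", "b", "c", "d", "e"),
  b    = c("dd3", "dd4", "dd1", "dd5", "dd2"),
  stringsAsFactors = FALSE
)

# match(b, a) returns, for each value of b, the position of its first
# match in a; indexing name by those positions looks up the matching name.
test$b_name <- test$name[match(test$b, test$a)]

print(test)
```

Because match() returns only the first matching position, the result always has exactly as many rows as the input, and any value of b with no counterpart in a gets NA in b_name. A join, by contrast, multiplies rows when the key contains duplicates.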
