R dplyr 按范围或虚拟列连接 [英] R dplyr join by range or virtual column

查看:10
本文介绍了R dplyr 按范围或虚拟列连接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想通过一个范围或一个虚拟列连接两个小标题.但似乎 by - 参数只允许处理现有列名的 chrvector(chr).

I want to join two tibbles by a range or a virtual column. but it seems the by - parameter just allow to handle chr oder vector(chr) of existing column names.

在我的示例中,我有一个带有 value 列的 tibble d 和带有 from 的 tibble r> 和 to 列.

In my example I have a tibble d with a column value, and a tibble r with a from and a to column.

d <- tibble(value = seq(1,6, by = 0.2))
r <- tibble(from = seq(1,6), to = c(seq(2,6),Inf), class = LETTERS[seq(1,6)])

> d
# A tibble: 26 x 1
   value
   <dbl>
 1   1.0
 2   1.2
 3   1.4
 4   1.6
 5   1.8
 6   2.0
 7   2.2
 8   2.4
 9   2.6
10   2.8
# ... with 16 more rows

> r
# A tibble: 6 x 3
   from    to class
  <int> <dbl> <chr>
1     1     2     A
2     2     3     B
3     3     4     C
4     4     5     D
5     5     6     E
6     6   Inf     F

现在我想在fromto的范围内加入d中的value列代码>r:

now I want to join the value column in d within the range of from and to in r:

d %>% inner_join(r, by = "value between from and to")     # >= and <

我找不到办法做到这一点,所以决定将 dvaluefloor fromr

I can't find a way to do this so decided to join the floor of value in d with the from column in r

d %>% inner_join(r, by = c("floor(value)" = "from"))

当然我可以创建第二列来解决这个问题:

of course i can create a second column to solve that:

d %>% 
  mutate(join_value = floor(value)) %>% 
  inner_join(r, by = c("join_value" = "from")) %>% 
  select(value, class)

# A tibble: 26 x 2
   value class
   <dbl> <chr>
 1   1.0     A
 2   1.2     A
 3   1.4     A
 4   1.6     A
 5   1.8     A
 6   2.0     B
 7   2.2     B
 8   2.4     B
 9   2.6     B
10   2.8     B
# ... with 16 more rows

但是没有更舒服的方法吗?

but isn't there a more comfortable way?

谢谢

推荐答案

我认为 dplyr 中还没有实现不等式连接,或者永远不会(请参阅关于 加入不等式约束),但这是使用 SQL 连接的好情况:

I don't think inequality joins is implemented in dplyr yet, or it ever will (see this discussion on Join on inequality constraints), but this is a good situation to use an SQL join:

library(tibble)
library(sqldf)

as.tibble(sqldf("select d.value, r.class from d
                join r on d.value >= r.'from' and 
                          d.value < r.'to'"))

或者,如果您想将连接集成到您的 dplyr 链中,您可以使用 fuzzyjoin::fuzzy_join:

Alternatively, if you want to integrate the join into your dplyr chain, you can use fuzzyjoin::fuzzy_join:

library(dplyr)
library(fuzzyjoin)

d %>%
  fuzzy_join(r, by = c("value" = "from", "value" = "to"), 
             match_fun = list(`>=`, `<`)) %>%
  select(value, class)

结果:

# A tibble: 31 x 2
   value class
   <dbl> <chr>
 1   1.0     A
 2   1.2     A
 3   1.4     A
 4   1.6     A
 5   1.8     A
 6   2.0     A
 7   2.0     B
 8   2.2     B
 9   2.4     B
10   2.6     B
# ... with 21 more rows

请注意,我在 fromto 周围添加了单引号,因为它们是 SQL 语言的保留字.

Notice I added single quotes around from and to since those are reserved words for the SQL language.

这篇关于R dplyr 按范围或虚拟列连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆