R dplyr 按范围或虚拟列连接 [英] R dplyr join by range or virtual column
问题描述
我想通过一个范围或一个虚拟列连接两个小标题.但似乎 by
- 参数只允许处理现有列名的 chr
或 vector(chr)
.
I want to join two tibbles by a range or a virtual column. but it seems the by
- parameter just allow to handle chr
oder vector(chr)
of existing column names.
在我的示例中,我有一个带有 value
列的 tibble d
和带有 from
的 tibble r
> 和 to
列.
In my example I have a tibble d
with a column value
, and a tibble r
with a from
and a to
column.
d <- tibble(value = seq(1,6, by = 0.2))
r <- tibble(from = seq(1,6), to = c(seq(2,6),Inf), class = LETTERS[seq(1,6)])
> d
# A tibble: 26 x 1
value
<dbl>
1 1.0
2 1.2
3 1.4
4 1.6
5 1.8
6 2.0
7 2.2
8 2.4
9 2.6
10 2.8
# ... with 16 more rows
> r
# A tibble: 6 x 3
from to class
<int> <dbl> <chr>
1 1 2 A
2 2 3 B
3 3 4 C
4 4 5 D
5 5 6 E
6 6 Inf F
现在我想在from
和to
的范围内加入d
中的value
列代码>r:
now I want to join the value
column in d
within the range of from
and to
in r
:
d %>% inner_join(r, by = "value between from and to") # >= and <
我找不到办法做到这一点,所以决定将 d
中 value
的 floor
与 from
r
I can't find a way to do this so decided to join the floor
of value
in d
with the from
column in r
d %>% inner_join(r, by = c("floor(value)" = "from"))
当然我可以创建第二列来解决这个问题:
of course i can create a second column to solve that:
d %>%
mutate(join_value = floor(value)) %>%
inner_join(r, by = c("join_value" = "from")) %>%
select(value, class)
# A tibble: 26 x 2
value class
<dbl> <chr>
1 1.0 A
2 1.2 A
3 1.4 A
4 1.6 A
5 1.8 A
6 2.0 B
7 2.2 B
8 2.4 B
9 2.6 B
10 2.8 B
# ... with 16 more rows
但是没有更舒服的方法吗?
but isn't there a more comfortable way?
谢谢
推荐答案
我认为 dplyr
中还没有实现不等式连接,或者永远不会(请参阅关于 加入不等式约束),但这是使用 SQL 连接的好情况:
I don't think inequality joins is implemented in dplyr
yet, or it ever will (see this discussion on Join on inequality constraints), but this is a good situation to use an SQL join:
library(tibble)
library(sqldf)
as.tibble(sqldf("select d.value, r.class from d
join r on d.value >= r.'from' and
d.value < r.'to'"))
或者,如果您想将连接集成到您的 dplyr
链中,您可以使用 fuzzyjoin::fuzzy_join
:
Alternatively, if you want to integrate the join into your dplyr
chain, you can use fuzzyjoin::fuzzy_join
:
library(dplyr)
library(fuzzyjoin)
d %>%
fuzzy_join(r, by = c("value" = "from", "value" = "to"),
match_fun = list(`>=`, `<`)) %>%
select(value, class)
结果:
# A tibble: 31 x 2
value class
<dbl> <chr>
1 1.0 A
2 1.2 A
3 1.4 A
4 1.6 A
5 1.8 A
6 2.0 A
7 2.0 B
8 2.2 B
9 2.4 B
10 2.6 B
# ... with 21 more rows
请注意,我在 from
和 to
周围添加了单引号,因为它们是 SQL 语言的保留字.
Notice I added single quotes around from
and to
since those are reserved words for the SQL language.
这篇关于R dplyr 按范围或虚拟列连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!