R dplyr按范围或虚拟列联接 [英] R dplyr join by range or virtual column
问题描述
我想按范围或虚拟列将两个小标题合并。但似乎 by
-参数只允许处理 chr
oder vector(chr)
现有的列名。
I want to join two tibbles by a range or a virtual column. but it seems the by
- parameter just allow to handle chr
oder vector(chr)
of existing column names.
在我的示例中,我有一个小标题 d
和一列值
和小标题 r
,其中包含 from
和至
列。
In my example I have a tibble d
with a column value
, and a tibble r
with a from
and a to
column.
d <- tibble(value = seq(1,6, by = 0.2))
r <- tibble(from = seq(1,6), to = c(seq(2,6),Inf), class = LETTERS[seq(1,6)])
> d
# A tibble: 26 x 1
value
<dbl>
1 1.0
2 1.2
3 1.4
4 1.6
5 1.8
6 2.0
7 2.2
8 2.4
9 2.6
10 2.8
# ... with 16 more rows
> r
# A tibble: 6 x 3
from to class
<int> <dbl> <chr>
1 1 2 A
2 2 3 B
3 3 4 C
4 4 5 D
5 5 6 E
6 6 Inf F
现在我想加入 value
列中的 d
在<$ c中的从
和到
的范围内$ c> r :
now I want to join the value
column in d
within the range of from
and to
in r
:
d %>% inner_join(r, by = "value between from and to") # >= and <
我找不到办法,所以决定加入
与来自$的 d
中的值
的底价
r
I can't find a way to do this so decided to join the floor
of value
in d
with the from
column in r
d %>% inner_join(r, by = c("floor(value)" = "from"))
我可以创建第二列来解决该问题:
of course i can create a second column to solve that:
d %>%
mutate(join_value = floor(value)) %>%
inner_join(r, by = c("join_value" = "from")) %>%
select(value, class)
# A tibble: 26 x 2
value class
<dbl> <chr>
1 1.0 A
2 1.2 A
3 1.4 A
4 1.6 A
5 1.8 A
6 2.0 B
7 2.2 B
8 2.4 B
9 2.6 B
10 2.8 B
# ... with 16 more rows
但是没有更舒适的方法吗?
but isn't there a more comfortable way?
谢谢
推荐答案
我不认为不平等联接是在 dplyr
中实现的,或者它会实现(请参阅加入不等式约束),但这是使用SQL连接的好情况:
I don't think inequality joins is implemented in dplyr
yet, or it ever will (see this discussion on Join on inequality constraints), but this is a good situation to use an SQL join:
library(tibble)
library(sqldf)
as.tibble(sqldf("select d.value, r.class from d
join r on d.value >= r.'from' and
d.value < r.'to'"))
或者,如果要将联接集成到您的 dplyr
链,您可以使用 fuzzyjoin :: fuzzy_join
:
Alternatively, if you want to integrate the join into your dplyr
chain, you can use fuzzyjoin::fuzzy_join
:
library(dplyr)
library(fuzzyjoin)
d %>%
fuzzy_join(r, by = c("value" = "from", "value" = "to"),
match_fun = list(`>=`, `<`)) %>%
select(value, class)
结果:
# A tibble: 31 x 2
value class
<dbl> <chr>
1 1.0 A
2 1.2 A
3 1.4 A
4 1.6 A
5 1.8 A
6 2.0 A
7 2.0 B
8 2.2 B
9 2.4 B
10 2.6 B
# ... with 21 more rows
注意,我在和至
的附近添加了单引号,因为这些都是保留的SQL语言的单词。
Notice I added single quotes around from
and to
since those are reserved words for the SQL language.
这篇关于R dplyr按范围或虚拟列联接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!