在多列上执行左联接,其中之一是部分字符串 [英] Performing a left join on multiple columns, one of which is a partial string
本文介绍了在多列上执行左联接,其中之一是部分字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在尝试通过以下方式加入:a)名字的前两个字符,b)姓氏和c)年。我正在阅读有关Fuzzyjoin的内容,但似乎并不是我所需要的。
I'm attempting to join by a) the first two characters of a first name, b) last name, and c) year. I was doing some reading on fuzzyjoin, but it doesn't seem like it's quite what I need.
我已经尝试
newly_joined_df <- names_df %>%
left_join(values_df, by = c(substr("first_name", 1, 2), "last_name", "year")
和
newly_joined_df <- names_df %>%
left_join(values_df, by = c(substr(names_df$first_name, 1, 2), "last_name", "year")
但这两个都是愚蠢的解决方案,并且抛出了明显的错误。
But both were dumb solutions, and threw obvious errors.
推荐答案
这个怎么样?
library(dplyr)
df1 %>%
mutate(first_name_1st2char = substr(first_name, 1, 2)) %>%
left_join(df2 %>% mutate(first_name_1st2char = substr(first_name, 1, 2)),
by = c("first_name_1st2char", "last_name", "year")) %>%
select(-first_name_1st2char)
输出为:
first_name.x last_name year first_name.y age
1 john asdf 2018 joe 12
2 jack qwerty 2017 jake 34
样本数据:
df1 <- structure(list(first_name = structure(c(2L, 1L), .Label = c("jack",
"john"), class = "factor"), last_name = structure(1:2, .Label = c("asdf",
"qwerty"), class = "factor"), year = c(2018, 2017)), .Names = c("first_name",
"last_name", "year"), row.names = c(NA, -2L), class = "data.frame")
df2 <- structure(list(first_name = structure(c(3L, 2L, 1L), .Label = c("donald",
"jake", "joe"), class = "factor"), last_name = structure(c(1L,
3L, 2L), .Label = c("asdf", "jong", "qwerty"), class = "factor"),
year = c(2018, 2017, 2018), age = c(12, 34, 5)), .Names = c("first_name",
"last_name", "year", "age"), row.names = c(NA, -3L), class = "data.frame")
这篇关于在多列上执行左联接,其中之一是部分字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文