dplyr string match and replace based on lookup table in R
Question
I'm trying to reproduce in R something I previously did in Excel, but I'm having trouble finding a way to achieve it.
I have two datasets: a base dataset and a lookup table. The base dataset has two columns, the first and last names of people. The lookup table has the same two columns, plus a replacement first name.
People <- data.frame(
Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)
Lookup <- data.frame(
Fname = c("Tom","Tom","Rod","Rod"),
Sname = c("Harper","Kingston","Baker","Lombardy"),
NewFname = c("Tommy","Tim","Roderick","Robert")
)
What I want to do is replace Fname with NewFname on one condition: that both Fname and Sname match across the two data frames. I need this because my real dataset has over 40,000 rows to process. Ultimately, I'd like to end up with the following data frame:
People <- data.frame(
Fname = c("Tommy","Tim","Jerry","Ben","Roderick","John","Perry","Robert"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)
However, I want a functional solution so that I don't have to enter the conditions and replacement names by hand. So far I have the following (problematic) attempt, which generates a new column using mutate in dplyr, but it doesn't work:
People %>%
mutate(NewName = if_else(
Fname == Lookup$Fname & Sname == Lookup$Sname, NewFname, Fname
))
Recommended answer
You can just use left_join, and then overwrite Fname wherever the joined NewFname is not missing (!is.na()):
library(dplyr)
People %>%
left_join(Lookup, by = c("Fname", "Sname")) %>%
mutate(Fname = ifelse(!is.na(NewFname), NewFname, Fname))
# Fname Sname NewFname
# 1 Tommy Harper Tommy
# 2 Tim Kingston Tim
# 3 Jerry Ribery <NA>
# 4 Ben Ghazali <NA>
# 5 Roderick Baker Roderick
# 6 John Falcon <NA>
# 7 Perry Jefferson <NA>
# 8 Robert Lombardy Robert
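I left the NewFname column in the output so the match is visible; it can be dropped with select(-NewFname) if it is not needed.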
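As a variation on the answer above, here is a minimal sketch that uses dplyr's coalesce() instead of ifelse() and drops the helper column at the end. It assumes the same People and Lookup data frames as in this post, with character (not factor) columns; the name result is only illustrative.

```r
library(dplyr)

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)
Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)

result <- People %>%
  # Rows without a match in Lookup get NA in NewFname
  left_join(Lookup, by = c("Fname", "Sname")) %>%
  # coalesce() returns the first non-missing value, so NewFname wins where it exists
  mutate(Fname = coalesce(NewFname, Fname)) %>%
  # Remove the helper column once the replacement is done
  select(-NewFname)

print(result)
```

coalesce() expects both inputs to have compatible types, which is why the columns are kept as character here.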
Data:
People <- data.frame(
Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"), stringsAsFactors = FALSE
)
Lookup <- data.frame(
Fname = c("Tom","Tom","Rod","Rod"),
Sname = c("Harper","Kingston","Baker","Lombardy"),
NewFname = c("Tommy","Tim","Roderick","Robert"), stringsAsFactors = FALSE
)
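Since the question asks for something reusable on a much larger dataset, the join can be wrapped in a small function. The sketch below is an assumption about how one might do that, not part of the original answer: replace_fname is a hypothetical name, and the duplicate-key check is an added safeguard, because a lookup table with a repeated Fname/Sname pair would make left_join() return more rows than the base dataset has.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): replaces Fname with
# NewFname wherever (Fname, Sname) matches a row of the lookup table.
replace_fname <- function(people, lookup) {
  # A repeated key in the lookup would duplicate rows after the join
  if (anyDuplicated(lookup[c("Fname", "Sname")]) > 0) {
    stop("lookup contains duplicated Fname/Sname pairs")
  }
  people %>%
    left_join(lookup, by = c("Fname", "Sname")) %>%
    mutate(Fname = if_else(is.na(NewFname), Fname, NewFname)) %>%
    select(-NewFname)
}

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)
Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)

cleaned <- replace_fname(People, Lookup)
print(cleaned)
```

Because the lookup keys are checked for uniqueness first, the returned data frame has the same number of rows, in the same order, as the input.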