dplyr string match and replace based on lookup table in R

Question

I'm trying to reproduce something I previously did in Excel, but I'm having trouble finding a way to do it in R.

I have two datasets: one is my base dataset and the other is a lookup table. My base has two columns, the first and last names of people. My lookup table has these first two columns as well, but it also includes a replacement first name.

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)

Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert")
)

What I want to do is replace Fname with NewFname, subject to two conditions: both Fname and Sname must match between the two data frames. I need this because my real dataset has over 40,000 rows to process. Ultimately, I'd hope to end up with the following data frame:

People <- data.frame(
  Fname = c("Tommy","Tim","Jerry","Ben","Roderick","John","Perry","Robert"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)

However, I want a programmatic solution so that I don't have to enter the conditions and replacement names by hand. So far, I have the following (problematic) attempt, which uses mutate in dplyr to generate a new column, but it's not working:

People %>%
  mutate(NewName = if_else(
    Fname == Lookup$Fname & Sname == Lookup$Sname, NewFname, Fname
  ))


Recommended answer

Simply use left_join, then mutate Fname wherever NewFname is not missing (!is.na()):

library(dplyr)
People %>% 
  left_join(Lookup, by = c("Fname", "Sname")) %>% 
  mutate(Fname = ifelse(!is.na(NewFname), NewFname, Fname))
#      Fname     Sname NewFname
# 1    Tommy    Harper    Tommy
# 2      Tim  Kingston      Tim
# 3    Jerry    Ribery     <NA>
# 4      Ben   Ghazali     <NA>
# 5 Roderick     Baker Roderick
# 6     John    Falcon     <NA>
# 7    Perry Jefferson     <NA>
# 8   Robert  Lombardy   Robert

I left the NewFname column in the output.
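
If the helper column is not wanted in the final result, one possible variant (a sketch, not part of the original answer) is to swap ifelse for dplyr's coalesce, which takes the first non-missing value, and then drop NewFname with select. It assumes the name columns are character vectors, as in the data definition given in this answer; the variable name result is only illustrative.

```r
library(dplyr)

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)

Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)

# 'result' is an illustrative name, not from the original answer.
result <- People %>%
  # Attach NewFname to every row whose (Fname, Sname) pair appears in Lookup;
  # unmatched rows get NA.
  left_join(Lookup, by = c("Fname", "Sname")) %>%
  # coalesce() keeps NewFname where present and falls back to the old Fname.
  mutate(Fname = coalesce(NewFname, Fname)) %>%
  # Drop the helper column once the replacement is done.
  select(-NewFname)

result
```

The row order of People is preserved, because left_join keeps the rows of its left-hand table in place.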

Data:

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)

Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)
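
Since the question asks for something reusable on a 40,000-row dataset, the join could be wrapped in a small function. The sketch below is only one way to do it: the function name replace_first_names and its arguments are hypothetical, and the duplicate-key check is an added assumption (a lookup with a repeated Fname/Sname pair would otherwise produce extra rows in the join).

```r
library(dplyr)

# Hypothetical helper; the name and arguments are illustrative,
# not taken from the original answer.
replace_first_names <- function(people, lookup, keys = c("Fname", "Sname")) {
  # Assumption: each key pair appears at most once in the lookup.
  # A duplicated pair would duplicate the matching rows of 'people'.
  stopifnot(anyDuplicated(lookup[keys]) == 0)

  people %>%
    left_join(lookup, by = keys) %>%
    # Use the replacement where one exists, otherwise keep the old name.
    mutate(Fname = coalesce(NewFname, Fname)) %>%
    select(-NewFname)
}

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)

Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)

updated <- replace_first_names(People, Lookup)
updated
```

The function returns a data frame with the same rows and columns as its first argument, so it can be dropped into an existing pipeline.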
