Update a column in df2 by matching patterns in columns in df1 & df2 using R


Problem description

I have two data frames like this:

TEAM <- c("PE","PE","MPI","TDT","HPT","ATD")
CODE <- c(NA,"F","A","H","G","D")
df1 <- data.frame(TEAM,CODE)

CODE <- c(NA,"F100","A234","D664","H435","G123","A666","D345","G324",NA)
TEAM <- c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)
df2 <- data.frame(CODE,TEAM)

I am trying to update TEAM in df2 by matching the first letter of the CODE column in df2 against the CODE column in df1.

My desired output for df2:

   CODE TEAM
1    NA   PE
2  F100   PE
3  A234  MPI
4  D664  ATD
5  H435  TDT
6  G123  HPT
7  A666  MPI
8  D345  ATD
9  G324  HPT
10   NA   PE

I am trying it this way with sqldf, but it is not right:

library(sqldf)
df2 <- sqldf(c("update df2 set TEAM = 
                  case
                    when CODE like '%F%' then 'PE'
                    when CODE like '%A%' then 'MPI'
                    when CODE like '%D%' then 'ATD'
                    when CODE like '%G%' then 'HPT'
                    when CODE like '%H%' then 'TDT'
                    else 'NA'
                  end"))

Can someone give me some direction on achieving this without sqldf?

Solution

Using match and substr (both in base R):

df2$TEAM = df1$TEAM[match(substr(df2$CODE, 1, 1), df1$CODE)]

df2
#    CODE TEAM
# 1  <NA>   PE
# 2  F100   PE
# 3  A234  MPI
# 4  D664  ATD
# 5  H435  TDT
# 6  G123  HPT
# 7  A666  MPI
# 8  D345  ATD
# 9  G324  HPT
# 10 <NA>   PE
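
To see why the one-liner works, it can help to split it into its intermediate steps. The sketch below rebuilds the two data frames from the question; the names `first_letter` and `idx` are introduced here only for illustration and are not part of the original answer.

```r
# Rebuild the data frames from the question (strings kept as character)
df1 <- data.frame(
  TEAM = c("PE", "PE", "MPI", "TDT", "HPT", "ATD"),
  CODE = c(NA, "F", "A", "H", "G", "D"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  CODE = c(NA, "F100", "A234", "D664", "H435", "G123", "A666", "D345", "G324", NA),
  TEAM = NA_character_,
  stringsAsFactors = FALSE
)

# Step 1: take the first character of each code; NA stays NA
first_letter <- substr(df2$CODE, 1, 1)

# Step 2: find the position of each first letter in df1$CODE.
# match() treats NA as matching NA, so rows with a missing CODE
# point at the NA row of df1 (row 1, team "PE").
idx <- match(first_letter, df1$CODE)

# Step 3: look up the team stored at those positions
df2$TEAM <- df1$TEAM[idx]
```

A code whose first letter does not occur in `df1$CODE` gets `NA` in `idx`, and therefore `NA` in `TEAM`.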

This is expedient for a one-off case. If you do this kind of thing frequently, I would encourage you to extract the first letter of CODE into its own column, CODE_1, and then do a regular merge or join.
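
A minimal sketch of that merge-based variant in base R might look like the following. The column name `CODE_1` comes from the suggestion above; the helper column `row_id` and the name `merged` are assumptions added here, because `merge()` does not keep the original row order. `df2` is built without its empty TEAM column so that the merge does not produce `TEAM.x` and `TEAM.y`.

```r
# Data from the question; df2 is created without the empty TEAM column
df1 <- data.frame(
  TEAM = c("PE", "PE", "MPI", "TDT", "HPT", "ATD"),
  CODE = c(NA, "F", "A", "H", "G", "D"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  CODE = c(NA, "F100", "A234", "D664", "H435", "G123", "A666", "D345", "G324", NA),
  stringsAsFactors = FALSE
)

# Key column: first letter of each code
df2$CODE_1 <- substr(df2$CODE, 1, 1)

# Hypothetical helper column to restore the row order after merging
df2$row_id <- seq_len(nrow(df2))

# Left join of df2 onto df1 by first letter; all.x = TRUE keeps every df2 row.
# By default merge() matches NA keys to each other, as match() does.
merged <- merge(df2, df1, by.x = "CODE_1", by.y = "CODE", all.x = TRUE)

# Put the rows back in their original order and keep the two columns of interest
merged <- merged[order(merged$row_id), c("CODE", "TEAM")]
rownames(merged) <- NULL
```

With the question's data, `merged` holds the same CODE and TEAM values as the desired output. Keeping `CODE_1` as a permanent column means later lookups can use the same join without repeating the `substr()` call.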
