String match with multiple columns to look for possible result in R

Question

I have two data frames, DF1 and DF2. DF1 contains different string combinations, and DF2 contains results for different string combinations. I need to match each string or string combination in DF1 against DF2 and, based on those matches, create multiple result columns in a resulting data frame, DF_Result.

A=c("babypink","red,blue","purple,white","skyblue","pink,violet,green","silver,white,grey")

DF1 <- data.frame(A)

P=c("abcd","qwert","wxyz","efgh")
Q=c("red,blue","red","orange,yellow","white,black")
R=c("pink","violet,green","purple,white","golden")
S=c("silver,white","orange","grey","maroon")
T=c("black,white","skyblue","babypink","green")
U=c("yellow","blue","black","white")
DF2=data.frame(P,Q,R,S,T,U)

X=c("babypink","red,blue","purple,white","skyblue","pink,violet,green","silver,white,grey")
R1=c("wxyz","abcd","wxyz","qwert","abcd","abcd")
R2=c("","qwert","efgh","","qwert","wxyz") 
R3=c("","","","","efgh","efgh") 
DF_Result=data.frame(A,R1,R2,R3)

Answer

Here is a possible tidyverse solution. It produces a result close to your DF_Result, but not identical: "purple,white" also matches "abcd", because its token "white" appears in the cells "silver,white" and "black,white".
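The difference comes down to how a DF2 cell is compared with a DF1 string. Below is a minimal base-R illustration; the names `query`, `cell`, `shares_any` and `contains_all` are hypothetical and exist only for this sketch. A token join keeps a cell if it shares any token with the query. DF_Result looks consistent with a stricter rule: keep a cell only if all of its tokens occur in the query. That rule is inferred from the example data and is not stated in the question.

```r
# Split comma-separated strings into individual tokens
query <- strsplit("purple,white", ",", fixed = TRUE)[[1]]  # c("purple", "white")
cell  <- strsplit("silver,white", ",", fixed = TRUE)[[1]]  # c("silver", "white")

# Rule behind the token join: any shared token counts as a match
shares_any <- any(cell %in% query)    # TRUE, so "abcd" gets reported

# Stricter rule: every token of the cell must appear in the query
contains_all <- all(cell %in% query)  # FALSE, because "silver" is absent
```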

The data frames are easier to merge in long form (using pivot_longer). You can then use separate_rows to split each comma-separated value into its own row.
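To show what these two reshaping steps do, here is a sketch on a hypothetical two-row subset of DF2. It assumes the tidyr package is installed; tidyr also re-exports the `%>%` pipe. With no `sep` argument, separate_rows splits on runs of non-alphanumeric characters, which covers the commas used here.

```r
library(tidyr)  # pivot_longer(), separate_rows(), and the re-exported %>%

# Hypothetical two-row subset of DF2, for illustration only
toy <- data.frame(
  P = c("abcd", "qwert"),
  Q = c("red,blue", "red"),
  R = c("pink", "violet,green"),
  stringsAsFactors = FALSE
)

toy_long <- toy %>%
  pivot_longer(cols = -P) %>%  # one row per (P, column) pair: columns P, name, value
  separate_rows(value)         # one row per comma-separated token

toy_long  # 6 rows; value is red, blue, pink, red, violet, green
```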

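library(tidyverse)

# Long form of DF2: one row per (P, column, colour token)
DF2_long <- DF2 %>%
  pivot_longer(cols = -P) %>%
  separate_rows(value)

DF1 %>%
  mutate(value = A) %>%
  separate_rows(value) %>%               # one row per colour token in A
  left_join(DF2_long, by = "value") %>%  # attach every P whose cell contains that token
  select(-name, -value) %>%
  group_by(A) %>%
  distinct(A, P) %>%                     # keep each P only once per A
  mutate(Count = row_number()) %>%       # 1, 2, 3, ... within each A
  pivot_wider(id_cols = A, names_from = Count, values_from = P, names_prefix = "R")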

Output

  A                 R1    R2    R3   
  <chr>             <chr> <chr> <chr>
1 babypink          wxyz  NA    NA   
2 red,blue          abcd  qwert NA   
3 purple,white      wxyz  abcd  efgh 
4 skyblue           qwert NA    NA   
5 pink,violet,green abcd  qwert efgh 
6 silver,white,grey abcd  wxyz  efgh
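If the goal is to reproduce DF_Result exactly, one rule appears consistent with every row of the question's expected output. Under this rule, a DF2 row matches a DF1 string when at least one of its cells has all of its comma-separated tokens present in that string. This is an assumption inferred from the example data. The following minimal base-R sketch works under that assumption. The helpers `tokens()` and `row_matches()` and the object `DF_Result_sketch` are hypothetical names. Matches are listed in DF2 row order and padded with "" as in the question.

```r
# Data from the question
A <- c("babypink", "red,blue", "purple,white", "skyblue",
       "pink,violet,green", "silver,white,grey")
DF1 <- data.frame(A, stringsAsFactors = FALSE)
DF2 <- data.frame(
  P = c("abcd", "qwert", "wxyz", "efgh"),
  Q = c("red,blue", "red", "orange,yellow", "white,black"),
  R = c("pink", "violet,green", "purple,white", "golden"),
  S = c("silver,white", "orange", "grey", "maroon"),
  T = c("black,white", "skyblue", "babypink", "green"),
  U = c("yellow", "blue", "black", "white"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one comma-separated string into tokens
tokens <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]

# Hypothetical helper: TRUE if some cell of DF2 row i (excluding P)
# has every one of its tokens inside query_tokens
row_matches <- function(query_tokens, i) {
  cells <- unlist(DF2[i, -1], use.names = FALSE)
  any(vapply(cells, function(cell) all(tokens(cell) %in% query_tokens), logical(1)))
}

# For each DF1 string, collect the P labels of matching DF2 rows
matches <- lapply(DF1$A, function(a) {
  q <- tokens(a)
  DF2$P[vapply(seq_len(nrow(DF2)), function(i) row_matches(q, i), logical(1))]
})

# Pad every result to the same width with "" and name the columns R1, R2, ...
width  <- max(lengths(matches))
padded <- do.call(rbind, lapply(matches, function(m) c(m, rep("", width - length(m)))))
colnames(padded) <- paste0("R", seq_len(width))
DF_Result_sketch <- data.frame(A = DF1$A, padded, stringsAsFactors = FALSE)
DF_Result_sketch
```

The nested loops are fine for data of this size. For large tables, a join-based version of the same "all tokens contained" test would scale better.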
