Compare characters and return mismatches in R


Problem description

I want to compare characters iteratively and return the mismatches between two columns of a data frame.

An element such as x2x or y67y should not be returned, because x remains x and y remains y.

Input:

x  y  x_val            y_val
A  B  x2x, y67h, d7j   x2y, y67y, d7r
B  C  x2y, y67y, d7r   x2y, y67y, d7r
C  A  x2y, y67y, d7r   x2x, y67h, d7j
C  D  x2y, y67y, d7r   y67b, g72v, b8c
D  E  y67b, g72v, b8c  y67b, g72j

I want to add a column val that holds the differences between x_val and y_val.

Expected output:

x  y  x_val            y_val            val
A  B  x2x, y67h, d7j   x2y, y67y, d7r   x2y, d7r
B  C  x2y, y67y, d7r   x2y, y67y, d7r   NA
C  A  x2y, y67y, d7r   x2x, y67h, d7j   y67h, d7j
C  D  x2y, y67y, d7r   y67b, g72v, b8c  y67b, g72v, b8c
D  E  y67b, g72v, b8c  y67b, g72j       g72j

I tried xy_val <- y_val[!(y_val %in% x_val)]

Could you please suggest a solution for outputting the mismatches?

My data:

structure(list(x = c("A", "B", "C", "C", "D"), y = c("B", "C", "A", "D", "E"), x_val = c("x2x, y67h, d7j", "x2y, y67y, d7r", "x2y, y67y, d7r", "x2y, y67y, d7r", "y67b, g72v, b8c"), y_val = c("x2y, y67y, d7r", "x2y, y67y, d7r", "x2x, y67h, d7j", "y67b, g72v, b8c", "y67b, g72j" )), class = "data.frame", row.names = c(NA, -5L))

Thanks for your help!

Recommended answer

Using dplyr and purrr:

library(dplyr)
library(purrr)

# df is the data frame from the question (the structure(...) call above)
df %>% mutate(diff_x = map2_chr(strsplit(x_val, split = ", "), 
                                strsplit(y_val, split = ", "), 
                                # elements of x_val missing from y_val whose letter changes
                                ~paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.x, .y), 
                                            value = TRUE, perl = TRUE), 
                                       collapse = ", ")) %>%
                replace(. == "", NA),   # an empty result becomes NA
              diff_y = map2_chr(strsplit(x_val, split = ", "), 
                                strsplit(y_val, split = ", "), 
                                # elements of y_val missing from x_val whose letter changes
                                ~paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.y, .x), 
                                            value = TRUE, perl = TRUE),
                                       collapse = ", ")) %>%
                replace(. == "", NA))

Notes:

  1. grep takes the output of setdiff and, with value = TRUE, keeps only the elements that match the pattern. This drops any element of the form "same letter with digits in between", such as x2x or y67y.

([a-z]) matches a single lowercase letter and captures it as group 1.

(?>\\d+) is an atomic group that matches one or more digits and does not backtrack.

(?!\\1) is a negative lookahead that refers back to ([a-z]): the match succeeds only if the digits are not followed by the letter captured there.
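To see why the atomic group matters, here is a small sketch that runs the pattern on the tokens from row 1 of the question's data. The second pattern, pattern_plain, is not part of the answer; it is a hypothetical variant with an ordinary backtracking group, included only for comparison. Variable names are illustrative.

```r
# Tokens from row 1 of the question's data, split on ", "
x_tokens <- strsplit("x2x, y67h, d7j", split = ", ")[[1]]
y_tokens <- strsplit("x2y, y67y, d7r", split = ", ")[[1]]

# Tokens present in y_val but not in x_val
candidates <- setdiff(y_tokens, x_tokens)

# Pattern from the answer: the atomic group (?>\\d+) never gives digits back
pattern_atomic <- "([a-z])(?>\\d+)(?!\\1)"
# Hypothetical variant with a plain \\d+ that is allowed to backtrack
pattern_plain <- "([a-z])\\d+(?!\\1)"

kept_atomic <- grep(pattern_atomic, candidates, value = TRUE, perl = TRUE)
kept_plain  <- grep(pattern_plain,  candidates, value = TRUE, perl = TRUE)

print(candidates)   # "x2y" "y67y" "d7r"
print(kept_atomic)  # "x2y" "d7r"
print(kept_plain)   # "x2y" "y67y" "d7r"
```

With the plain group, y67y still matches: \\d+ can back off to just 6, and the lookahead then sees 7 rather than y, so it succeeds. The atomic group blocks that retry, which is why y67y is filtered out.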

Output:

  x y           x_val           y_val    diff_x          diff_y
1 A B  x2x, y67h, d7j  x2y, y67y, d7r y67h, d7j        x2y, d7r
2 B C  x2y, y67y, d7r  x2y, y67y, d7r      <NA>            <NA>
3 C A  x2y, y67y, d7r  x2x, y67h, d7j  x2y, d7r       y67h, d7j
4 C D  x2y, y67y, d7r y67b, g72v, b8c  x2y, d7r y67b, g72v, b8c
5 D E y67b, g72v, b8c      y67b, g72j g72v, b8c            g72j
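The question asked for a single val column, which corresponds to diff_y above. As a rough base R sketch of the same idea without dplyr or purrr, the helper below (mismatches is a hypothetical name, not part of the original answer) applies the same setdiff and grep steps row by row and returns NA where nothing remains.

```r
# Data from the question
df <- data.frame(
  x = c("A", "B", "C", "C", "D"),
  y = c("B", "C", "A", "D", "E"),
  x_val = c("x2x, y67h, d7j", "x2y, y67y, d7r", "x2y, y67y, d7r",
            "x2y, y67y, d7r", "y67b, g72v, b8c"),
  y_val = c("x2y, y67y, d7r", "x2y, y67y, d7r", "x2x, y67h, d7j",
            "y67b, g72v, b8c", "y67b, g72j"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: elements of `from` that are absent from `against`
# and whose letter differs before and after the digits
mismatches <- function(from, against) {
  from_tokens    <- strsplit(from, split = ", ")
  against_tokens <- strsplit(against, split = ", ")
  mapply(function(a, b) {
    kept <- grep("([a-z])(?>\\d+)(?!\\1)", setdiff(a, b),
                 value = TRUE, perl = TRUE)
    # Return NA instead of an empty string when nothing is left
    if (length(kept) == 0) NA_character_ else paste(kept, collapse = ", ")
  }, from_tokens, against_tokens, USE.NAMES = FALSE)
}

# Elements of y_val that do not appear in x_val
df$val <- mismatches(df$y_val, df$x_val)
print(df$val)
```

Swapping the two arguments, mismatches(df$x_val, df$y_val), gives the diff_x direction instead.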
