Compare characters and return mismatches in R
Question
I want to compare characters iteratively and return the mismatches between two columns of a data frame.
Elements such as x2x or y67y should not be returned, because x remains x and y remains y.
Input:
x  y  x_val            y_val
A  B  x2x, y67h, d7j   x2y, y67y, d7r
B  C  x2y, y67y, d7r   x2y, y67y, d7r
C  A  x2y, y67y, d7r   x2x, y67h, d7j
C  D  x2y, y67y, d7r   y67b, g72v, b8c
D  E  y67b, g72v, b8c  y67b, g72j
I want to add a column val that holds the differences between x_val and y_val.
Expected output:
x  y  x_val            y_val            val
A  B  x2x, y67h, d7j   x2y, y67y, d7r   x2y, d7r
B  C  x2y, y67y, d7r   x2y, y67y, d7r   NA
C  A  x2y, y67y, d7r   x2x, y67h, d7j   y67h, d7j
C  D  x2y, y67y, d7r   y67b, g72v, b8c  y67b, g72v, b8c
D  E  y67b, g72v, b8c  y67b, g72j       g72j
I tried xy_val <- y_val[!(y_val %in% x_val)]
Could you please suggest a solution for outputting the mismatches?
My data:
structure(list(x = c("A", "B", "C", "C", "D"), y = c("B", "C", "A", "D", "E"), x_val = c("x2x, y67h, d7j", "x2y, y67y, d7r", "x2y, y67y, d7r", "x2y, y67y, d7r", "y67b, g72v, b8c"), y_val = c("x2y, y67y, d7r", "x2y, y67y, d7r", "x2x, y67h, d7j", "y67b, g72v, b8c", "y67b, g72j" )), class = "data.frame", row.names = c(NA, -5L))
Thanks for your help!
Recommended answer
Using dplyr and purrr:
library(dplyr)
library(purrr)

# f is assumed to hold the data frame given in the question
f %>%
  mutate(
    # elements of x_val missing from y_val, minus same-letter codes such as x2x
    diff_x = map2_chr(strsplit(x_val, split = ", "),
                      strsplit(y_val, split = ", "),
                      ~ paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.x, .y),
                                   value = TRUE, perl = TRUE),
                              collapse = ", ")) %>%
      replace(. == "", NA),
    # elements of y_val missing from x_val, filtered the same way
    diff_y = map2_chr(strsplit(x_val, split = ", "),
                      strsplit(y_val, split = ", "),
                      ~ paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.y, .x),
                                   value = TRUE, perl = TRUE),
                              collapse = ", ")) %>%
      replace(. == "", NA)
  )
Notes:
- grep takes the output of setdiff and, with value = TRUE, keeps only the elements that match the pattern. This drops any element of the form "same letter, digits, same letter", such as x2x or y67y.
- ([a-z]) captures a single lowercase letter.
- (?>\\d+) is an atomic group that matches one or more digits and does not backtrack.
- (?!\\1) is a negative lookahead that makes the match fail when the next character is the letter captured by ([a-z]).
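To see why the atomic group matters, the short base R check below compares the pattern from the answer with a variant that uses a plain \\d+. The variable names and the extra test string x22x are illustrative assumptions, not part of the original question. Without the atomic group, the regex engine can give back a digit, so the lookahead is tested against a digit instead of the trailing letter and multi-digit codes such as y67y match by mistake.

```r
# Pattern from the answer: the atomic group (?>\\d+) never gives digits back
pattern_atomic <- "([a-z])(?>\\d+)(?!\\1)"
# Illustrative variant without the atomic group (assumption for comparison)
pattern_plain <- "([a-z])\\d+(?!\\1)"

codes <- c("x2x", "x22x", "y67y", "x2y", "y67h")

atomic_hits <- grepl(pattern_atomic, codes, perl = TRUE)
plain_hits <- grepl(pattern_plain, codes, perl = TRUE)

print(data.frame(code = codes, atomic = atomic_hits, plain = plain_hits))
# atomic: FALSE FALSE FALSE TRUE TRUE
# plain:  FALSE TRUE  TRUE  TRUE TRUE
```

With a single digit (x2x) both patterns agree, because \\d+ cannot match fewer than one digit. The difference only shows up once there are two or more digits.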
Output:
   x  y  x_val            y_val            diff_x     diff_y
1  A  B  x2x, y67h, d7j   x2y, y67y, d7r   y67h, d7j  x2y, d7r
2  B  C  x2y, y67y, d7r   x2y, y67y, d7r   <NA>       <NA>
3  C  A  x2y, y67y, d7r   x2x, y67h, d7j   x2y, d7r   y67h, d7j
4  C  D  x2y, y67y, d7r   y67b, g72v, b8c  x2y, d7r   y67b, g72v, b8c
5  D  E  y67b, g72v, b8c  y67b, g72j       g72v, b8c  g72j
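The val column requested in the question corresponds to diff_y in this output. If only that one column is needed and dplyr and purrr are not available, a minimal base R sketch might look like the following. The helper name mismatches and the two-row sample data frame are assumptions made for illustration; the splitting, setdiff, and grep steps are the same as in the answer.

```r
# Two sample rows taken from the question's data (illustration only)
df <- data.frame(
  x_val = c("x2x, y67h, d7j", "x2y, y67y, d7r"),
  y_val = c("x2y, y67y, d7r", "x2y, y67y, d7r"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: elements of `to` absent from `from`,
# excluding codes whose first and last letter are the same (e.g. y67y)
mismatches <- function(from, to) {
  from_parts <- strsplit(from, ", ", fixed = TRUE)[[1]]
  to_parts <- strsplit(to, ", ", fixed = TRUE)[[1]]
  extra <- setdiff(to_parts, from_parts)
  kept <- grep("([a-z])(?>\\d+)(?!\\1)", extra, value = TRUE, perl = TRUE)
  if (length(kept) == 0) NA_character_ else paste(kept, collapse = ", ")
}

# Apply the helper row by row; USE.NAMES = FALSE keeps the result unnamed
df$val <- mapply(mismatches, df$x_val, df$y_val, USE.NAMES = FALSE)
print(df)
```

Returning NA_character_ directly for rows without mismatches avoids the separate replace step on empty strings.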