Check if all characters of one string exist in another string in R
Question
I am trying to compare strings like PRABHAKAR SHARMA and SHARMA KUMAR PRABHAKAR. The intention is to check whether all the characters of the shorter string exist in the other string. If they do, I should get a 100% match; otherwise I should get the percentage of characters that matched.
I tried using levenshteinSim from the RecordLinkage package, but it returns a similarity score derived from the number of edits required to change one string into the other.
install.packages("RecordLinkage")
library(RecordLinkage)
levenshteinSim("PRABHAKAR SHARMA", "SHARMA KUMAR PRABHAKAR")
#[1] 0.3636364
I want a 100% match in such a case. Also, this has to be replicated for over 1,000,000 records.
Recommended answer
Here is one way:
s1 <- "PRABHAKAR SHARMA"
s2 <- "SHARMA KUMAR PRABHAKAR"
# Share of the distinct characters of s1 that also occur in s2
compare <- function(s1, s2) {
  c1 <- unique(strsplit(s1, "")[[1]])
  c2 <- unique(strsplit(s2, "")[[1]])
  length(intersect(c1, c2)) / length(c1)
}
compare(s1, s2)
#[1] 1
It may be a little slow, though, and it counts the space as a character too. Use Vectorize to apply it to a column:
dat <- data.frame(small=c("a", "b"), big=c("aa", "cc"), stringsAsFactors=FALSE)
vcomp <- Vectorize(compare)
dat <- transform(dat, comp=vcomp(small, big))
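Note that compare() always treats its first argument as the reference string and counts spaces. If the shorter string is not always in the same column, a variant along the following lines may help. This is only a sketch: char_match and its ignore_space argument are hypothetical names, not part of the original answer or of any package. Like compare(), it works on distinct characters. It splits each column with a single strsplit() call, but whether that is fast enough for a million rows has not been measured here.

```r
# Hypothetical helper (an assumption, not from the original answer):
# share of the shorter string's distinct characters found in the longer one.
char_match <- function(a, b, ignore_space = TRUE) {
  if (ignore_space) {
    a <- gsub(" ", "", a, fixed = TRUE)
    b <- gsub(" ", "", b, fixed = TRUE)
  }
  # strsplit() is vectorised, so each column is split in one call
  ca <- lapply(strsplit(a, "", fixed = TRUE), unique)
  cb <- lapply(strsplit(b, "", fixed = TRUE), unique)
  one_pair <- function(x, y, nx, ny) {
    # Use the string with fewer characters as the reference
    if (nx > ny) {
      tmp <- x
      x <- y
      y <- tmp
    }
    if (length(x) == 0) return(NA_real_)  # nothing to compare
    sum(x %in% y) / length(x)
  }
  unname(mapply(one_pair, ca, cb, nchar(a), nchar(b)))
}

# Argument order no longer matters
char_match("SHARMA KUMAR PRABHAKAR", "PRABHAKAR SHARMA")

# Applied to whole columns, no Vectorize() needed
dat <- data.frame(small = c("a", "b"), big = c("aa", "cc"), stringsAsFactors = FALSE)
dat$comp <- char_match(dat$small, dat$big)
```

Multiply the result by 100 if a percentage is wanted.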