Find common substrings between two character variables
Problem description
I have two character variables (names of objects) and I want to extract the largest common substring.
a <- c('blahABCfoo', 'blahDEFfoo')
b <- c('XXABC-123', 'XXDEF-123')
I would like to get the following result:
[1] "ABC" "DEF"
These vectors as input should give the same result:
a <- c('textABCxx', 'textDEFxx')
b <- c('zzABCblah', 'zzDEFblah')
These examples are representative. Each string contains an identifying element; the rest of the text is the same across the elements of a given vector, but is not known in advance.
Is there a solution in one of the following places (in order of preference)?
- Base R
- Recommended packages
- Packages available on CRAN
The answer to the supposed duplicate does not fulfill these requirements.
Recommended answer
Here's a CRAN package for that:
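library(qualV)
# For each pair a[i], b[i]: split both strings into single characters,
# run LCS() on the two character vectors, and paste the result back together.
sapply(seq_along(a), function(i)
  paste(LCS(strsplit(a[i], '')[[1]], strsplit(b[i], '')[[1]])$LCS,
        collapse = ""))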
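Since base R was the first preference in the question, here is a minimal base-R sketch as an alternative. It is not part of the original answer. The helper name longest_common_substring is hypothetical, and the approach is a plain dynamic-programming table over characters that looks for the longest contiguous run. Note that qualV's LCS() is documented as a longest common subsequence routine, so on inputs other than the examples above the two approaches may return different results (for "ABxCD" and "ABCD", a subsequence is "ABCD", while the longest contiguous substring has length 2).

```r
# Hypothetical helper (not a base R function): longest common contiguous
# substring of two single strings, via a dynamic-programming table.
longest_common_substring <- function(x, y) {
  xs <- strsplit(x, "", fixed = TRUE)[[1]]
  ys <- strsplit(y, "", fixed = TRUE)[[1]]
  if (length(xs) == 0L || length(ys) == 0L) return("")

  # len[i + 1, j + 1] holds the length of the common run ending at xs[i], ys[j];
  # the extra first row and column of zeros avoid special-casing i = 1 or j = 1.
  len <- matrix(0L, nrow = length(xs) + 1L, ncol = length(ys) + 1L)
  best_len <- 0L
  best_end <- 0L  # index in xs where the best run ends

  for (i in seq_along(xs)) {
    for (j in seq_along(ys)) {
      if (xs[i] == ys[j]) {
        len[i + 1L, j + 1L] <- len[i, j] + 1L
        if (len[i + 1L, j + 1L] > best_len) {
          best_len <- len[i + 1L, j + 1L]
          best_end <- i
        }
      }
    }
  }

  if (best_len == 0L) return("")
  paste(xs[(best_end - best_len + 1L):best_end], collapse = "")
}

a <- c("blahABCfoo", "blahDEFfoo")
b <- c("XXABC-123", "XXDEF-123")

# Apply the helper pairwise; USE.NAMES = FALSE keeps the result unnamed.
result <- mapply(longest_common_substring, a, b, USE.NAMES = FALSE)
print(result)
# [1] "ABC" "DEF"
```

When several runs tie for the longest length, this sketch keeps the first one found while scanning x from left to right. The table needs nchar(x) * nchar(y) cells, which is fine for short object names but may be slow for long strings.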