Find common substrings between two character variables

Problem description

I have two character variables (names of objects) and I want to extract the longest common substring.

a <- c('blahABCfoo', 'blahDEFfoo')
b <- c('XXABC-123', 'XXDEF-123')

I would like to get the following result:

[1] "ABC" "DEF"

These vectors as input should give the same result:

a <- c('textABCxx', 'textDEFxx')
b <- c('zzABCblah', 'zzDEFblah')

These examples are representative. The strings contain identifying elements, and the remainder of the text in each vector element is common, but unknown.
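Taking that description at face value, each identifier is whatever remains of a string once the prefix and suffix shared by every element of its vector are removed. The base R sketch below illustrates that idea; common_prefix_len(), reverse_str() and strip_common() are hypothetical helper names, and the approach assumes each vector has at least two elements whose identifiers differ in their first and last characters (identifiers such as 'ABC' and 'AXC' would lose their shared 'A' and 'C').

```r
# Hypothetical helper: number of leading characters shared by every string in x.
common_prefix_len <- function(x) {
  n <- min(nchar(x))
  k <- 0
  # Extend the prefix while all strings agree on the next character.
  while (k < n && length(unique(substr(x, k + 1, k + 1))) == 1) {
    k <- k + 1
  }
  k
}

# Hypothetical helper: reverse each string, so a shared suffix becomes a shared prefix.
reverse_str <- function(x) {
  vapply(strsplit(x, ""), function(ch) paste(rev(ch), collapse = ""), character(1))
}

# Hypothetical helper: keep only the part of each string lying between the
# prefix and the suffix common to all elements of x.
# Assumes the identifiers differ in their first and last characters.
strip_common <- function(x) {
  pre <- common_prefix_len(x)
  suf <- common_prefix_len(reverse_str(x))
  substr(x, pre + 1, nchar(x) - suf)
}

a <- c('blahABCfoo', 'blahDEFfoo')
b <- c('XXABC-123', 'XXDEF-123')
strip_common(a)  # "ABC" "DEF"
strip_common(b)  # "ABC" "DEF"
```

Note that strip_common() looks at each vector on its own and never compares a with b, so it relies entirely on the "remainder is common" assumption holding within each vector.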

Is there a solution, in one of the following places (in order of preference):

  1. Base R
  2. Recommended packages
  3. Packages available on CRAN

The answer to the supposed duplicate question does not meet these requirements.

Recommended answer

Here's a CRAN package for that:

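library(qualV)

# LCS() from qualV works on vectors of symbols, so split each string into
# single characters, take the longest common subsequence of each pair
# a[i], b[i], and paste the matched characters back into one string.
sapply(seq_along(a), function(i)
    paste(LCS(strsplit(a[i], '')[[1]], strsplit(b[i], '')[[1]])$LCS,
          collapse = ""))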
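LCS() in qualV computes the longest common subsequence, whose characters only need to appear in the same order, not next to each other. For the examples in the question that coincides with the longest common substring, but not in general. If a base R alternative is preferred, the classic dynamic-programming algorithm for the longest common substring can be sketched as follows; longest_common_substring() is a hypothetical helper name, and its O(n * m) table makes it suitable only for short strings such as object names.

```r
# Hypothetical helper: longest common substring (contiguous characters) of two
# strings, found with the standard O(n * m) dynamic-programming table.
longest_common_substring <- function(s, t) {
  x <- strsplit(s, "")[[1]]
  y <- strsplit(t, "")[[1]]
  n <- length(x)
  m <- length(y)
  # L[i + 1, j + 1] holds the length of the common run ending at x[i] and y[j];
  # row 1 and column 1 stay 0 as the empty-prefix base case.
  L <- matrix(0L, nrow = n + 1, ncol = m + 1)
  best_len <- 0L
  best_end <- 0L  # index in s where the longest run found so far ends
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (x[i] == y[j]) {
        L[i + 1, j + 1] <- L[i, j] + 1L
        if (L[i + 1, j + 1] > best_len) {
          best_len <- L[i + 1, j + 1]
          best_end <- i
        }
      }
    }
  }
  # With no common character, best_len is 0 and this returns "".
  substr(s, best_end - best_len + 1, best_end)
}

a <- c('blahABCfoo', 'blahDEFfoo')
b <- c('XXABC-123', 'XXDEF-123')
# Apply the helper pairwise: a[1] with b[1], a[2] with b[2].
mapply(longest_common_substring, a, b, USE.NAMES = FALSE)
# [1] "ABC" "DEF"
```

On inputs where the two notions differ, such as "aXbYc" and "abc", this helper returns "a" (no two matching characters are adjacent in both strings), while the longest common subsequence is "abc".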
