是否有任何算法可以解决每个字符使用不同权重的最长的常见后续问题? [英] Is there any algorithm to address the longest common subsequent problem with different weights for each character?

查看:171
本文介绍了是否有任何算法可以解决每个字符使用不同权重的最长的常见后续问题?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种算法,该算法可以在以下条件下解决两个字符串的LCS问题:

I'm looking for an algorithm that addresses the LCS problem for two strings with the following conditions:

每个字符串由英文字符组成,每个字符都有权重.例如:

Each string consists of English characters and each character has a weight. For example:

序列1(S1):权重为[1、2、4、1、3]的"ABBCD"

sequence 1 (S1): "ABBCD" with weights [1, 2, 4, 1, 3]

序列2(S2):权重为[7、5、1、2]的"TBDC"

sequence 2 (S2): "TBDC" with weights [7, 5, 1, 2]

假设MW(s, S)被定义为字符串S中子序列s相对于相关权重的最大权重.最重的公共子序列(HCS)定义为:

Suppose that MW(s, S) is defined as the maximum weight of the sub-sequence s in string S with respect to the associated weights. The heaviest common sub-sequence (HCS) is defined as:

HCS = argmin(MW(s,S1),MW(s,S2))

HCS = argmin(MW(s, S1), MW(s, S2))

算法输出应该是字符串和权重中HCS的索引.在这种情况下,索引将为:

The algorithm output should be the indexes of HCS in both strings and the weight. In this case, the indexes will be:

I_S1 = [2,4]-> MW("BD","ABBCD")= 7

I_S1 = [2, 4] --> MW("BD", "ABBCD") = 7

I_S2 = [1,2]-> MW("BD","TBDC")= 6

I_S2 = [1, 2] --> MW("BD", "TBDC") = 6

因此HCS = "BD", and weight = min(MW(s, S1), MW(s, S2)) = 6.

推荐答案

您需要构建的表将具有此表.

The table that you need to build will have this.

for each position in sequence 1
    for each position in sequence 2
        for each extreme pair of (weight1, weight2)
            (last_position1, last_position2)

其中极端对是不可能找到该点的子序列,该子序列的序列1的权重和序列2的权重均为> =且至少一个为>.

Where an extreme pair is one where it is not possible to find a subsequence to that point whose weights in sequence 1 and weights in sequence 2 are both >= and at least one is >.

可能有多个极端对,其中一个序列高于另一个.

There may be multiple extreme pairs, where one sequence is higher than the other.

规则是,在(i, -1)(-1, j)位置,唯一的极端对是权重为0的空集.在其他任何情况下,我们合并(i-1, j)(i, j-1)的极端对.然后如果是seq1[i] = seq2[j],则将选项添加到您转到(i-1, j-1)的位置,然后将ij包括在各个子序列中. (因此,将weight1[i]weight2[j]添加到权重中,然后进行合并.)

The rule is that at the (i, -1) or (-1, j) positions, the only extreme pair is the empty set with weight 0. At any other we merge the extreme pairs for (i-1, j) and (i, j-1). And then if seq1[i] = seq2[j], then add the options where you went to (i-1, j-1) and then included the i and j in the respective subsequences. (So add weight1[i] and weight2[j] to the weights then do a merge.)

对于该合并,您可以按weight1升序对先前两个点的所有极值进行升序排序,然后丢弃那些weight2小于或等于序列中较早前已发布的最佳weight2的所有极值.

For that merge you can sort by weight1 ascending, all of the extreme values for both previous points, then throw away all of the ones whose weight2 is less than or equal to the best weight2 that was already posted earlier in the sequence.

当您到达终点时,您可以找到具有最高最小值的极端对,这就是您的答案.然后,您可以返回数据结构以查找有问题的子序列.

When you reach the end you can find the extreme pair with the highest min, and that is your answer. You can then walk the data structure back to find the subsequences in question.

这篇关于是否有任何算法可以解决每个字符使用不同权重的最长的常见后续问题?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆