How can I output the matching portion of two strings, while having the % character separate any non-consecutively matching portions?


Problem description

I was hoping I could get some help on a project I've been working on. Given two strings, I would like to output the matching portions of these strings. Further, any non-consecutive portions of the matched output should be separated by the % symbol.

For example, if my two string inputs were:

  • This is a test case see if it works
  • test case it hopefully works

Then my desired output would be:

  • test case%it%works

I have sketched how I would like the code to be structured, but I need some help fine-tuning the exact syntax; any help would be really appreciated. Here is how I think it could be done:

string1 = A1 cell
string2 = B1 cell
output = ""
counter = 0
' split the shorter string into words and search for them in the longer one
if LENGTH(string1) < LENGTH(string2) then
    split_string = string1
    other_string = string2
else
    split_string = string2
    other_string = string1
end if
matchable_values = split(split_string)
for each element in matchable_values
    if ISNUMBER(SEARCH(element, other_string, counter)) then
        output = output & element & "%"
    end if
    ' the counter advances whether or not the element matched
    counter = counter + LENGTH(element) + 1
next element

return output

Recommended answer

What you are trying to do is not easy to accomplish, and it requires some advanced development skills (basic knowledge of dynamic programming is extremely useful).

What you are trying to do is actually the same idea as aligning DNA sequences in bioinformatics.

So what you would need to do is take both of your strings (sequences)

This is a test case, see if it works
test case, it hopefully works

and align them, for example using the Needleman–Wunsch algorithm (other alignment algorithms exist as well):

This is a test case, see if it ----------works
----------test case, -------it hopefully works
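To make the alignment step concrete, here is a minimal sketch of Needleman–Wunsch applied to words rather than characters. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not something the answer prescribes; different scores can produce different alignments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two word lists; None marks a gap. Scores are illustrative."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # pair the two words
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append(None)
            i -= 1
        else:
            row_a.append(None)
            row_b.append(b[j - 1])
            j -= 1
    return row_a[::-1], row_b[::-1]


row_a, row_b = needleman_wunsch(
    "This is a test case see if it works".split(),
    "test case it hopefully works".split(),
)
```

The backtrace returns only one optimal alignment; when several alignments share the best score, which one comes out depends on the order of the checks in the loop.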

Then check which characters are the same, so the result would be:

----------test case, -------it ----------works

Then replace each run of dashes with % while removing the dashes from the beginning and end. So your final result would be:

test case, %it %works
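The last two steps (keep the characters that agree, then collapse the dashes) can be sketched as follows. The function name collapse_alignment is hypothetical, and the sketch assumes both rows are already aligned to the same length and that "-" never occurs in the real text.

```python
import re


def collapse_alignment(row_a, row_b, gap="-", sep="%"):
    """Keep characters that agree in both aligned rows, then turn gap runs into sep."""
    # Positions where the rows differ (or where either row has a gap) become a gap.
    kept = "".join(x if x == y and x != gap else gap for x, y in zip(row_a, row_b))
    # Drop leading/trailing gaps, then replace each remaining run with one separator.
    return re.sub(re.escape(gap) + "+", sep, kept.strip(gap))


result = collapse_alignment(
    "This is a test case, see if it ----------works",
    "----------test case, -------it hopefully works",
)
print(result)  # test case, %it %works
```

Because the comparison is character by character, the spaces that sit next to a gap survive in the output, which is why the result reads "%it %works" rather than "%it%works".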

Note that for your problem there is no single definitive result. In general more than one result is possible, because there may be different ways to align 2 sequences.

(Figure: Needleman–Wunsch backtracing for the alignment above.)

For example, take the following 2 strings:

What output if this works?
What if this output works?

They can be aligned (word-wise) as

What output if this        works?
What        if this output works?

or as

What         output if this works?
What if this output         works?

So there are 2 results

What % if this % works?
What % output % works?

and they are different. Other strings might have even more than 2 possible results.

So you need an algorithm that can give you all the possible results, and then you need an algorithm to determine which one is the best (the one you want). In the case above, how would you tell which of the 2 results is the right one? … you can't :)
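As a rough illustration of "give you all the possible results", the sketch below enumerates every common word subsequence of two strings and keeps only the maximal ones, meaning those that cannot be extended by another matching word. This is one assumed way to define a candidate result, not a definition taken from the answer; the function names are hypothetical, and the enumeration is exponential, so it only suits short inputs.

```python
from functools import lru_cache


def common_subsequences(a, b):
    """All distinct common word subsequences of tuples a and b (small inputs only)."""
    @lru_cache(maxsize=None)
    def starting_at(i, j):
        found = {()}
        for x in range(i, len(a)):
            for y in range(j, len(b)):
                if a[x] == b[y]:
                    # Match this word, then continue strictly after it in both strings.
                    found |= {(a[x],) + rest for rest in starting_at(x + 1, y + 1)}
        return frozenset(found)

    return starting_at(0, 0)


def is_subsequence(short, long):
    remaining = iter(long)
    return all(word in remaining for word in short)


def maximal_matches(first, second):
    """Common subsequences that are not contained in a longer common subsequence."""
    subs = common_subsequences(tuple(first.split()), tuple(second.split()))
    return sorted(
        s for s in subs
        if not any(s != t and is_subsequence(s, t) for t in subs)
    )


matches = maximal_matches("What output if this works?", "What if this output works?")
for words in matches:
    print(" ".join(words))
```

For the two strings above this yields exactly the two word sequences behind the results shown: "What if this works?" and "What output works?". Note that the sketch compares words by value, so different alignments of a repeated word collapse into one candidate.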

Another example:

The following 2 strings

to prove you wrong this is a good example for you
is this a good example to prove you wrong

can be aligned (at least) as follows:

                       to prove you wrong this is a good example for you
is this a good example to prove you wrong


to prove you wrong this is a good example for you
                        is                        this a good example to prove you wrong

   to prove you wrong this is a good example          for you
is                    this    a good example to prove     you wrong

and you will get these 3 (or even more) results:

% to prove you wrong %
% is %
% this % a good example % you %

Would you be fine if your algorithm picked the second result for you? Or would you expect something different? All 3 are valid results.

But you are probably looking for the best one, and we can find it by counting the gap words.

The result with the fewest gap words is the best. So you can see that the second result is the worst and the last one is the best. But to be able to evaluate this, we first need an algorithm that finds all the results, so that we can then evaluate which of them is the best.
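The ranking can be sketched in a few lines. The answer does not define "gap words" precisely, so this sketch assumes a gap word is any word of either input string that is not part of the match; the function name gap_words is hypothetical.

```python
def gap_words(first, second, result, sep="%"):
    """Count the words of both inputs that the result leaves unmatched (assumed definition)."""
    matched = [word for word in result.split() if word != sep]
    # Every matched word accounts for one word in each of the two inputs.
    return len(first.split()) + len(second.split()) - 2 * len(matched)


first = "to prove you wrong this is a good example for you"
second = "is this a good example to prove you wrong"
candidates = [
    "% to prove you wrong %",
    "% is %",
    "% this % a good example % you %",
]

scores = [gap_words(first, second, candidate) for candidate in candidates]
best = min(candidates, key=lambda candidate: gap_words(first, second, candidate))
print(scores)  # [12, 18, 10]
print(best)    # % this % a good example % you %
```

Under this definition the second result scores worst (18 gap words) and the last one best (10), which agrees with the ranking described above.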
