Fast Way to Find the Difference between Two Strings of Equal Length in Perl


Question

Given pairs of strings like this:

    my $s1 = "ACTGGA";
    my $s2 = "AGTG-A";

    # Note: the strings can be longer than this.

I would like to find the position and character in $s1 where it differs from $s2. In this case the answer would be:

#String Position 0-based
# First col = Base in S1
# Second col = Base in S2
# Third col = Position in S1 where they differ
C G 1
G - 4

I can achieve that easily with substr(), but it is horribly slow. Typically I need to compare millions of such pairs.

Is there a fast way to achieve that?

Recommended answer

Stringwise ^ is your friend:

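use strict;
use warnings;

my $s1 = "ACTGGA";
my $s2 = "AGTG-A";

# XOR the two strings: the result holds "\0" wherever they agree.
my $mask = $s1 ^ $s2;

# Each non-NUL character marks a difference; $-[0] is the offset of the match.
while ($mask =~ /[^\0]/g) {
    print substr($s1, $-[0], 1), ' ', substr($s2, $-[0], 1), ' ', $-[0], "\n";
}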

Explanation:

The ^ (exclusive or) operator, when used on two strings, returns a string in which each character is the exclusive or of the numeric values of the corresponding characters in the operands. Breaking down an example into equivalent code:

"AB" ^ "ab"
( "A" ^ "a" ) . ( "B" ^ "b" )
chr( ord("A") ^ ord("a") ) . chr( ord("B") ^ ord("b") )
chr( 65 ^ 97 ) . chr( 66 ^ 98 )
chr(32) . chr(32)
" " . " "
"  "
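
The breakdown above can be checked directly. The following is a small sketch, assuming plain ASCII strings and a Perl where ^ on two strings is the stringwise operator (under `use v5.28` or `use feature 'bitwise'`, ^ is numeric and the string form is written `^.`). The variable names are only for illustration.

```perl
use strict;
use warnings;

# XOR the two strings and show the numeric value of each resulting character.
my $xor = "AB" ^ "ab";
print join(' ', map { ord } split //, $xor), "\n";    # 32 32

# Where the characters are identical, the result is NUL (value 0).
my $same = "AB" ^ "Ab";
print join(' ', map { ord } split //, $same), "\n";   # 0 32
```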

The useful property here is that a NUL character ("\0") appears in the result exactly where the two strings have the same character. So ^ compares every character of the two strings in one quick operation, and the result can then be searched for non-NUL characters, each of which indicates a difference. The search is repeated by using the /g regex flag in scalar context, and the position of each difference is read from $-[0], which gives the offset of the start of the last successful match.
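
For comparing many pairs, the same technique can be wrapped in a subroutine. This is a minimal sketch, not part of the original answer: the name `string_diffs` and the equal-length check are assumptions added for illustration, and it assumes single-byte (ASCII) data. Without the check, ^ treats the shorter string as padded with zero bits, so the extra tail of the longer string would be reported as differences.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns one
# [char_in_s1, char_in_s2, zero_based_position] triple per mismatch.
sub string_diffs {
    my ($s1, $s2) = @_;
    die "strings must have equal length\n" unless length($s1) == length($s2);

    my $mask = $s1 ^ $s2;          # NUL wherever the strings agree
    my @diffs;
    while ($mask =~ /[^\0]/g) {
        my $pos = $-[0];           # offset where this match started
        push @diffs, [ substr($s1, $pos, 1), substr($s2, $pos, 1), $pos ];
    }
    return @diffs;
}

my ($s1, $s2) = ("ACTGGA", "AGTG-A");
my @diffs = string_diffs($s1, $s2);
print "@$_\n" for @diffs;          # prints "C G 1" then "G - 4"

# If only the number of mismatches is needed, tr with /c counts
# the non-NUL characters in the mask without a regex loop.
my $count = (($s1 ^ $s2) =~ tr/\0//c);
print "$count differences\n";      # 2 differences
```

Whether this is fast enough for millions of pairs depends on the data, so it is worth timing against the substr() loop with the core Benchmark module.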
