Find first character that is different between two strings


Problem description

Given two equal-length strings, is there an elegant way to get the offset of the first character that differs?

The obvious solution would be:

for ($offset = 0; $offset < $length; ++$offset) {
    if ($str1[$offset] !== $str2[$offset]) {
        return $offset;
    }
}

But that doesn't look quite right for such a simple task.

Recommended answer

You can use a nice property of bitwise XOR (^) to achieve this: when you XOR two strings together, the bytes that are the same become null bytes ("\0"). So after XORing the two strings, we just need to find the position of the first non-null byte using strspn:

$position = strspn($string1 ^ $string2, "\0");

That's all there is to it. So let's look at an example:

$string1 = 'foobarbaz';
$string2 = 'foobarbiz';
$pos = strspn($string1 ^ $string2, "\0");

printf(
    'First difference at position %d: "%s" vs "%s"',
    $pos, $string1[$pos], $string2[$pos]
);

This will output:

First difference at position 7: "a" vs "i"

So that should do it. It's very efficient since it only uses functions implemented in C, and it needs only a single extra copy of the string in memory.
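The XOR approach has two edge cases worth spelling out: identical strings yield a result equal to the string length, and PHP truncates the XOR of two strings to the shorter operand. The following is a minimal sketch of a wrapper that makes both cases explicit. The function name firstDifferenceOffset and the choice to return null for identical strings are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (name and null convention are illustrative assumptions).
// Returns the byte offset of the first difference, or null if the strings are identical.
function firstDifferenceOffset(string $a, string $b): ?int
{
    // The XOR of two strings is only as long as the shorter operand.
    $common = min(strlen($a), strlen($b));

    // Count the leading null bytes, i.e. the leading bytes that match.
    $pos = strspn($a ^ $b, "\0");

    if ($pos < $common) {
        return $pos;
    }

    // No difference inside the common prefix: the strings are either
    // identical, or one is a prefix of the other.
    return strlen($a) === strlen($b) ? null : $common;
}

// The XOR result for the example above: only byte 7 is non-null (0x61 ^ 0x69 = 0x08).
echo bin2hex('foobarbaz' ^ 'foobarbiz'), "\n"; // 000000000000000800
```

For the equal-length case in the question, the length check never triggers, so the plain strspn one-liner is sufficient whenever the strings are known to differ.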

function getCharacterOffsetOfDifference($str1, $str2, $encoding = 'UTF-8') {
    return mb_strlen(
        mb_strcut(
            $str1,
            0, strspn($str1 ^ $str2, "\0"),
            $encoding
        ),
        $encoding
    );
}

For multibyte strings, the function above first finds the difference at the byte level using the XOR method, then maps that byte offset to a character offset. The mapping is done with the mb_strcut function, which is basically substr but honoring multibyte character boundaries.

var_dump(getCharacterOffsetOfDifference('foo', 'foa')); // 2
var_dump(getCharacterOffsetOfDifference('©oo', 'foa')); // 0
var_dump(getCharacterOffsetOfDifference('f©o', 'fªa')); // 1

It's not as elegant as the first solution, but it's still a one-liner (and a little simpler if you use the default encoding):

return mb_strlen(mb_strcut($str1, 0, strspn($str1 ^ $str2, "\0")));
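To see why mb_strcut is used for the byte-to-character mapping rather than substr, here is a small trace of the third example. It is a sketch that assumes the mbstring extension is loaded and that the source file is saved as UTF-8.

```php
<?php
// Assumes UTF-8 source encoding and the mbstring extension.
$str1 = 'f©o'; // bytes: 66 c2 a9 6f
$str2 = 'fªa'; // bytes: 66 c2 aa 61

// The first differing byte is the second byte of the two-byte characters.
$byteOffset = strspn($str1 ^ $str2, "\0");

// substr cuts in the middle of "©", leaving a dangling lead byte.
$rawPrefix = substr($str1, 0, $byteOffset);

// mb_strcut backs off to the previous character boundary.
$safePrefix = mb_strcut($str1, 0, $byteOffset, 'UTF-8');

echo $byteOffset, "\n";                     // 2
echo bin2hex($rawPrefix), "\n";             // 66c2
echo bin2hex($safePrefix), "\n";            // 66
echo mb_strlen($safePrefix, 'UTF-8'), "\n"; // 1
```

The byte offset is 2, but the character offset is 1, because "©" and "ª" share their first byte in UTF-8 and only differ in the second.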
