Resolving equal XOR values for different strings for anagram detection


Question

I recently had an interview question where I had to write a function that takes two strings and returns 1 if they are anagrams of each other, or 0 otherwise. To simplify things, both strings are of the same length, non-empty, and contain only lower-case alphabetical and numeric characters.

I implemented a function that accumulates the XOR of every character of each string independently, then compares the two final XOR values to see if they are equal. If they are, I return 1, else 0.

My function:

int isAnagram(char* str1, char* str2){
    int xor_acc_1 = 0;
    int xor_acc_2 = 0;
    for(int i = 0; i<strlen(str1); i++){
        xor_acc_1 ^= str1[i] - '0';
        xor_acc_2 ^= str2[i] - '0';
    }
    return xor_acc_1 == xor_acc_2;
}

My function worked for every case except for one test case.

char* str1 = "123";
char* str2 = "303";

To my surprise, even though these two strings are not anagrams of each other, they both returned 48 as their XOR value.

My question is: can this still be resolved with XOR in linear time, without using a data structure such as a map, by modifying the mathematics behind XOR?

Recommended answer

A pure xor solution will not work as there is information lost during the process (this problem is likely to exist in other forms of lossy calculation as well, such as hashing). The information lost in this case is the actual characters being used for comparison.

By way of example, consider the two strings ae and bf (in ASCII):

  a: 0110 0001    b: 0110 0010
  e: 0110 0101    f: 0110 0110
     ---- ----       ---- ----
xor: 0000 0100       0000 0100

You can see that the result of the xor is identical for both strings despite the fact they are totally different.

This may become even more obvious once you realise that any value xor-ed with itself is zero, meaning that all strings like aa, bb, cc, xx, and so on, would be considered anagrams under your scheme.
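
The collisions described above can be checked with a few lines of C. This is only a minimal sketch: `xor_fold` is a hypothetical helper name (it does not appear in the question or the answer), and it XORs the raw character codes without the `- '0'` adjustment used in the question's code.

```c
#include <stddef.h>

// XOR together the character codes of a NUL-terminated string.
// xor_fold is a hypothetical helper used only for illustration.
static unsigned xor_fold(const char *s) {
    unsigned acc = 0;
    for (size_t i = 0; s[i] != '\0'; ++i)
        acc ^= (unsigned char)s[i];   // Order of characters is irrelevant.
    return acc;
}
```

With ASCII input, `xor_fold("ae")` and `xor_fold("bf")` both give 4, `xor_fold("aa")` and `xor_fold("bb")` both give 0, and `xor_fold("123")` and `xor_fold("303")` both give 48, the value reported in the question. The accumulator holds a single small integer, so it cannot record which characters produced it.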

So, now you've established that method as unsuitable, there are a couple of options that spring to mind.

The first is to simply sort both strings and compare them. Once sorted, they will be identical on a character-by-character basis if, and only if, they are anagrams. This will work but it's unlikely to deliver your requested O(n) time complexity, since you'll almost certainly be using a comparison-style sort, which is O(n log n) at best.
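
For illustration, the sort-and-compare option might look something like the sketch below. The names `cmp_uchar` and `is_anagram_sorted` are hypothetical, and the function sorts private copies with the standard `qsort` so the caller's strings are left untouched. It also assumes that treating an allocation failure as "not an anagram" is acceptable, which real code may not want.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Comparison callback for qsort: orders bytes by unsigned value.
static int cmp_uchar(const void *a, const void *b) {
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

// Hypothetical helper: true if s1 and s2 contain the same characters.
static bool is_anagram_sorted(const char *s1, const char *s2) {
    size_t len = strlen(s1);
    if (len != strlen(s2))
        return false;

    // Work on copies so the inputs are not modified.
    char *a = malloc(len + 1);
    char *b = malloc(len + 1);
    bool same = false;   // Assumption: allocation failure reports false.
    if (a != NULL && b != NULL) {
        memcpy(a, s1, len + 1);
        memcpy(b, s2, len + 1);
        qsort(a, len, 1, cmp_uchar);
        qsort(b, len, 1, cmp_uchar);
        same = memcmp(a, b, len) == 0;
    }
    free(a);
    free(b);
    return same;
}
```

The cost is dominated by the two sorts, which is why this version does not meet the linear-time requirement.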

The second still allows you to meet that requirement by using the usual "trick" of trading space for time. You simply set up a count of each character (all initially zero) then, for each character in the first string, increase its count.

After that, for each character in the second string, decrease its count.

That's linear time complexity, and the strings can be deemed to be anagrams if every character count is zero after the process. A non-zero count will only be there if a character occurred more times in one string than the other.

This is effectively a counting sort, a non-comparison sort, meaning it's not subject to the normal minimum O(n log n) time complexity of comparison sorts.

Pseudo-code for such a beast would be:

def isAnagram(str1, str2):
    if len(str1) != len(str2):    # Can also handle different lengths.
        return false

    dim count[0..255] = {0}       # Init all counts to zero.

    for each code in str1:        # Increase for each char in string 1.
        count[code]++

    for each code in str2:        # Decrease for each char in string 2.
        count[code]--

    for each code in 0..255:
        if count[code] != 0:      # Any non-zero means non-anagram.
            return false    

    return true                   # All zero means anagram.


Here, by the way, is a complete C test program which illustrates this concept. It handles 8-bit characters, though other widths can be added with a simple change to the #if section:

#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <stdbool.h>

#if CHAR_BIT == 8
    #define ARRSZ 256
#else
    #error Need to adjust for unexpected CHAR_BIT.
#endif

static bool isAnagram(const char *str1, const char *str2) {
    // Ensure strings are same size.

    size_t len = strlen(str1);
    if (len != strlen(str2))
        return false;

    // Initialise all counts to zero.

    int count[ARRSZ] = {0};

    // Increment for string 1, decrement for string 2.
    // Cast to unsigned char so the array index is never negative.

    for (size_t i = 0; i < len; ++i) {
        count[(unsigned char)str1[i]]++;
        count[(unsigned char)str2[i]]--;
    }

    // Any count non-zero means non-anagram.

    for (size_t i = 0; i < ARRSZ; ++i)
        if (count[i] != 0)
            return false;

    // All counts zero means anagram.

    return true;
}

int main(int argc, char *argv[]) {
    if ((argc - 1) % 2 != 0) {
        puts("Usage: check_anagrams [<string1> <string2>] ...");
        return 1;
    }

    for (int i = 1; i < argc; i += 2) {
        printf("%s: '%s' '%s'\n",
            isAnagram(argv[i], argv[i + 1]) ? "Yes" : " No",
            argv[i], argv[i + 1]);
    }

    return 0;
}

Running this on some suitable test data shows it in action:

pax$ ./check_anagrams ' paxdiablo ' 'a plaid box' paxdiablo PaxDiablo \
         one two aa bb aa aa '' '' paxdiablo pax.diablo

Yes: ' paxdiablo ' 'a plaid box'
 No: 'paxdiablo' 'PaxDiablo'
 No: 'one' 'two'
 No: 'aa' 'bb'
Yes: 'aa' 'aa'
Yes: '' ''
 No: 'paxdiablo' 'pax.diablo'
