任何种子的同一字符串上的 CRC32 哈希冲突 [英] CRC32 hash collision on the same string for any seed

查看:38
本文介绍了任何种子的同一字符串上的 CRC32 哈希冲突的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图找到种子来散列最大可能长度的小写字母短字符串而不会发生冲突.我选择了 SSE 4.2 CRC32 来简化任务.对于长度为 4、5、6 的种子,在一些合理的小值内不会发生碰撞(我不能无限等待).

I tried to find seed to hash short strings of lowercase letters of maximum possible length without collisions. I chose SSE 4.2 CRC32 to make the task easier. For lengths 4, 5, 6 there is no collision for seeds up to some reasonable small value (I can't wait infinitely).

#include <bitset>
#include <limits>
#include <iterator>
#include <iostream>

#include <x86intrin.h>

static std::bitset<size_t(std::numeric_limits<uint32_t>::max()) + 1> hashes;

static void findSeed()
{
    uint8_t c[7];
    const auto findCollision = [&] (uint32_t seed)
    {
        std::cout << "seed = " << seed << std::endl;
        hashes.reset();
        for (c[0] = 'a'; c[0] <= 'z'; ++c[0]) {
            uint32_t hash0 = _mm_crc32_u8(~seed, c[0]);
            for (c[1] = 'a'; c[1] <= 'z'; ++c[1]) {
                uint32_t hash1 = _mm_crc32_u8(hash0, c[1]);
                for (c[2] = 'a'; c[2] <= 'z'; ++c[2]) {
                    uint32_t hash2 = _mm_crc32_u8(hash1, c[2]);
                    for (c[3] = 'a'; c[3] <= 'z'; ++c[3]) {
                        uint32_t hash3 = _mm_crc32_u8(hash2, c[3]);
                        for (c[4] = 'a'; c[4] <= 'z'; ++c[4]) {
                            uint32_t hash4 = _mm_crc32_u8(hash3, c[4]);
                            for (c[5] = 'a'; c[5] <= 'z'; ++c[5]) {
                                uint32_t hash5 = _mm_crc32_u8(hash4, c[5]);
                                for (c[6] = 'a'; c[6] <= 'z'; ++c[6]) {
                                    uint32_t hash6 = _mm_crc32_u8(hash5, c[6]);
                                    if (hashes[hash6]) {
                                        std::cerr << "collision at ";
                                        std::copy(std::cbegin(c), std::cend(c), std::ostream_iterator<uint8_t>(std::cerr, ""));
                                        std::cerr << " " << hash6 << '\n';
                                        return;
                                    }
                                    hashes.set(hash6);
                                }
                            }
                        }
                    }
                }
            }
            std::cout << "c[0] = " << c[0] << std::endl;
        }
    };
    for (uint32_t seed = 0; seed != std::numeric_limits<uint32_t>::max(); ++seed) {
        findCollision(seed);
    }
    findCollision(std::numeric_limits<uint32_t>::max());
}

int main()
{
    findSeed();
}

很明显,对于长度为 7 的字符串,不可能找到这样的种子,因为 ('z' - 'a' + 1)^7 = 26^7 = 8 031 810 176 >4 294 967 296 = size_t(std::numeric_limits::max()) + 1.但值得注意的是,对于字符串 abfcmbkbaabaaa 对于 any 种子,存在第一次冲突.hash6 发生碰撞时,不同的种子不同.我觉得很好奇.

It is clear, that for strings of length 7 it is impossible to find such a seed, because ('z' - 'a' + 1)^7 = 26^7 = 8 031 810 176 > 4 294 967 296 = size_t(std::numeric_limits<uint32_t>::max()) + 1. But notable thing is that for strings abfcmbk and baabaaa for any seed there is first collision. hash6 differs for different seeds when collision occured. It is curious on my mind.

如何解释?

推荐答案

如果 CRC(seed,dat)dat 的 CRC,则使用指定的 种子,然后对于任何种子(seed1,seed2)和匹配长度的数据对(dat1,dat2),并给定CRC(seed1,dat1),可以计算CRC(seed2,dat1) 通过计算 CRC(seed1, dat1)CRC(seed1,dat2)CRC(seed2,dat2) 的异或>.

If CRC(seed,dat) is the CRC of dat, using the specified seed, then for any seeds (seed1, seed2), and matching-length pair of data (dat1, dat2), and given CRC(seed1,dat1), one can compute CRC(seed2,dat1) by computing the xor of CRC(seed1, dat1), CRC(seed1,dat2), and CRC(seed2,dat2).

这反过来意味着,如果两段数据将为任何特定种子产生相同的 CRC 值,那么它们将为每个可能的种子产生相同的值.如果对于任何 seed1CRC(seed1,dat1a) 等于 CRC(seed1,dat1b),并且字符串长度相等,则对于任何其他种子 seed2 和相同长度的数据 dat2CRC(seed2,dat1a) 将等于 CRC(seed1, dat1a) xorCRC(seed1,dat2) xor CRC(seed2,dat2)CRC(seed2,dat1b) 将等于 CRC(seed1, dat1b) xor CRC(seed1,dat2)异或 CRC(seed2,dat2).由于 xor 的所有三项都相等,这意味着结果将同样相等.

This in turn implies that if two pieces of data would yield the same CRC value for any particular seed, they would yield the same value for every possible seed. If for any seed1, CRC(seed1,dat1a) equals CRC(seed1,dat1b), and the strings are of equal length, then for any other seed seed2 and same-length data dat2, CRC(seed2,dat1a) will equal CRC(seed1, dat1a) xor CRC(seed1,dat2) xor CRC(seed2,dat2), and CRC(seed2,dat1b) will equal CRC(seed1, dat1b) xor CRC(seed1,dat2) xor CRC(seed2,dat2). Since all three terms of the xors are equal, that implies that the results will be likewise equal.

这篇关于任何种子的同一字符串上的 CRC32 哈希冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆