Extract hash seed in unit testing


Problem description

I need to get the random hash seed that Python used, so that I can reproduce failing unit tests.

If PYTHONHASHSEED is set to a non-zero integer, sys.flags.hash_randomization reliably reports it:

$ export PYTHONHASHSEED=12345
$ python3 -c 'import sys, os;print(sys.flags.hash_randomization, os.environ.get("PYTHONHASHSEED"))'
12345 12345

However, if hashing is randomised, it only states that a seed is in use, not which one:

$ export PYTHONHASHSEED=random
$ python3 -c 'import sys, os;print(sys.flags.hash_randomization, os.environ.get("PYTHONHASHSEED"))'
1 random

The information in sys.hash_info never includes data that depends on the seed. With the hash function used since Python 3.4, it also seems infeasible to reconstruct the seed from given hashes.

Context: When fine-tuning an algorithm, we've seen heisenbugs that depend on set/dict iteration order. Replicating them requires testing seeds, at worst all 4294967295 of them, but even our average of ~100 tests takes quite a long time.
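To illustrate the dependence on the seed, here is a minimal sketch that runs a one-line snippet in a fresh interpreter under a fixed PYTHONHASHSEED. The helper name set_order_for_seed and the sample strings are made up for this illustration; it assumes Python 3.7+ for capture_output.

```python
import os
import subprocess
import sys

# Hypothetical snippet: prints the iteration order of a small set of strings.
SNIPPET = "print(','.join({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def set_order_for_seed(seed):
    """Return the set iteration order seen by a child interpreter
    started with PYTHONHASHSEED fixed to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().split(",")
```

Calling set_order_for_seed(1) twice should give the same order both times, while different seeds will usually (though not necessarily) give different orders. That is why a failing order can only be replayed once the seed is known.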

We have considered always setting PYTHONHASHSEED externally to random but known values, but would like to avoid this extra layer.

Recommended answer

No. The random value is assigned to the uc field of the _Py_HashSecret union, but this is never exposed to Python code. That's because the number of possible values is far greater than what setting PYTHONHASHSEED can produce.

When you don't set PYTHONHASHSEED, or set it to random, Python generates a random 24-byte value to use as the seed. If you set PYTHONHASHSEED to an integer, that number is passed through a linear congruential generator to produce the actual seed (see the lcg_urandom() function). The problem is that PYTHONHASHSEED is limited to 4 bytes. There are 256 ** 20 times more possible seed values than you could set via PYTHONHASHSEED alone.
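As a rough illustration of that expansion step, the sketch below ports the idea of lcg_urandom() to Python. The multiplier and increment (214013 and 2531011) and the choice of bits 16 to 23 for each output byte are written from memory of the CPython source and should be treated as assumptions; check them against the interpreter version you actually run.

```python
def lcg_urandom(seed, size=24):
    """Expand a 32-bit integer seed into `size` bytes with a linear
    congruential generator (constants assumed, see the note above)."""
    x = seed & 0xFFFFFFFF
    out = bytearray()
    for _ in range(size):
        # Advance the generator, wrapping like a 32-bit unsigned int.
        x = (x * 214013 + 2531011) & 0xFFFFFFFF
        # Emit one byte taken from bits 16..23 of the state.
        out.append((x >> 16) & 0xFF)
    return bytes(out)
```

Whatever the exact constants are, every 24-byte secret produced this way is fully determined by the 4-byte input, so at most 256 ** 4 distinct secrets are reachable through PYTHONHASHSEED.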

You can access the internal hash value in the _Py_HashSecret struct using ctypes:

from ctypes import (
    c_size_t,
    c_ubyte,
    c_uint64,
    pythonapi,
    Structure,
    Union,
)


class FNV(Structure):
    _fields_ = [
        ('prefix', c_size_t),
        ('suffix', c_size_t)
    ]


class SIPHASH(Structure):
    _fields_ = [
        ('k0', c_uint64),
        ('k1', c_uint64),
    ]


class DJBX33A(Structure):
    _fields_ = [
        ('padding', c_ubyte * 16),
        ('suffix', c_size_t),
    ]


class EXPAT(Structure):
    _fields_ = [
        ('padding', c_ubyte * 16),
        ('hashsalt', c_size_t),
    ]


class _Py_HashSecret_t(Union):
    _fields_ = [
        # ensure 24 bytes
        ('uc', c_ubyte * 24),
        # two Py_hash_t for FNV
        ('fnv', FNV),
        # two uint64 for SipHash24
        ('siphash', SIPHASH),
        # a different (!) Py_hash_t for small string optimization
        ('djbx33a', DJBX33A),
        ('expat', EXPAT),
    ]


hashsecret = _Py_HashSecret_t.in_dll(pythonapi, '_Py_HashSecret')
hashseed = bytes(hashsecret.uc)

However, you can't actually do anything with this information. You can't set _Py_HashSecret.uc in a new Python process, because doing so would invalidate most dictionary keys that were set before your Python code got a chance to run (Python internals rely heavily on dictionaries), and the chance of the random secret being equal to one of the 256 ** 4 possible LCG outputs is vanishingly small.

Your idea to set PYTHONHASHSEED to a known value everywhere is a far more feasible approach.
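One possible shape for that extra layer is a small launcher that picks a seed, records it, and starts the real test run in a child interpreter. This is only a sketch: the function name run_with_known_seed is hypothetical, and logging to stderr is just one choice of where to record the seed.

```python
import os
import random
import subprocess
import sys


def run_with_known_seed(args, seed=None):
    """Run `python <args>` in a child process with a recorded
    PYTHONHASHSEED and return (seed, returncode)."""
    if seed is None:
        # Any integer in [1, 4294967295] is accepted; 0 disables randomization.
        seed = random.randint(1, 4294967295)
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    # Record the seed so a failing run can be replayed later.
    print(f"PYTHONHASHSEED={seed}", file=sys.stderr)
    completed = subprocess.run([sys.executable, *args], env=env)
    return seed, completed.returncode
```

For example, run_with_known_seed(["-m", "unittest", "discover"]) runs the test suite under a fresh seed, and passing seed=<logged value> on a later call replays a failing run with the same hash ordering.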
