Equivalent to Python's -R option that affects the hash of ints

Problem description

We have a large collection of Python code that takes some input and produces some output.

We would like to guarantee that, given identical input, we produce identical output regardless of Python version or local environment (e.g. whether the code is run on Windows, Mac, or Linux, in 32-bit or 64-bit).

We have been enforcing this in an automated test suite by running our program both with and without Python's -R option and comparing the output, on the assumption that this would shake out any spots where our output accidentally wound up dependent on iteration over a dict (the most common source of non-determinism in our code).
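The kind of check described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual test suite: `run_with_seed`, `outputs_agree`, and the two snippets are hypothetical names, and the snippets print a hash directly as a stand-in for "our program". It sets the PYTHONHASHSEED environment variable instead of passing -R, so that the seeds are explicit and repeatable.

```python
import os
import subprocess
import sys


def run_with_seed(code, seed):
    """Run `code` in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    return subprocess.check_output([sys.executable, "-c", code], env=env)


def outputs_agree(code, seeds=(0, 1, 2, 3)):
    """Return True if every seed yields byte-identical output."""
    outputs = {run_with_seed(code, seed) for seed in seeds}
    return len(outputs) == 1


# Hypothetical stand-ins for the program under test.
STR_SNIPPET = "print(hash('spam'))"   # str hashes depend on the seed
INT_SNIPPET = "print(hash(12345))"    # int hashes do not

if __name__ == "__main__":
    print("str output stable across seeds:", outputs_agree(STR_SNIPPET))
    print("int output stable across seeds:", outputs_agree(INT_SNIPPET))
```

A check like this flags output that depends on str hashing, but it reports the int case as stable, which is exactly the blind spot the question goes on to describe.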

However, when we recently adjusted our code to also support Python 3, we discovered a place where our output depended in part on iteration over a dict that used ints as keys. This iteration order changed in Python 3 compared to Python 2, which made our output differ. Our existing tests (all on Python 2.7) didn't notice this, because -R doesn't affect the hash of ints. Once found, it was easy to fix, but we would like to have found it earlier.
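To make the failure mode concrete: in CPython, small non-negative ints hash to themselves regardless of the hash seed, so randomization never reorders a dict keyed by them. The question does not say how the bug was fixed; the sketch below assumes the common remedy of iterating keys in sorted order, and `render` is a hypothetical helper, not code from the original project.

```python
def render(counts):
    """Emit one line per key in sorted order, independent of dict internals."""
    return "\n".join("%d=%s" % (key, counts[key]) for key in sorted(counts))


if __name__ == "__main__":
    # Small non-negative ints hash to themselves in CPython, whatever the
    # seed, which is why -R cannot perturb a dict keyed by ints.
    print(hash(12345) == 12345)
    # Sorting the keys makes the output independent of how the dict was built.
    print(render({3: "c", 1: "a", 2: "b"}))
```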

Is there any way to further stress-test our code and give us confidence that we've ferreted out all the places where we end up implicitly depending on something that may differ across Python versions or environments? I think that something like -R or PYTHONHASHSEED that applied to numbers as well as to str, bytes, and datetime objects could work, but I'm open to other approaches. I would, however, like our automated test machine to need only a single Python version installed, if possible.

Another acceptable alternative would be some way to run our code with PyPy tweaked so as to use a different order when iterating items out of a dict. I think our code runs on PyPy, though it's not something we've ever explicitly supported. However, if some PyPy expert gives us a way to tweak dictionary iteration order on different runs, it's something we'll work towards.

Recommended answer

Using PyPy is not the best choice here, given that it always retains insertion order in its dicts (using a method that makes dicts use less memory). We could of course make it change the order in which dicts are enumerated, but that defeats the point.

Instead, I'd suggest hacking the CPython source code to change the way the hash is used inside dictobject.c. For example, after each hash = PyObject_Hash(key); if (hash == -1) { ..error.. }; you could add hash ^= HASH_TWEAK; and compile different versions of CPython with different values for HASH_TWEAK. (I did such a thing at one point, but I can't find it any more. You need to be a bit careful about where the hash values are the original ones and where they are the modified ones.)
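Why XOR-ing the hash with a constant changes iteration order can be illustrated with a toy model in Python. This is not the CPython patch itself and not how dictobject.c actually probes: it is a deliberately simplified open-addressing table (slot chosen by masking the tweaked hash, linear probing on collision, iteration in slot order), and `slot_order` is a hypothetical helper. It models older hash-ordered dicts such as those in Python 2.7; CPython 3.7 and later guarantee insertion order instead.

```python
_EMPTY = object()  # sentinel marking an unused slot


def slot_order(keys, tweak, table_size=8):
    """Return keys in the order a slot-ordered hash table would iterate them.

    Simplified model: slot = (hash(key) ^ tweak) & mask, with linear probing
    on collision. Real CPython probing is more elaborate.
    """
    if table_size & (table_size - 1) or len(keys) > table_size:
        raise ValueError("table_size must be a power of two and fit all keys")
    mask = table_size - 1
    table = [_EMPTY] * table_size
    for key in keys:
        slot = (hash(key) ^ tweak) & mask
        while table[slot] is not _EMPTY:
            slot = (slot + 1) & mask
        table[slot] = key
    return [key for key in table if key is not _EMPTY]


if __name__ == "__main__":
    print(slot_order([1, 2, 3, 4], tweak=0))  # [1, 2, 3, 4]
    print(slot_order([1, 2, 3, 4], tweak=5))  # [4, 1, 3, 2]
```

In this model the same int keys come out in a different order for each tweak value, which is the effect that building several interpreters with different HASH_TWEAK values is meant to produce for the test suite.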
