Compare whether two Python files result in the same byte code (are code-wise identical)


Question

We're doing some code cleanup. The cleanup is only about formatting. (If it matters, we can even assume that line numbers don't change, though ideally I'd like to ignore line number changes as well.)

To be sure that there is no accidental code change, I'd like to find a simple and fast way to compare the two versions of the source code.

So let's assume I have file1.py and file2.py.

What works is to use py_compile.compile(filename) to create .pyc files, then run uncompyle6 on each .pyc file, strip the comments, and compare the results. But this is overkill and very slow.

Another approach I imagined is to copy file1.py to, for example, file.py, run py_compile.compile("file.py"), and save the .pyc file.

Then copy file2.py to file.py, run py_compile.compile("file.py") again, save that .pyc file, and finally compare the two generated .pyc files.

Would this work reliably with all (current) versions >= Python 3.6?

If I remember correctly, at least in Python 2 the .pyc files could contain timestamps or absolute paths, which could make the comparison fail (at least if the .pyc files were generated on two different machines).

Is there a clean way to compare the byte code of two .py files?

As a bonus feature (if possible), I'd like to create a hash for each file's byte code that I could store for future reference.

Recommended answer

You might try using Python's built-in compile function, which can compile from a string (read in from a file in your case). The example below compiles two equivalent programs and one almost-equivalent program, compares the resulting code objects, and then, purely for demonstration (not something you would want to do in practice), executes two of the code objects:

import hashlib
import marshal


def compute_hash(code):
    # Serialize the code object and hash the bytes to get a repeatable digest.
    code_bytes = marshal.dumps(code)
    code_hash = hashlib.sha1(code_bytes).hexdigest()
    return code_hash


# source1 and source2 are the same program with different formatting.
source1 = """x = 3
y = 4
z = x * y
print(z)
"""
source2 = "x=3;y=4;z=x*y;print(z)"

# source3 uses the name a instead of x, so it is a different program.
source3 = "a=3;y=4;z=a*y;print(z)"

obj1 = compile(source=source1, filename='<string>', mode='exec', dont_inherit=True)
obj2 = compile(source=source2, filename='<string>', mode='exec', dont_inherit=True)
obj3 = compile(source=source3, filename='<string>', mode='exec', dont_inherit=True)

print(obj1 == obj2)
print(obj1 == obj3)

# Demo only: run two of the code objects to show they behave the same.
exec(obj1)
exec(obj3)
print(compute_hash(obj1))

Prints:

True
False
12
12
48632a1b64357e9d09d19e765d3dc6863ee67ab9

This saves you from having to copy .py files, create .pyc files, compare .pyc files, and so on.
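Applied to real files, the same idea could look like the minimal sketch below. The helper names compile_file, same_code and write_temp are made up for illustration, and the three temporary files only stand in for file1.py and file2.py from the question. The sketch assumes that comparing code objects with == is an acceptable notion of "same code". That comparison takes co_firstlineno into account, so moving a def to another line is enough to make it fail, and depending on the interpreter version more position information may be compared as well. The demo therefore only varies comments, which leave all positions untouched.

```python
import os
import tempfile


def compile_file(path):
    """Compile the source at `path` into a code object (hypothetical helper)."""
    with open(path, "rb") as handle:
        source = handle.read()
    # A fixed filename keeps the real path out of the code object.
    return compile(source, "<string>", "exec", dont_inherit=True)


def same_code(path_a, path_b):
    """Return True if both files compile to equal code objects."""
    return compile_file(path_a) == compile_file(path_b)


def write_temp(directory, name, text):
    """Write `text` to a file inside `directory` and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


with tempfile.TemporaryDirectory() as tmp:
    original = write_temp(tmp, "file1.py", "x = 3\nprint(x)\n")
    commented = write_temp(tmp, "file2.py", "x = 3  # set x\nprint(x)  # show it\n")
    changed = write_temp(tmp, "file3.py", "x = 4\nprint(x)\n")

    # Only comments differ, so the compiled code should be equal.
    comments_only = same_code(original, commented)
    # A constant differs, so the compiled code should not be equal.
    value_changed = same_code(original, changed)

print(comments_only)
print(value_changed)
```

Reading the files as bytes lets compile handle any encoding declaration in the source itself.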

Note:

The compute_hash function is there in case you need a hash that is repeatable, i.e. one that returns the same value for the same code object in successive program runs.
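One caveat for the line-number wish in the question: marshal.dumps also serializes the file name, co_firstlineno and the line table of a code object, so a hash over the marshalled bytes changes as soon as statements move to other lines. A possible workaround, sketched below under my own assumptions, is to hash only selected attributes of the code object. The names normalize, normalize_const and fingerprint are hypothetical, and the choice of attributes is one guess at what "same code" should mean, not an established recipe. The compiler can in some cases emit instructions that depend on line layout, so a mismatch is a prompt to look closer rather than proof of a real change.

```python
import hashlib
import types


def normalize(code):
    """Reduce a code object to nested tuples, leaving out file name and line info."""
    return (
        code.co_name,
        code.co_argcount,
        getattr(code, "co_posonlyargcount", 0),  # attribute exists on 3.8+
        code.co_kwonlyargcount,
        code.co_flags,
        code.co_code,
        tuple(normalize_const(c) for c in code.co_consts),
        code.co_names,
        code.co_varnames,
        code.co_freevars,
        code.co_cellvars,
    )


def normalize_const(value):
    """Make a constant comparable and stable across program runs."""
    if isinstance(value, types.CodeType):
        # Nested function, class or comprehension body.
        return normalize(value)
    if isinstance(value, tuple):
        return tuple(normalize_const(v) for v in value)
    if isinstance(value, frozenset):
        # Set iteration order can change between runs, so sort the members.
        return ("frozenset", tuple(sorted(repr(normalize_const(v)) for v in value)))
    # Keep the type name so that 1, 1.0 and True stay distinct.
    return (type(value).__name__, repr(value))


def fingerprint(source):
    """Hash of the compiled source that ignores layout-only changes."""
    code = compile(source, "<string>", "exec", dont_inherit=True)
    return hashlib.sha256(repr(normalize(code)).encode("utf-8")).hexdigest()


multi_line = "x = 3\ny = 4\nz = x * y\nprint(z)\n"
one_line = "x=3;y=4;z=x*y;print(z)"
renamed = "a=3;y=4;z=a*y;print(z)"

# Same program, different layout.
print(fingerprint(multi_line) == fingerprint(one_line))
# Different variable name, so a different program.
print(fingerprint(multi_line) == fingerprint(renamed))
```

The fingerprint is only meaningful when both sources are compiled by the same Python version, since the byte code itself differs between versions.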
