两个多字节二进制数据变量之间最快的按位异或 [英] Fastest bitwise xor between two multibyte binary data variables
问题描述
采用以下逻辑最快的实现方式是什么:
def xor(data, key):
l = len(key)
buff = ""
for i in range(0, len(data)):
buff += chr(ord(data[i]) ^ ord(key[i % l]))
return buff
在我的情况下, key 是20字节的sha1摘要,而 data 是一些20字节至几(1,2,3)兆字节长的二进制数据>
更新:
好的,伙计们.这是一种快3.5倍的实现,它将数据和密钥分成4、2或1个字节的块(在我的情况下,大多数情况下是4个字节长的整数):
def xor(data, key):
index = len(data) % 4
size = (4, 1, 2, 1)[index]
type = ('L', 'B', 'H', 'B')[index]
key_len = len(key)/size
data_len = len(data)/size
key_fmt = "<" + str(key_len) + type;
data_fmt = "<" + str(data_len) + type;
key_list = struct.unpack(key_fmt, key)
data_list = struct.unpack(data_fmt, data)
result = []
for i in range(data_len):
result.append (key_list[i % key_len] ^ data_list[i])
return struct.pack(data_fmt, *result)
占用大量内存,但就我而言,这没什么大不了的.
任何想法如何将速度再提高几次? :-)
最终更新:
好的,好的... numpy完成了这项工作.真是太快了:
def xor(data, key):
import numpy, math
# key multiplication in order to match the data length
key = (key*int(math.ceil(float(len(data))/float(len(key)))))[:len(data)]
# Select the type size in bytes
for i in (8,4,2,1):
if not len(data) % i: break
if i == 8: dt = numpy.dtype('<Q8');
elif i == 4: dt = numpy.dtype('<L4');
elif i == 2: dt = numpy.dtype('<H2');
else: dt = numpy.dtype('B');
return numpy.bitwise_xor(numpy.fromstring(key, dtype=dt), numpy.fromstring(data, dtype=dt)).tostring()
最初的实现需要8分钟50秒来处理一个千兆字节,第二个过程大约需要2分钟30秒,而最后一个恰好是0分钟10秒.
感谢所有贡献思想和代码的人.你们真棒!
免责声明:正如其他张贴者所说,这是一种加密文件的非常糟糕的方法. 本文演示了如何轻松逆转这种混淆.
首先,一个简单的异或算法:
def xor(a,b,_xor8k=lambda a,b:struct.pack("!1000Q",*map(operator.xor,
struct.unpack("!1000Q",a),
struct.unpack("!1000Q",b)))
):
if len(a)<=8000:
s="!%iQ%iB"%divmod(len(a),8)
return struct.pack(s,*map(operator.xor,
struct.unpack(s,a),
struct.unpack(s,b)))
a=bytearray(a)
for i in range(8000,len(a),8000):
a[i-8000:i]=_xor8k(
a[i-8000:i],
b[i-8000:i])
a[i:]=xor(a[i:],b[i:])
return str(a)
第二个环绕异或算法:
def xor_wrap(data,key,_struct8k=struct.Struct("!1000Q")):
l=len(key)
if len(data)>=8000:
keyrpt=key*((7999+2*l)//l)#this buffer is accessed with whatever offset is required for a given 8k block
#this expression should create at most 1 more copy of the key than is needed
data=bytearray(data)
offset=-8000#initial offset, set to zero on first loop iteration
modulo=0#offset used to access the repeated key
for offset in range(0,len(data)-7999,8000):
_struct8k.pack_into(data,offset,*map(operator.xor,
_struct8k.unpack_from(data,offset),
_struct8k.unpack_from(keyrpt,modulo)))
modulo+=8000;modulo%=l
offset+=8000
else:offset=0;keyrpt=key*(len(data)//l+1)#simple calculation guaranteed to be enough
rest=len(data)-offset
srest=struct.Struct("!%iQ%iB"%divmod(len(data)-offset,8))
srest.pack_into(data,offset,*map(operator.xor,
srest.unpack_from(data,offset),
srest.unpack_from(keyrpt,modulo)))
return data
What is the fastest way to implementat the following logic:
def xor(data, key):
l = len(key)
buff = ""
for i in range(0, len(data)):
buff += chr(ord(data[i]) ^ ord(key[i % l]))
return buff
In my case key is 20-byte sha1 digest, and data is some binary data between 20 bytes and few (1, 2, 3) megabytes long
UPDATE:
OK guys. Here's a 3.5 times faster implementation, which splits data and key by chunks of 4, 2 or 1 bytes (in my case, most of the time it's 4-byte long integer):
def xor(data, key):
index = len(data) % 4
size = (4, 1, 2, 1)[index]
type = ('L', 'B', 'H', 'B')[index]
key_len = len(key)/size
data_len = len(data)/size
key_fmt = "<" + str(key_len) + type;
data_fmt = "<" + str(data_len) + type;
key_list = struct.unpack(key_fmt, key)
data_list = struct.unpack(data_fmt, data)
result = []
for i in range(data_len):
result.append (key_list[i % key_len] ^ data_list[i])
return struct.pack(data_fmt, *result)
Uses a lot of memory, but in my case it's not a big deal.
Any ideas how to increase the speed few more times? :-)
FINAL UPDATE:
OK, ok... numpy did the job. That's just blazing fast:
def xor(data, key):
import numpy, math
# key multiplication in order to match the data length
key = (key*int(math.ceil(float(len(data))/float(len(key)))))[:len(data)]
# Select the type size in bytes
for i in (8,4,2,1):
if not len(data) % i: break
if i == 8: dt = numpy.dtype('<Q8');
elif i == 4: dt = numpy.dtype('<L4');
elif i == 2: dt = numpy.dtype('<H2');
else: dt = numpy.dtype('B');
return numpy.bitwise_xor(numpy.fromstring(key, dtype=dt), numpy.fromstring(data, dtype=dt)).tostring()
Initial implementation needed 8min 50sec to process a gigabyte, the second - around 2min 30sec and the last one just.... 0min 10sec.
Thanks to anyone who contributed ideas and code. You're great guys!
Disclaimer:As other posters have said, this is a really bad way to encrypt files. This article demonstrates how to reverse this kind of obfuscation trivially.
first, a simple xor algorithm:
def xor(a,b,_xor8k=lambda a,b:struct.pack("!1000Q",*map(operator.xor,
struct.unpack("!1000Q",a),
struct.unpack("!1000Q",b)))
):
if len(a)<=8000:
s="!%iQ%iB"%divmod(len(a),8)
return struct.pack(s,*map(operator.xor,
struct.unpack(s,a),
struct.unpack(s,b)))
a=bytearray(a)
for i in range(8000,len(a),8000):
a[i-8000:i]=_xor8k(
a[i-8000:i],
b[i-8000:i])
a[i:]=xor(a[i:],b[i:])
return str(a)
secondly the wrapping xor algorithm:
def xor_wrap(data,key,_struct8k=struct.Struct("!1000Q")):
l=len(key)
if len(data)>=8000:
keyrpt=key*((7999+2*l)//l)#this buffer is accessed with whatever offset is required for a given 8k block
#this expression should create at most 1 more copy of the key than is needed
data=bytearray(data)
offset=-8000#initial offset, set to zero on first loop iteration
modulo=0#offset used to access the repeated key
for offset in range(0,len(data)-7999,8000):
_struct8k.pack_into(data,offset,*map(operator.xor,
_struct8k.unpack_from(data,offset),
_struct8k.unpack_from(keyrpt,modulo)))
modulo+=8000;modulo%=l
offset+=8000
else:offset=0;keyrpt=key*(len(data)//l+1)#simple calculation guaranteed to be enough
rest=len(data)-offset
srest=struct.Struct("!%iQ%iB"%divmod(len(data)-offset,8))
srest.pack_into(data,offset,*map(operator.xor,
srest.unpack_from(data,offset),
srest.unpack_from(keyrpt,modulo)))
return data
这篇关于两个多字节二进制数据变量之间最快的按位异或的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!