两个多字节二进制数据变量之间最快的按位异或 [英] Fastest bitwise xor between two multibyte binary data variables

查看:127
本文介绍了两个多字节二进制数据变量之间最快的按位异或的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

采用以下逻辑最快的实现方式是什么:

def xor(data, key):
    l = len(key)

    buff = ""
    for i in range(0, len(data)):
        buff += chr(ord(data[i]) ^ ord(key[i % l]))
    return buff

在我的情况下, key 是20字节的sha1摘要,而 data 是一些20字节至几(1,2,3)兆字节长的二进制数据

更新:

好的,伙计们.这是一种快3.5倍的实现,它将数据和密钥分成4、2或1个字节的块(在我的情况下,大多数情况下是4个字节长的整数):

def xor(data, key):
    index = len(data) % 4
    size = (4, 1, 2, 1)[index]
    type = ('L', 'B', 'H', 'B')[index]
    key_len = len(key)/size
    data_len = len(data)/size
    key_fmt = "<" + str(key_len) + type;
    data_fmt = "<" + str(data_len) + type;

    key_list = struct.unpack(key_fmt, key)
    data_list = struct.unpack(data_fmt, data)

    result = []
    for i in range(data_len):
        result.append (key_list[i % key_len] ^ data_list[i])

    return struct.pack(data_fmt, *result)

占用大量内存,但就我而言,这没什么大不了的.

任何想法如何将速度再提高几次? :-)

最终更新:

好的,好的... numpy完成了这项工作.真是太快了:

def xor(data, key):
    import numpy, math

    # key multiplication in order to match the data length
    key = (key*int(math.ceil(float(len(data))/float(len(key)))))[:len(data)]

    # Select the type size in bytes       
    for i in (8,4,2,1):
        if not len(data) % i: break

    if i == 8: dt = numpy.dtype('<Q8');
    elif i == 4: dt = numpy.dtype('<L4');
    elif i == 2: dt = numpy.dtype('<H2');
    else: dt = numpy.dtype('B');

    return numpy.bitwise_xor(numpy.fromstring(key, dtype=dt), numpy.fromstring(data, dtype=dt)).tostring()

最初的实现需要8分钟50秒来处理一个千兆字节,第二个过程大约需要2分钟30秒,而最后一个恰好是0分钟10秒.

感谢所有贡献思想和代码的人.你们真棒!

解决方案

免责声明:正如其他张贴者所说,这是一种加密文件的非常糟糕的方法. 本文演示了如何轻松逆转这种混淆.

首先,一个简单的异或算法:

def xor(a,b,_xor8k=lambda a,b:struct.pack("!1000Q",*map(operator.xor,
                    struct.unpack("!1000Q",a),
                    struct.unpack("!1000Q",b)))
        ):
    if len(a)<=8000:
        s="!%iQ%iB"%divmod(len(a),8)
        return struct.pack(s,*map(operator.xor,
            struct.unpack(s,a),
            struct.unpack(s,b)))
    a=bytearray(a)
    for i in range(8000,len(a),8000):
        a[i-8000:i]=_xor8k(
            a[i-8000:i],
            b[i-8000:i])
    a[i:]=xor(a[i:],b[i:])
    return str(a)

第二个环绕异或算法:

def xor_wrap(data,key,_struct8k=struct.Struct("!1000Q")):
    l=len(key)
    if len(data)>=8000:
        keyrpt=key*((7999+2*l)//l)#this buffer is accessed with whatever offset is required for a given 8k block
        #this expression should create at most 1 more copy of the key than is needed
        data=bytearray(data)
        offset=-8000#initial offset, set to zero on first loop iteration
        modulo=0#offset used to access the repeated key
        for offset in range(0,len(data)-7999,8000):
            _struct8k.pack_into(data,offset,*map(operator.xor,
                _struct8k.unpack_from(data,offset),
                _struct8k.unpack_from(keyrpt,modulo)))
            modulo+=8000;modulo%=l
        offset+=8000
    else:offset=0;keyrpt=key*(len(data)//l+1)#simple calculation guaranteed to be enough
    rest=len(data)-offset
    srest=struct.Struct("!%iQ%iB"%divmod(len(data)-offset,8))
    srest.pack_into(data,offset,*map(operator.xor,
        srest.unpack_from(data,offset),
        srest.unpack_from(keyrpt,modulo)))
    return data

What is the fastest way to implementat the following logic:

def xor(data, key):
    l = len(key)

    buff = ""
    for i in range(0, len(data)):
        buff += chr(ord(data[i]) ^ ord(key[i % l]))
    return buff

In my case key is 20-byte sha1 digest, and data is some binary data between 20 bytes and few (1, 2, 3) megabytes long

UPDATE:

OK guys. Here's a 3.5 times faster implementation, which splits data and key by chunks of 4, 2 or 1 bytes (in my case, most of the time it's 4-byte long integer):

def xor(data, key):
    index = len(data) % 4
    size = (4, 1, 2, 1)[index]
    type = ('L', 'B', 'H', 'B')[index]
    key_len = len(key)/size
    data_len = len(data)/size
    key_fmt = "<" + str(key_len) + type;
    data_fmt = "<" + str(data_len) + type;

    key_list = struct.unpack(key_fmt, key)
    data_list = struct.unpack(data_fmt, data)

    result = []
    for i in range(data_len):
        result.append (key_list[i % key_len] ^ data_list[i])

    return struct.pack(data_fmt, *result)

Uses a lot of memory, but in my case it's not a big deal.

Any ideas how to increase the speed few more times? :-)

FINAL UPDATE:

OK, ok... numpy did the job. That's just blazing fast:

def xor(data, key):
    import numpy, math

    # key multiplication in order to match the data length
    key = (key*int(math.ceil(float(len(data))/float(len(key)))))[:len(data)]

    # Select the type size in bytes       
    for i in (8,4,2,1):
        if not len(data) % i: break

    if i == 8: dt = numpy.dtype('<Q8');
    elif i == 4: dt = numpy.dtype('<L4');
    elif i == 2: dt = numpy.dtype('<H2');
    else: dt = numpy.dtype('B');

    return numpy.bitwise_xor(numpy.fromstring(key, dtype=dt), numpy.fromstring(data, dtype=dt)).tostring()

Initial implementation needed 8min 50sec to process a gigabyte, the second - around 2min 30sec and the last one just.... 0min 10sec.

Thanks to anyone who contributed ideas and code. You're great guys!

解决方案

Disclaimer:As other posters have said, this is a really bad way to encrypt files. This article demonstrates how to reverse this kind of obfuscation trivially.

first, a simple xor algorithm:

def xor(a,b,_xor8k=lambda a,b:struct.pack("!1000Q",*map(operator.xor,
                    struct.unpack("!1000Q",a),
                    struct.unpack("!1000Q",b)))
        ):
    if len(a)<=8000:
        s="!%iQ%iB"%divmod(len(a),8)
        return struct.pack(s,*map(operator.xor,
            struct.unpack(s,a),
            struct.unpack(s,b)))
    a=bytearray(a)
    for i in range(8000,len(a),8000):
        a[i-8000:i]=_xor8k(
            a[i-8000:i],
            b[i-8000:i])
    a[i:]=xor(a[i:],b[i:])
    return str(a)

secondly the wrapping xor algorithm:

def xor_wrap(data,key,_struct8k=struct.Struct("!1000Q")):
    l=len(key)
    if len(data)>=8000:
        keyrpt=key*((7999+2*l)//l)#this buffer is accessed with whatever offset is required for a given 8k block
        #this expression should create at most 1 more copy of the key than is needed
        data=bytearray(data)
        offset=-8000#initial offset, set to zero on first loop iteration
        modulo=0#offset used to access the repeated key
        for offset in range(0,len(data)-7999,8000):
            _struct8k.pack_into(data,offset,*map(operator.xor,
                _struct8k.unpack_from(data,offset),
                _struct8k.unpack_from(keyrpt,modulo)))
            modulo+=8000;modulo%=l
        offset+=8000
    else:offset=0;keyrpt=key*(len(data)//l+1)#simple calculation guaranteed to be enough
    rest=len(data)-offset
    srest=struct.Struct("!%iQ%iB"%divmod(len(data)-offset,8))
    srest.pack_into(data,offset,*map(operator.xor,
        srest.unpack_from(data,offset),
        srest.unpack_from(keyrpt,modulo)))
    return data

这篇关于两个多字节二进制数据变量之间最快的按位异或的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆