Convert variable-sized byte array to an integer/long

Question

How can I convert a (big-endian) variable-sized binary byte array to an (unsigned) integer/long? For example, '\x11\x34' represents 4404.
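
To make the example concrete: in big-endian order each byte is one base-256 digit, most significant first, so '\x11\x34' is 0x11 × 256 + 0x34 = 17 × 256 + 52 = 4404. A quick check of that arithmetic:

```python
# Big-endian: the first byte is the most significant base-256 digit.
high, low = 0x11, 0x34
value = high * 256 + low
# Shifting left by 8 bits is the same as multiplying by 256.
assert value == (high << 8) | low
print(value)  # 4404
```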

Right now, I'm using:

def bytes_to_int(data):
  # Python 2 only: str.encode('hex') no longer exists in Python 3.
  return int(data.encode('hex'), 16)

Which is small and somewhat readable, but probably not very efficient. Is there a better (more obvious) way?

Answer

Python doesn't traditionally have much use for "numbers in big-endian C layout" that are too big for C. (If you're dealing with 2-byte, 4-byte, or 8-byte numbers, then struct.unpack is the answer.)
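
For the fixed widths mentioned above, a minimal sketch of the struct.unpack route might look like this. The helper name unpack_be_unsigned is hypothetical, and it assumes you only need unsigned values; '>' selects big-endian with standard sizes, and H, I, Q are the unsigned 2-, 4- and 8-byte codes.

```python
import struct

# Hypothetical helper: unsigned big-endian decode for the widths struct covers.
def unpack_be_unsigned(data):
    formats = {2: '>H', 4: '>I', 8: '>Q'}
    fmt = formats.get(len(data))
    if fmt is None:
        raise ValueError('only 2-, 4- or 8-byte inputs are handled here')
    (value,) = struct.unpack(fmt, data)  # unpack always returns a tuple
    return value

print(unpack_be_unsigned(b'\x11\x34'))  # 4404
```

Anything that is not exactly one of those widths falls outside what struct can do in a single format code, which is why the rest of this answer deals with arbitrary lengths.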

But enough people got sick of there not being one obvious way to do this that Python 3.2 added a method int.from_bytes that does exactly what you want:

int.from_bytes(b, byteorder='big', signed=False)
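
A few calls showing how byteorder and signed change the result on Python 3.2+. The inputs are illustrative; passing byteorder explicitly avoids relying on its default, which only became 'big' in Python 3.11.

```python
data = b'\x11\x34'

# Unsigned big-endian: the case asked about in the question.
print(int.from_bytes(data, byteorder='big', signed=False))  # 4404

# The same bytes read little-endian give 0x3411.
print(int.from_bytes(data, byteorder='little'))  # 13329

# signed=True interprets the bytes as two's complement.
print(int.from_bytes(b'\xff\xfe', byteorder='big', signed=True))  # -2

# Length is arbitrary: a 256-byte input whose top nonzero byte is 0x01.
big = int.from_bytes(bytearray(range(256)), byteorder='big')
print(big.bit_length())  # 2033
```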

Unfortunately, if you're using an older version of Python, you don't have this. So, what options do you have? (Besides the obvious one: update to 3.2, or, better, 3.4…)

First, there's your code. I think binascii.hexlify is a better way to spell it than .encode('hex'), because "encode" has always seemed a little weird for a method on byte strings (as opposed to Unicode strings), and it's in fact been banished in Python 3. But otherwise, it seems pretty readable and obvious to me. And it should be pretty fast—yes, it has to create an intermediate string, but it's doing all the looping and arithmetic in C (at least in CPython), which is generally an order of magnitude or two faster than in Python. Unless your bytearray is so big that allocating the string will itself be costly, I wouldn't worry about performance here.
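
A sketch of the hexlify spelling that runs on both Python 2 and 3. One assumption is added beyond the question: empty input should map to 0 (matching what int.from_bytes returns for b''). Without that guard, int() raises ValueError on an empty hex string.

```python
import binascii

def hexint(data):
    """Big-endian unsigned bytes -> int via an intermediate hex string."""
    hex_digits = binascii.hexlify(data)  # e.g. b'1134' for b'\x11\x34'
    # int() rejects an empty string, so treat empty input as 0 (assumption).
    return int(hex_digits, 16) if hex_digits else 0

print(hexint(b'\x11\x34'))  # 4404
```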

Alternatively, you could do it in a loop. But that's going to be more verbose and, at least in CPython, a lot slower.
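
A minimal sketch of the explicit loop; the name loop_int is hypothetical. Wrapping the input in bytearray() is a portability choice: iterating it yields ints on both Python 2 (where iterating a str yields 1-character strings) and Python 3.

```python
def loop_int(data):
    """Accumulate big-endian bytes one at a time (hypothetical helper)."""
    value = 0
    # bytearray() makes iteration yield ints on Python 2 and 3 alike.
    for byte in bytearray(data):
        value = value * 256 + byte  # equivalent to (value << 8) | byte
    return value

print(loop_int(b'\x11\x34'))  # 4404
```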

You could try to eliminate the explicit loop for an implicit one, but the obvious function to do that is reduce, which is considered un-Pythonic by part of the community—and of course it's going to require calling a function for each byte.

You could unroll the loop or reduce by breaking it into chunks of 8 bytes and looping over struct.unpack_from, or by doing one big struct.unpack('>' + 'Q'*(len(b)//8) + 'B'*(len(b)%8), b) and looping over the results, but that makes it a lot less readable and probably not much faster.
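
A sketch of the single big struct.unpack variant, under the assumption that the input is plain bytes (the function name chunked_int is hypothetical). The parentheses around len(b)//8 and len(b)%8 matter: without them, * binds first and you end up dividing or %-formatting a string. After unpacking, each field is folded in with a shift equal to its own width: 64 bits for a Q word, 8 for a trailing B byte.

```python
import struct

def chunked_int(data):
    """Big-endian unsigned bytes -> int, 8 bytes at a time via struct."""
    n_words, n_tail = divmod(len(data), 8)
    # '>' = big-endian, standard sizes, no padding; Q = 8-byte word, B = 1 byte.
    fmt = '>' + 'Q' * n_words + 'B' * n_tail
    value = 0
    for i, part in enumerate(struct.unpack(fmt, data)):
        # Shift by the width of the field just unpacked, then OR it in.
        width = 64 if i < n_words else 8
        value = (value << width) | part
    return value

print(chunked_int(b'\x11\x34'))  # 4404
```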

You could use NumPy… but if you're going bigger than either 64 or maybe 128 bits, it's going to end up converting everything to Python objects anyway.

So, I think your answer is the best option.

Here are some timings comparing it to the most obvious manual conversions:

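import binascii
import functools
import numpy as np

def hexint(b):
    # Hex-encode in C, then parse the hex digits as a base-16 integer.
    return int(binascii.hexlify(b), 16)

def loop1(b):
    # reduce: shift the accumulator left one byte, then OR in the next byte.
    def f(x, y): return (x<<8)|y
    return functools.reduce(f, b, 0)

def loop2(b):
    # Explicit loop; iterating a bytearray yields ints.
    x = 0
    for c in b:
        x <<= 8
        x |= c
    return x

def numpily(b):
    n = np.array(list(b))
    # Byte weights are powers of 256 (1 << 8*k), most significant byte first;
    # dtype=object keeps Python ints so weights wider than 64 bits don't overflow.
    p = 1 << (8 * np.arange(len(b)-1, -1, -1, dtype=object))
    return np.sum(n * p)
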
In [226]: b = bytearray(range(256))

In [227]: %timeit hexint(b)
1000000 loops, best of 3: 1.8 µs per loop

In [228]: %timeit loop1(b)
10000 loops, best of 3: 57.7 µs per loop

In [229]: %timeit loop2(b)
10000 loops, best of 3: 46.4 µs per loop

In [283]: %timeit numpily(b)
10000 loops, best of 3: 88.5 µs per loop

For comparison in Python 3.4:

In [17]: %timeit hexint(b)
1000000 loops, best of 3: 1.69 µs per loop

In [17]: %timeit int.from_bytes(b, byteorder='big', signed=False)
1000000 loops, best of 3: 1.42 µs per loop

So, your method is still pretty fast…
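
To reproduce a comparison like this outside IPython, here is a rough sketch using the standard timeit module on Python 3. It checks that the two approaches agree before timing them. The call count and repeat count are arbitrary choices, and your numbers will differ by machine and Python version.

```python
import binascii
import timeit

def hexint(b):
    return int(binascii.hexlify(b), 16)

b = bytearray(range(256))
# Sanity check: both approaches must agree before comparing their speed.
assert hexint(b) == int.from_bytes(b, byteorder='big', signed=False)

NUMBER = 10000  # calls per timing run (arbitrary)
candidates = [
    ('hexint', lambda: hexint(b)),
    ('int.from_bytes', lambda: int.from_bytes(b, byteorder='big', signed=False)),
]
for label, stmt in candidates:
    # Best of 3 runs, converted to microseconds per call.
    best = min(timeit.repeat(stmt, number=NUMBER, repeat=3))
    print('{}: {:.2f} us per loop'.format(label, best / NUMBER * 1e6))
```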
