Convert variable-sized byte array to an integer/long

Question

How can I convert a (big-endian) variable-sized binary byte array to an (unsigned) integer/long? For example, '\x11\x34' represents 4404.

For now, I'm using:

def bytes_to_int(bytes):
  return int(bytes.encode('hex'), 16)

Which is small and somewhat readable, but probably not very efficient. Is there a better (more obvious) way?

Recommended answer

Python doesn't traditionally have much use for "numbers in big-endian C layout" that are too big for C. (If you're dealing with 2-byte, 4-byte, or 8-byte numbers, then struct.unpack is the answer.)
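For the fixed-size case, a minimal sketch might look like the following. The helper name `fixed_bytes_to_int` and the `FORMATS` table are made up for illustration; the format codes themselves (`>` for big-endian, `H`, `I`, `Q` for unsigned 2-, 4-, and 8-byte integers) are standard struct codes.

```python
import struct

# '>' selects big-endian byte order with standard sizes.
# H, I, Q are unsigned 2-, 4-, and 8-byte integers.
FORMATS = {2: '>H', 4: '>I', 8: '>Q'}  # hypothetical lookup table

def fixed_bytes_to_int(b):
    """Decode a 2-, 4-, or 8-byte big-endian unsigned integer."""
    fmt = FORMATS[len(b)]  # raises KeyError for any other length
    return struct.unpack(fmt, b)[0]

print(fixed_bytes_to_int(b'\x11\x34'))  # 4404
```

Anything that is not exactly 2, 4, or 8 bytes long falls outside this sketch, which is the limitation the rest of the answer deals with.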

But enough people got sick of there not being one obvious way to do this that Python 3.2 added a method int.from_bytes that does exactly what you want:

int.from_bytes(b, byteorder='big', signed=False)
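Applied to the example from the question, this could look like the sketch below (Python 3.2 or later assumed; the variable names are illustrative only):

```python
data = b'\x11\x34'

# Big-endian, unsigned: the first byte is the most significant.
value = int.from_bytes(data, byteorder='big', signed=False)
print(value)  # 4404

# The input may be any length, including values wider than 64 bits.
wide = int.from_bytes(b'\x01' + b'\x00' * 16, byteorder='big')
print(wide == 1 << 128)  # True

# int.to_bytes goes the other way.
print(value.to_bytes(2, byteorder='big') == data)  # True
```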

Unfortunately, if you're using an older version of Python, you don't have this. So, what options do you have? (Besides the obvious one: update to 3.2, or, better, 3.4…)

First, there's your code. I think binascii.hexlify is a better way to spell it than .encode('hex'), because "encode" has always seemed a little weird for a method on byte strings (as opposed to Unicode strings), and it's in fact been banished in Python 3. But otherwise, it seems pretty readable and obvious to me. And it should be pretty fast—yes, it has to create an intermediate string, but it's doing all the looping and arithmetic in C (at least in CPython), which is generally an order of magnitude or two faster than in Python. Unless your bytearray is so big that allocating the string will itself be costly, I wouldn't worry about performance here.
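One edge case worth keeping in mind with the hex approach: an empty input produces an empty hex string, and `int` rejects that with a ValueError. A small sketch that guards against it is shown here; the name `hexint_or_zero` and the choice to map empty input to 0 are assumptions for illustration, not part of the question.

```python
import binascii

def hexint_or_zero(b):
    """Hex-based conversion that treats empty input as 0 (an assumed convention)."""
    if not b:
        return 0  # int(b'', 16) would raise ValueError
    return int(binascii.hexlify(b), 16)

print(hexint_or_zero(b'\x11\x34'))  # 4404
print(hexint_or_zero(b''))          # 0
```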

Alternatively, you could do it in a loop. But that's going to be more verbose and, at least in CPython, a lot slower.

You could try to eliminate the explicit loop for an implicit one, but the obvious function to do that is reduce, which is considered un-Pythonic by part of the community—and of course it's going to require calling a function for each byte.

You could unroll the loop or reduce by breaking it into chunks of 8 bytes and looping over struct.unpack_from, or by just doing a big struct.unpack('>' + 'Q'*(len(b)//8) + 'B'*(len(b)%8), b) and looping over that, but that makes it a lot less readable and probably not that much faster.
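A rough sketch of the single big struct.unpack variant, to show what that unrolling involves. The function name `chunked_bytes_to_int` is hypothetical, and the `>` prefix is added so that the words are read big-endian with no alignment padding:

```python
import struct

def chunked_bytes_to_int(b):
    """Big-endian unsigned conversion that unpacks 8 bytes at a time."""
    n_words, n_tail = divmod(len(b), 8)
    # '>' = big-endian, standard sizes, no padding.
    # Q = unsigned 8-byte word, B = single unsigned byte for the leftover tail.
    fmt = '>' + 'Q' * n_words + 'B' * n_tail
    fields = struct.unpack(fmt, b)
    x = 0
    for word in fields[:n_words]:
        x = (x << 64) | word  # each full word shifts the result by 64 bits
    for byte in fields[n_words:]:
        x = (x << 8) | byte   # leftover bytes shift by 8 bits each
    return x

print(chunked_bytes_to_int(b'\x11\x34'))  # 4404
```

The loop now runs once per 8 bytes instead of once per byte, at the cost of noticeably more code than the one-line hex version.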

You could use NumPy… but if you're going bigger than either 64 or maybe 128 bits, it's going to end up converting everything to Python objects anyway.

So, I think your answer is the best option.

Here are some timings comparing it to the most obvious manual conversion:

import binascii
import functools
import numpy as np

def hexint(b):
    return int(binascii.hexlify(b), 16)

def loop1(b):
    def f(x, y): return (x<<8)|y
    return functools.reduce(f, b, 0)

def loop2(b):
    x = 0
    for c in b:
        x <<= 8
        x |= c
    return x

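def numpily(b):
    n = np.array(list(b))
    # Byte i from the end has weight 2**(8*i); dtype=object keeps Python's arbitrary-precision ints.
    p = 1 << (8 * np.arange(len(b)-1, -1, -1, dtype=object))
    return np.sum(n * p)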

In [226]: b = bytearray(range(256))

In [227]: %timeit hexint(b)
1000000 loops, best of 3: 1.8 µs per loop

In [228]: %timeit loop1(b)
10000 loops, best of 3: 57.7 µs per loop

In [229]: %timeit loop2(b)
10000 loops, best of 3: 46.4 µs per loop

In [283]: %timeit numpily(b)
10000 loops, best of 3: 88.5 µs per loop

For comparison in Python 3.4:

In [17]: %timeit hexint(b)
1000000 loops, best of 3: 1.69 µs per loop

In [17]: %timeit int.from_bytes(b, byteorder='big', signed=False)
1000000 loops, best of 3: 1.42 µs per loop

So, your method is still pretty fast…
