Python: Compactly and reversibly encode large integer as base64 or base16 having variable or fixed length

Question

I want to compactly encode a large unsigned or signed integer having an arbitrary number of bits into a base64, base32, or base16 (hexadecimal) representation. The output will ultimately be used as a filename, but this should be beside the point. I am using the latest Python 3.

This works but is far from compact:

>>> import base64, sys
>>> i: int = 2**62 - 3  # Can be signed or unsigned.
>>> b64: bytes = base64.b64encode(str(i).encode())  # Not a compact encoding.
>>> len(b64), sys.getsizeof(b64)
(28, 61)
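For comparison, here is a minimal sketch (not part of the original question) of the compact route for a non-negative value. Converting the integer to its raw big-endian bytes with `int.to_bytes` before base64-encoding shrinks the same value from 28 to 12 characters, because 62 bits fit in 8 bytes rather than 19 decimal digits. Signed values need extra handling, which the answer below addresses.

```python
import base64

i = 2**62 - 3

# Decimal text: one byte per digit, 19 bytes for this value.
via_str = base64.b64encode(str(i).encode())

# Raw big-endian bytes: a 62-bit value needs only (62 + 7) // 8 = 8 bytes.
raw = i.to_bytes((i.bit_length() + 7) // 8, byteorder='big')
via_bytes = base64.b64encode(raw)

print(len(via_str), len(via_bytes))  # 28 12

# The raw form is still reversible.
assert int.from_bytes(base64.b64decode(via_bytes), byteorder='big') == i
```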

There is a prior question, now closed, whose answers are strictly concerned with inefficient representations. Note again that we don't want to use any strings or needlessly long sequences of bytes in this exercise. As such, this question is not a duplicate of that question.

Answer

This answer is motivated in part by disparate comments by Erik A., such as the one on this answer. The integer is first compactly converted to bytes (the fewest whole bytes that hold its bit length, plus one sign bit if signed), and those bytes are then encoded in the chosen base.
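The byte-length rule used by the class below can be checked in isolation. This sketch uses a hypothetical helper, `min_bytes`, which mirrors the class's `_bytes_length` and reserves one extra bit for the sign when `signed` is true. The rule is slightly conservative for some negative values: -128 fits in one signed byte, but the formula allots two.

```python
import base64

def min_bytes(i: int, signed: bool) -> int:
    """Hypothetical helper mirroring IntBaseEncoder._bytes_length."""
    # Round bit_length() up to whole bytes; a signed value needs one more bit for the sign.
    return (i.bit_length() + 7 + signed) // 8

for value, signed in [(127, False), (128, False), (127, True), (128, True), (-128, True)]:
    n = min_bytes(value, signed)
    raw = value.to_bytes(n, byteorder='big', signed=signed)
    print(value, signed, n, base64.b16encode(raw))
# 127 False 1 b'7F'
# 128 False 1 b'80'
# 127 True 1 b'7F'
# 128 True 2 b'0080'
# -128 True 2 b'FF80'
```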

from typing import Callable, Optional
import base64

class IntBaseEncoder:
    """Reversibly encode an unsigned or signed integer into a customizable encoding of a variable or fixed length."""
    # Ref: https://stackoverflow.com/a/54152763/
    def __init__(self, encoding: str, *, bits: Optional[int] = None, signed: bool = False):
        """
        :param encoding: Name of encoding from base64 module, e.g. b64, urlsafe_b64, b32, b16, etc.
        :param bits: Max bit length of int which is to be encoded. If specified, the encoding is of a fixed length,
        otherwise of a variable length.
        :param signed: If True, integers are considered signed, otherwise unsigned.
        """
        self._decoder: Callable[[bytes], bytes] = getattr(base64, f'{encoding}decode')
        self._encoder: Callable[[bytes], bytes] = getattr(base64, f'{encoding}encode')
        self.signed: bool = signed
        self.bytes_length: Optional[int] = bits and self._bytes_length(2 ** bits - 1)

    def _bytes_length(self, i: int) -> int:
        return (i.bit_length() + 7 + self.signed) // 8

    def encode(self, i: int) -> bytes:
        length = self.bytes_length or self._bytes_length(i)
        i_bytes = i.to_bytes(length, byteorder='big', signed=self.signed)
        return self._encoder(i_bytes)

    def decode(self, encoded: bytes) -> int:
        i_bytes = self._decoder(encoded)
        return int.from_bytes(i_bytes, byteorder='big', signed=self.signed)

# Tests:
import unittest

class TestIntBaseEncoder(unittest.TestCase):

    ENCODINGS = ('b85', 'b64', 'urlsafe_b64', 'b32', 'b16')

    def test_unsigned_with_variable_length(self):
        for encoding in self.ENCODINGS:
            encoder = IntBaseEncoder(encoding)
            previous_length = 0
            for i in range(1234):
                encoded = encoder.encode(i)
                self.assertGreaterEqual(len(encoded), previous_length)
                self.assertEqual(i, encoder.decode(encoded))
                previous_length = len(encoded)

    def test_signed_with_variable_length(self):
        for encoding in self.ENCODINGS:
            encoder = IntBaseEncoder(encoding, signed=True)
            previous_length = 0
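            # Encoded length depends only on abs(i), so visit values in order of magnitude.
            for i in sorted(range(-1234, 1234), key=abs):
                encoded = encoder.encode(i)
                self.assertGreaterEqual(len(encoded), previous_length)
                self.assertEqual(i, encoder.decode(encoded))
                previous_length = len(encoded)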

    def test_unsigned_with_fixed_length(self):
        for encoding in self.ENCODINGS:
            for maxint in range(257):
                encoder = IntBaseEncoder(encoding, bits=maxint.bit_length())
                maxlen = len(encoder.encode(maxint))
                for i in range(maxint + 1):
                    encoded = encoder.encode(i)
                    self.assertEqual(len(encoded), maxlen)
                    self.assertEqual(i, encoder.decode(encoded))

    def test_signed_with_fixed_length(self):
        for encoding in self.ENCODINGS:
            for maxint in range(257):
                encoder = IntBaseEncoder(encoding, bits=maxint.bit_length(), signed=True)
                maxlen = len(encoder.encode(maxint))
                for i in range(-maxint, maxint + 1):
                    encoded = encoder.encode(i)
                    self.assertEqual(len(encoded), maxlen)
                    self.assertEqual(i, encoder.decode(encoded))

if __name__ == '__main__':
    unittest.main()

If the output is used as a filename, 'urlsafe_b64' or even 'b16' is a safer choice of encoding.
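The following sketch (an illustration, not part of the original answer) shows why. The standard `b64` alphabet can emit `/`, a path separator, while `urlsafe_b64` substitutes `_` and `-`. Base64 output is also case-sensitive, so two different inputs can produce names that differ only in letter case. Such names collide on case-insensitive file systems, which are the default on Windows and macOS. Base16 output uses a single case and avoids this problem.

```python
import base64

raw = b'\xff\xff'
print(base64.b64encode(raw))          # b'//8=' contains '/', a path separator
print(base64.urlsafe_b64encode(raw))  # b'__8=' uses '_' instead

# Two distinct 3-byte inputs whose urlsafe base64 forms differ only in letter case.
x = bytes(3)                           # three zero bytes
y = base64.urlsafe_b64decode(b'aaaa')  # bytes 0x69 0xA6 0x9A
ex, ey = base64.urlsafe_b64encode(x), base64.urlsafe_b64encode(y)
print(ex, ey, ex.lower() == ey.lower())  # b'AAAA' b'aaaa' True

# Base16 is single-case, so the same inputs stay distinct even ignoring case.
print(base64.b16encode(x), base64.b16encode(y))  # b'000000' b'69A69A'
```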

Usage examples:

# Variable length encoding
>>> encoder = IntBaseEncoder('urlsafe_b64')
>>> encoder.encode(12345)
b'MDk='
>>> encoder.decode(_)
12345

# Fixed length encoding
>>> encoder = IntBaseEncoder('b16', bits=32)
>>> encoder.encode(12345)
b'00003039'
>>> encoder.encode(123456789)
b'075BCD15'
>>> encoder.decode(_)
123456789

# Signed
>>> encoder = IntBaseEncoder('b32', signed=True)
>>> encoder.encode(-12345)
b'Z7DQ===='
>>> encoder.decode(_)
-12345
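Fixed-length mode has one consequence worth knowing. The byte length is computed once from `bits`, so `encode` passes a value that needs more bits directly to `int.to_bytes`. That call raises `OverflowError` instead of producing a longer string. The same error is raised when a negative integer is encoded with `signed=False`. The sketch below reproduces this behavior with `int.to_bytes` alone. `encode_fixed` is a hypothetical stand-in for `IntBaseEncoder('b16', bits=32).encode`.

```python
import base64

# What IntBaseEncoder computes for bits=32, signed=False: (32 + 7) // 8 = 4 bytes.
BYTES_LENGTH = (32 + 7) // 8

def encode_fixed(i: int) -> bytes:
    """Hypothetical stand-in for IntBaseEncoder('b16', bits=32).encode."""
    return base64.b16encode(i.to_bytes(BYTES_LENGTH, byteorder='big', signed=False))

print(encode_fixed(12345))      # b'00003039'
print(encode_fixed(2**32 - 1))  # b'FFFFFFFF'

# Too large for 4 bytes, and negative while unsigned: both are rejected.
for bad in (2**32, -1):
    try:
        encode_fixed(bad)
    except OverflowError as exc:
        print(bad, 'rejected:', exc)
```

If a caller needs to handle out-of-range input, catching `OverflowError` around `encode` is one option. Validating against `bits` beforehand is another.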
