Serializing binary data in Python
Question
I have some binary data in Python, in the form of an array of byte strings.
Is there a portable way to serialize this data that other languages could read?
JSON is out, because I just found out that it has no real way to store binary data; its strings are expected to be Unicode.
I don't want to use pickle either: I don't want the security risk, and it would limit the data's use to other Python programs.
Any advice? I would really like to use a built-in library (or at least one that's part of the standard Anaconda distribution).
Recommended answer
If you just need the binary data in the strings and can recover the boundaries between the individual strings easily, you could just write them to a file directly, as raw strings.
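One way to keep the boundaries recoverable when writing raw bytes is to prefix each string with its length. The sketch below is an illustration, not part of the original answer: the helper names `write_chunks` and `read_chunks` and the 4-byte big-endian length prefix are assumptions chosen for the example. It uses only the standard `struct` and `io` modules, and the resulting layout is simple enough to parse from most other languages.

```python
import io
import struct

def write_chunks(stream, chunks):
    """Write each byte string preceded by its length (4-byte big-endian unsigned int)."""
    for chunk in chunks:
        stream.write(struct.pack(">I", len(chunk)))
        stream.write(chunk)

def read_chunks(stream):
    """Read back the byte strings written by write_chunks."""
    chunks = []
    while True:
        header = stream.read(4)
        if not header:
            break  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        data = stream.read(length)
        if len(data) < length:
            raise ValueError("truncated chunk")
        chunks.append(data)
    return chunks

data = [b"abc\xf3\x9c\xc6", b"xyz", b""]
buffer = io.BytesIO()  # stands in for a file opened in binary mode
write_chunks(buffer, data)
buffer.seek(0)
restored = read_chunks(buffer)
print(restored == data)
```

With a real file, the same functions would be called on objects returned by `open(path, "wb")` and `open(path, "rb")`. A 4-byte prefix caps each string at 2**32 - 1 bytes, which is a limitation of this particular sketch.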
If you can't recover the string boundaries easily, JSON seems like a good option:
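import json
a = [b"abc\xf3\x9c\xc6", b"xyz"]
# Decode each byte string as latin1 so it becomes a JSON-safe Unicode string
serialised = json.dumps([s.decode("latin1") for s in a])
# Encode back to latin1 to recover the original bytes
print([s.encode("latin1") for s in json.loads(serialised)])
will print (on Python 3)
[b'abc\xf3\x9c\xc6', b'xyz']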
The trick here is that arbitrary binary strings are valid latin1: every byte value from 0 to 255 maps to the Unicode code point with the same number. They can therefore always be decoded to Unicode and encoded back to the original byte string again.
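The round-trip claim can be checked directly for every possible byte value. The following is a minimal sketch; the helper names `dumps_bytes` and `loads_bytes` are hypothetical wrappers around the approach above, not part of the `json` module.

```python
import json

def dumps_bytes(chunks):
    """Serialise a list of byte strings to JSON by decoding each one as latin1."""
    return json.dumps([c.decode("latin1") for c in chunks])

def loads_bytes(text):
    """Recover the list of byte strings by encoding each JSON string as latin1."""
    return [s.encode("latin1") for s in json.loads(text)]

# A payload containing every byte value once, plus an empty and an ASCII string
every_byte = bytes(range(256))
payload = [every_byte, b"", b"xyz"]

text = dumps_bytes(payload)
restored = loads_bytes(text)
print(restored == payload)
print(text.isascii())  # json.dumps escapes non-ASCII characters by default
```

A reader in another language has to apply the same convention after parsing the JSON, that is, turn each character back into the single byte with the same numeric value. Bytes above 0x7F are written as six-character `\u00XX` escapes with the default `ensure_ascii=True`, so the output can be noticeably larger than the raw data.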