Serializing binary data in Python
Question
I have some binary data in Python, in the form of a list of byte strings.
Is there a portable way to serialize this data that other languages could read?
JSON loses because I just found out that it has no real way to store binary data; its strings are expected to be Unicode.
I don't want to use pickle, because I don't want the security risk and because it would limit the data's use to other Python programs.
Any advice? I would really like to use a builtin library (or at least one that's part of the standard Anaconda distribution).
Recommended answer
If you just need the binary data in the strings and can recover the boundaries between the individual strings easily, you could just write them to a file directly, as raw strings.
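When the boundaries are not implicit in the data, one common way to make them recoverable is to write a length prefix before each string. The following is only a sketch under assumptions that are not part of the original answer: the helper names write_chunks and read_chunks are hypothetical, and the 4-byte big-endian length prefix is an arbitrary choice that limits each string to under 4 GiB.

```python
import io
import struct

def write_chunks(stream, chunks):
    """Write each byte string preceded by its length (4-byte big-endian unsigned int)."""
    for chunk in chunks:
        stream.write(struct.pack(">I", len(chunk)))
        stream.write(chunk)

def read_chunks(stream):
    """Read back the byte strings written by write_chunks."""
    chunks = []
    while True:
        header = stream.read(4)
        if not header:
            break  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        data = stream.read(length)
        if len(data) < length:
            raise ValueError("truncated chunk")
        chunks.append(data)
    return chunks

# Round trip through an in-memory buffer; a file opened in "wb"/"rb" mode works the same way.
data = [b"abc\xf3\x9c\xc6", b"", b"xyz"]
buffer = io.BytesIO()
write_chunks(buffer, data)
buffer.seek(0)
restored = read_chunks(buffer)
```

A reader in another language only needs to read 4 bytes, interpret them as a big-endian unsigned integer, and then read that many bytes, repeating until the file ends.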
If you can't recover the string boundaries easily, JSON seems like a good option:
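import json
a = [b"abc\xf3\x9c\xc6", b"xyz"]
# Decode each byte string as latin1 so that JSON can store it as text
serialised = json.dumps([s.decode("latin1") for s in a])
print([s.encode("latin1") for s in json.loads(serialised)])
will print (in Python 3)
[b'abc\xf3\x9c\xc6', b'xyz']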
The trick here is that arbitrary binary strings are valid latin1: every byte value from 0 to 255 maps to the Unicode code point with the same number, so the strings can always be decoded to Unicode and encoded back to the original bytes again.
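To check this claim rather than take it on trust, the short sketch below pushes all 256 possible byte values through the same latin1 and JSON round trip. The variable names are only illustrative.

```python
import json

# Every possible byte value, 0 through 255
all_bytes = bytes(range(256))

# latin1 maps each byte to the Unicode code point with the same number
text = all_bytes.decode("latin1")
code_points_match = [ord(ch) for ch in text] == list(range(256))

# Serialise to JSON and back, then re-encode to recover the bytes
restored = json.loads(json.dumps(text)).encode("latin1")
round_trip_ok = restored == all_bytes
```

A program in another language that reads the JSON has to apply the same latin1 (ISO-8859-1) encoding step to its strings to get the original bytes back.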