如何将字符串转换为字节而不进行编码 [英] How to cast a string to bytes without encoding

查看:162
本文介绍了如何将字符串转换为字节而不进行编码的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一堆二进制数据通过char *从某个C接口(不在我的控制下)通过char *到达python,所以我有一串任意的二进制数据(通常是字节数组)。我想将其转换为字节数组,以简化与其他python函数的结合使用,但我似乎不知道如何操作。

I have a bunch of binary data that comes to python via a char* from some C interface (not under my control) so I have a string of arbitrary binary data (what is normally a byte array). I would like to convert it to a byte array to simplify using it with other python functions but I can't seem to figure out how.

不起作用的示例:

data = rawdatastr.encode()这假定 utf-8并处理数据==不良

data = rawdatastr.encode() this assumes "utf-8" and mangles the data == BAD

data = rawdatastr.encode('ascii','ignore')去除超过127个字符==坏

data = rawdatastr.encode('ascii','ignore') strips chars over 127 == BAD

data = rawdatastr.encode('latin1')不确定-这是迄今为止最接近的参数

data = rawdatastr.encode('latin1') not sure -- this is the closest so far but I have no proof that it is working for all bytes.

data = array.array('B',[x for map in x(ord, data)])。tobytes()可行,但似乎要做很多简单的工作。有什么更简单的东西吗?

data = array.array('B', [x for x in map(ord,data)]).tobytes() This works but seems like a lot of work to do something simple. Is there something simpler?

我想我需要编写自己的身份编码,该编码只是将字节传递(我认为 latin1这样做了)

I am thinking I need to write my own identity encoding that just passes the bytes along (I think latin1 does this based upon some reading but no proof thus far).

推荐答案

尽管我怀疑还有其他东西正在为您解码数据(< C中的code> char * 通常最好用 bytes 表示,尤其是二进制数据时):

Though I suspect something else is decoding your data for you (a char* in C is usually best represented as bytes, especially if it is binary data):

latin1 编解码器可以使每个字节往返。您可以使用以下简短程序对此进行验证:

The latin1 codec can round trip every byte. You can verify this with the following short program:

>>> s = ''.join(chr(i) for i in range(0x100))
>>> s
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨©ª«¬\xad®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'
>>> s2 = s.encode('latin1').decode('latin1')
>>> s2 == s
True
>>> sb = bytes(range(0x100))
>>> sb
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> sb == s.encode('latin1')
True

这篇关于如何将字符串转换为字节而不进行编码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆