How to convert between bytes and strings in Python 3?

Problem description

This is a Python 101 type question, but it had me baffled for a while when I tried to use a package that seemed to convert my string input into bytes.

As you will see below I found the answer for myself, but I felt it was worth recording here because of the time it took me to unearth what was going on. It seems to be generic to Python 3, so I have not referred to the original package I was playing with; it does not seem to be an error (just that the particular package had a .tostring() method that was clearly not producing what I understood as a string...)

My test program goes like this:

import mangler                                 # spoof package

stringThing = """
<Doc>
    <Greeting>Hello World</Greeting>
    <Greeting>你好</Greeting>
</Doc>
"""

# print out the input
print('This is the string input:')
print(stringThing)

# now make the string into bytes
bytesThing = mangler.tostring(stringThing)    # pseudo-code again

# now print it out
print('\nThis is the bytes output:')
print(bytesThing)

The output from this code is as follows:

This is the string input:

<Doc>
    <Greeting>Hello World</Greeting>
    <Greeting>你好</Greeting>
</Doc>


This is the bytes output:
b'\n<Doc>\n    <Greeting>Hello World</Greeting>\n    <Greeting>\xe4\xbd\xa0\xe5\xa5\xbd</Greeting>\n</Doc>\n'

So, there is a need to be able to convert between bytes and strings, to avoid ending up with non-ascii characters being turned into gobbledegook.

Recommended answer

The 'mangler' in the above code sample was doing the equivalent of this:

bytesThing = stringThing.encode(encoding='UTF-8')

There are other ways to write this (notably bytes(stringThing, encoding='UTF-8')), but the above syntax makes it obvious what is going on, and also what to do to recover the string:

newStringThing = bytesThing.decode(encoding='UTF-8')

When we do this, the original string is recovered.
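
As a minimal sketch of that round trip (the snake_case names and the shortened sample string are only illustrative, not taken from the original package):

```python
# Sample text containing non-ASCII characters, as in the question.
string_thing = "<Greeting>你好</Greeting>"

# str -> bytes: both spellings produce the same UTF-8 byte sequence.
bytes_thing = string_thing.encode(encoding="UTF-8")
same_bytes = bytes(string_thing, encoding="UTF-8")

# bytes -> str: decoding with the same codec recovers the original text.
new_string_thing = bytes_thing.decode(encoding="UTF-8")

print(bytes_thing == same_bytes)         # the two encodings agree
print(new_string_thing == string_thing)  # the round trip is lossless
```

The byte string is longer than the text because each of the two Chinese characters takes three bytes in UTF-8.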

Note that str(bytesThing) without an encoding does not decode anything: it just returns the printable representation of the bytes object, the b'...' form with all the gobbledegook escape sequences. To convert back into Unicode text you must name the encoding explicitly, viz., str(bytesThing, encoding='UTF-8'). No error is reported if the encoding is not specified, which is what makes the mistake easy to miss.
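
The difference between the two str() calls can be seen in a short sketch like the following (the variable names are illustrative only):

```python
bytes_thing = "你好".encode("UTF-8")

# Without an encoding, str() returns the repr of the bytes object,
# i.e. text that starts with b' and contains escape sequences.
as_repr = str(bytes_thing)

# With an encoding, str() decodes the bytes, just like bytes.decode().
as_text = str(bytes_thing, encoding="UTF-8")

print(as_repr)
print(as_text)
print(as_text == bytes_thing.decode("UTF-8"))
```

By default the first call is silent. Running the interpreter with the -b command-line option should make Python issue a BytesWarning for it, which can help track down accidental conversions of this kind.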
