How to decode a Unicode string that is read from a file in Python?
Question
I have a file containing UTF-16 strings. When I try to read it, the text comes back wrapped in double quotes and looks like "b'\\xff\\xfeA\\x00'". Calling .decode on it raises AttributeError: 'str' object has no attribute 'decode'. I tried a few options, but none of them worked.
Recommended answer
It looks like the file has been created by writing bytes literals to it, something like this:
some_bytes = b'Hello world'
with open('myfile.txt', 'w') as f:
    f.write(str(some_bytes))
This gets around the fact that attempting to write bytes to a file opened in text mode raises an error, but at the cost that the file now contains "b'Hello world'" (note the 'b' inside the quotes).
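To see why the AttributeError appears, here is a minimal sketch that reproduces the damaged text using the bytes from the question. The recovery step with ast.literal_eval is not part of the original answer; it is one possible workaround, assuming the file cannot be regenerated and holds exactly one bytes literal. The variable names are illustrative.

```python
import ast

# What the faulty writer stored: the repr of a bytes object, as plain text.
file_text = str(b'\xff\xfeA\x00')

# The value read back is a str, and str has no .decode method,
# which is where the AttributeError in the question comes from.
has_decode = hasattr(file_text, 'decode')

# Assumed workaround: literal_eval parses the bytes literal back into
# a real bytes object without executing arbitrary code.
recovered_bytes = ast.literal_eval(file_text)

# Now the bytes can be decoded with the encoding they were produced in.
recovered_text = recovered_bytes.decode('utf-16')
print(repr(recovered_text))
```

This only repairs data that has already been written incorrectly; fixing the code that writes the file is still the better option.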
The solution is to decode the bytes to str before writing:
some_bytes = b'\xff\xfeA\x00'  # UTF-16 bytes, as in the question
my_str = some_bytes.decode('utf-16')  # or whatever the encoding of the bytes might be
with open('myfile.txt', 'w') as f:
    f.write(my_str)
Or open the file in binary mode and write the bytes directly:
some_bytes = b'Hello world'
with open('myfile.txt', 'wb') as f:
    f.write(some_bytes)
Note that you will need to provide the correct encoding if you open the file in text mode:
with open('myfile.txt', encoding='utf-16') as f:  # Be sure to use the correct encoding
    text = f.read()
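As a rough end-to-end check, the sketch below writes UTF-16 bytes in binary mode and reads them back in text mode with a matching encoding. The temporary directory and the variable names are assumptions made for illustration, so that the example does not touch real files.

```python
import os
import tempfile

original = 'Hello world'
some_bytes = original.encode('utf-16')  # begins with a byte order mark

# Hypothetical scratch location used only for this sketch.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'myfile.txt')

    # Binary mode: the bytes are written exactly as they are.
    with open(path, 'wb') as f:
        f.write(some_bytes)

    # Text mode: the encoding must match the bytes on disk.
    with open(path, encoding='utf-16') as f:
        round_tripped = f.read()

print(round_tripped == original)
```

If the encoding passed to open does not match the bytes on disk, the read either raises UnicodeDecodeError or silently returns the wrong characters.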
Consider running Python with the -b or -bb flag, which raises a warning or an exception respectively, to detect attempts to stringify bytes.
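A small sketch of what the two flags do, assuming an interpreter version that still honours them. It runs a one-line snippet in a child interpreter through subprocess; the helper name run_with_flag and the snippet itself are made up for this illustration.

```python
import subprocess
import sys

# Hypothetical one-liner that stringifies bytes, the mistake discussed above.
SNIPPET = "str(b'Hello world')"


def run_with_flag(flag):
    """Run SNIPPET in a child interpreter started with the given flag."""
    return subprocess.run(
        [sys.executable, flag, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


warn = run_with_flag("-b")    # BytesWarning is reported, the script continues
error = run_with_flag("-bb")  # BytesWarning is raised as an exception

print("-b :", warn.returncode, "BytesWarning" in warn.stderr)
print("-bb:", error.returncode, "BytesWarning" in error.stderr)
```

In everyday use the flag is passed directly on the command line, for example python -bb followed by the script name, so that the offending str() call fails at the point where it happens.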