Python 3: Read UTF-8 file containing German umlaut

Views: 336 · Published: 2020/10/29 6:27:04 · Tags: python, encoding, utf-8


Question

I searched and found many similar questions and articles, but none of them helped me resolve the issue.

I use Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)] on Windows 10.

I have a simple text file, saved on Windows with UTF-8 encoding.

All I want to do is read its contents.

Here is a first attempt that fails miserably:

    file_name=r'c:\temp\encoding_test.txt'
    fh=open(file_name,'r')
    f_str=fh.read()
    fh.close()
    print(f_str)

The print statement raises an exception:

'charmap' codec can't encode character '\u201e' in position 100: character maps to <undefined>

Using a debugger, f_str contains the following:

'I would like the following characters to display correctly after reading this file into Python:\n\nÃ„Ã–ÃœÃ¤Ã¶Ã¼ÃŸ\n'

This already puzzles me. Doesn't Python 3 use UTF-8 as the default everywhere? What other encoding would work? I tried all of the ones Notepad++ supports, and none of them works.

OK, a bit more sophisticated, I tried:

    import codecs
    file_name=r'c:\temp\encoding_test.txt'
    my_encoding='utf-8'
    fh=codecs.open(file_name,'r',encoding=my_encoding)
    f_str=fh.read().encode(my_encoding)
    fh.close()
    print(f_str)

This does not raise an exception, at least, but yields:

b'I would like the following characters to display correctly after reading this file into Python:\r\n\r\n\xc3\x84\xc3\x96\xc3\x9c\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x9f\r\n'

This is a complete mess to me. Can anyone here please help me sort this out?

Recommended answer

You are encoding the text back to bytes after reading it with codecs.open. Just printing the decoded data should give you what you want, as you can see when we decode the bytes back:

In [31]: s = b'I would like the following characters to display correctly after reading this file into Python:\r\n\r\n\xc3\x84\xc3\x96\xc3\x9c\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x9f\r\n'

In [32]: print(s)
b'I would like the following characters to display correctly after reading this file into Python:\r\n\r\n\xc3\x84\xc3\x96\xc3\x9c\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x9f\r\n'

In [33]: print(s.decode("utf-8"))
I would like the following characters to display correctly after reading this file into Python:

ÄÖÜäöüß
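
The same point can be sketched with the built-in open: passing encoding='utf-8' explicitly avoids relying on the platform default, and the returned str needs no further .encode() call. This is only an illustration under assumptions: the temporary file stands in for the asker's encoding_test.txt, the helper name read_text_utf8 is hypothetical, and cp1252 is assumed as the legacy code page that produced the mojibake seen in the debugger.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a file as text, decoding it explicitly as UTF-8."""
    # Explicit encoding: do not rely on the platform's default encoding.
    with open(path, 'r', encoding='utf-8') as fh:
        return fh.read()


# Write a small UTF-8 test file (a stand-in for the asker's file).
sample = 'ÄÖÜäöüß\n'
fd, tmp_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as fh:
    fh.write(sample.encode('utf-8'))

text = read_text_utf8(tmp_path)  # a str, already decoded
os.remove(tmp_path)

# Decoding the same bytes with a legacy code page (cp1252 assumed here)
# reproduces the garbled characters from the question.
mojibake = sample.encode('utf-8').decode('cp1252')
```

Here text holds the original characters, while mojibake holds the 'Ã„Ã–Ãœ...' form, including the '\u201e' character named in the error message. Whether print(text) then displays correctly still depends on the console.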

If you still do not see the correct output, the problem is your shell encoding. The Windows console encoding is not UTF-8, so both where you run the code from and the shell's encoding matter.
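
One possible way to inspect and work around the console encoding is sketched below. sys.stdout.encoding reports the encoding Python uses when printing, and characters that encoding cannot represent can be replaced instead of raising an exception. The helper name console_safe is hypothetical, and cp437 is used only as an example of a legacy console code page; the asker's actual code page is not known.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    # Fall back to the encoding Python uses for stdout, then to ASCII.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, errors='replace').decode(encoding)


# U+201E is the character named in the error message from the question.
umlauts_cp437 = console_safe('ÄÖÜäöüß', encoding='cp437')
quote_cp437 = console_safe('\u201e', encoding='cp437')

# With the default argument the result is always printable on this stdout.
print(console_safe('ÄÖÜäöüß\u201e'))
```

In this sketch the umlauts survive cp437 unchanged, while U+201E becomes '?'. Setting the PYTHONIOENCODING environment variable before starting Python is another option for controlling the encoding of standard output.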
