How to read multiple known file encodings

Views: 43 Published: 2021/4/21 20:27:05 python python-3.x character-encoding


Problem description

I've been searching the web for a way to read files with different encodings, and I've found many instances of "it's impossible to tell what encoding a file is" (so if anyone is reading this and has a link, I would appreciate it). However, the problem I was dealing with was more focused than "open a file in any encoding": I only needed to open files in a set of known encodings. I am by no means an expert on this topic, but I thought I would post my solution in case anyone else runs into this issue.

Specific example:

Known file encodings: UTF-8 and Windows ANSI

Initial issue: as I now know, calling Python's open('file', 'r') without an encoding argument falls back to a platform-dependent default (the locale's preferred encoding), which was UTF-8 in my case. That raised a UnicodeDecodeError at runtime when trying to f.readline() an ANSI file. A common search for this is: "UnicodeDecodeError: 'utf-8' codec can't decode byte"
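The error can be reproduced without any file at all. The following is a minimal sketch, assuming that "Windows ANSI" here means the Western European code page cp1252 (the question does not say which code page is in use), and using a made-up sample string:

```python
import locale

# Encoding that open() falls back to when none is given.
# The value depends on the platform and its settings.
print(locale.getpreferredencoding(False))

# Sample text (an assumption for illustration) encoded as cp1252:
# the "é" becomes the single byte 0xE9, which is not valid UTF-8 here.
ansi_bytes = "café".encode("cp1252")

try:
    ansi_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

Pure ASCII text decodes identically under both encodings, so the error only appears once the decoder reaches a byte outside the ASCII range.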

Secondary issue: so then I thought, simple enough: we know the exception being raised, so read a line and, if it raises UnicodeDecodeError, close the file and reopen it with open('file', 'r', encoding='ansi'). The problem was that sometimes UTF-8 could read the first few lines of an ANSI-encoded file just fine but then failed on a later line. Now the solution became clear: I had to read through the entire file as UTF-8, and if that failed, I knew the file was ANSI.
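The "decode the whole file, then fall back" idea described above can be sketched as follows. This is one possible reading of it, not necessarily the code from the answer: the function name `read_text_known_encodings` is hypothetical, and `cp1252` is an assumed stand-in for "ansi", chosen because it is a codec name that works on every platform.

```python
from pathlib import Path


def read_text_known_encodings(path, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first encoding that decodes the whole file.

    Hypothetical helper: the candidate list is an assumption and should be
    ordered from strictest to most permissive.
    """
    # Read the raw bytes once so every attempt sees the entire file,
    # not just the first few lines.
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This candidate failed somewhere in the file; try the next one.
            continue
    raise ValueError(f"{path!s} could not be decoded with any of {encodings}")
```

UTF-8 goes first because it is strict: non-ASCII text written in a single-byte code page is very unlikely to form valid UTF-8, whereas cp1252 accepts almost any byte sequence and would mask a UTF-8 file if tried first. Note that decoding bytes directly does not translate line endings the way text-mode open() does.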

I'll post my take on this as an answer, but if someone has a better solution, I would appreciate that too :)
