How to know the encoding of a file in Python?

Problem description

Does anybody know how to get the encoding of a file in Python? I know that you can use the codecs module to open a file with a specific encoding, but you have to know the encoding in advance.

import codecs
f = codecs.open("file.txt", "r", "utf-8")

Is there a way to detect automatically which encoding is used for a file?

Thanks in advance

Edit: Thanks, everybody, for the very interesting answers. You may also be interested in http://whatismyencoding.com/, which is based on chardet (moreover, the site is powered by the Bottle Python framework).

Solution

Unfortunately, there is no 'correct' way to determine the encoding of a file by looking at the file itself. This is a universal problem, not limited to Python or any particular file system.

If you're reading an XML file, the first line of the file might give you a hint: the XML declaration can name the encoding, for example <?xml version="1.0" encoding="UTF-8"?>.
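As a rough illustration of the XML hint, the sketch below pulls the encoding name out of an XML declaration at the start of some raw bytes. The function name xml_declared_encoding is hypothetical, and the regular expression is a simplification that assumes an ASCII-compatible encoding; it will not find a declaration in UTF-16 or UTF-32 data.

```python
import re

# Simplified pattern for an XML declaration such as
# <?xml version="1.0" encoding="ISO-8859-1"?> at the very start of the data.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_declared_encoding(raw: bytes):
    """Return the encoding named in the XML declaration, or None.

    Hypothetical helper: only handles ASCII-compatible encodings.
    """
    # Skip a UTF-8 byte order mark if one precedes the declaration.
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    match = _XML_DECL.match(raw)
    return match.group(1).decode("ascii") if match else None
```

In practice you would open the file in binary mode, read the first couple of hundred bytes, and pass them to this function. The declared name is only a hint: nothing guarantees the file was actually saved in that encoding.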

Otherwise, you will have to use a heuristics-based approach such as chardet (one of the solutions given in other answers), which tries to guess the encoding by examining the file's raw bytes. If you're on Windows, I believe the Windows API also exposes methods that try to guess the encoding from the data in the file.
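To give a feel for what a heuristic looks like, here is a minimal standard-library sketch. It is not chardet and is far cruder than what chardet does: it checks for a byte order mark and then tries a caller-supplied list of candidate encodings in order. The function name guess_encoding and the default candidate list are assumptions made for illustration.

```python
import codecs

# Byte order marks, with UTF-32 listed before UTF-16 because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252")):
    """Return a plausible encoding name for raw, or None.

    Hypothetical helper: a guess, not a guarantee.
    """
    # 1. A byte order mark is the strongest hint available.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # 2. Otherwise return the first candidate that decodes without error.
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

The order of the candidates matters: single-byte encodings such as cp1252 accept almost any byte sequence, so they belong at the end of the list. A successful decode only shows that the bytes are valid in that encoding, not that it is the one the author used, which is why the answer calls this guessing.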
