Read a text file with non-ASCII characters in an unknown encoding


Question

I want to read a file that contains German text, not only ASCII characters. I found that I can do it like this:

  >>> import codecs
  >>> file = codecs.open('file.txt','r', encoding='UTF-8')
  >>> lines= file.readlines()

This works when I run my job in Python IDLE, but when I run it from somewhere else it does not give the correct result. Any ideas?

Recommended answer

You need to know which character encoding the text is stored in. If you don't know that beforehand, you can try guessing it with the chardet module. First install it:

$ pip install chardet

Then read the file in binary mode and pass the raw bytes to chardet.detect, for example:

>>> import chardet
>>> chardet.detect(open("file.txt", "rb").read())
{'confidence': 0.9690625, 'encoding': 'utf-8'}

Here chardet guesses UTF-8 with about 97% confidence, so open the file with that encoding:

>>> import codecs
>>> lines = codecs.open('file.txt', 'r', encoding='utf-8').readlines()
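The detect-then-decode steps can also be combined into one helper. The following is only a minimal sketch, not part of the original answer: the function name `read_lines_guessing_encoding` and the fallback list `("utf-8", "cp1252")` are assumptions chosen for illustration. It uses `chardet.detect` when chardet is installed, and otherwise simply tries the candidate encodings in order.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None  # the sketch still runs, using only the candidates below


def read_lines_guessing_encoding(path, candidates=("utf-8", "cp1252")):
    """Return (lines, encoding) for a text file whose encoding is unknown.

    Hypothetical helper; the default candidates are an assumption.
    """
    # Read raw bytes first; decode only after an encoding has been picked.
    with open(path, "rb") as f:
        raw = f.read()

    guess = None
    if chardet is not None:
        # detect() returns a dict with 'encoding' and 'confidence' keys;
        # 'encoding' can be None when chardet cannot decide.
        guess = chardet.detect(raw)["encoding"]

    # Try chardet's guess first, then the assumed candidates in order.
    for encoding in ([guess] if guess else []) + list(candidates):
        try:
            text = raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
        # splitlines(True) keeps the line endings, similar to readlines().
        return text.splitlines(True), encoding

    raise ValueError("could not decode %r with any candidate encoding" % path)
```

A call such as `lines, encoding = read_lines_guessing_encoding('file.txt')` returns the decoded lines together with the encoding that was used. The result is still a guess: a single-byte encoding such as cp1252 accepts nearly any byte sequence, so it is worth checking the text for garbled characters.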
