Read a text file with non-ASCII characters in an unknown encoding
Question
I want to read a file that contains German text, not only ASCII characters. I found that I can do it like this:
>>> import codecs
>>> file = codecs.open('file.txt','r', encoding='UTF-8')
>>> lines= file.readlines()
This works when I run my code in Python IDLE, but when I run it from somewhere else it does not give the correct result. Any ideas?
Recommended answer
You need to know which character encoding the text is stored in. If you don't know that beforehand, you can try guessing it with the chardet module. First install it:
$ pip install chardet
Then read the file in binary mode and pass the raw bytes to chardet, for example:
>>> import chardet
>>> chardet.detect(open("file.txt", "rb").read())
{'confidence': 0.9690625, 'encoding': 'utf-8'}
So you can then open the file with the detected encoding:
>>> import codecs
>>> lines = codecs.open('file.txt', 'r', encoding='utf-8').readlines()
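A chardet result is a statistical guess, so it can be wrong, especially for short files. When installing chardet is not an option, one alternative is to try a short list of candidate encodings in order and keep the first one that decodes without error. The sketch below uses only the standard library and is a different technique from chardet's detection. The function name `read_text_with_fallback` and the candidate list are assumptions chosen for illustration, not part of the original answer.

```python
import os
import tempfile


def read_text_with_fallback(path, candidates=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    The candidate list is an assumption: UTF-8 first (with or without a BOM),
    then two encodings common for Western European text.
    """
    with open(path, "rb") as f:
        raw = f.read()  # read bytes so that no decoding happens implicitly
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings could decode %r" % path)


# Usage: write German text as cp1252 bytes, then read it back
# without telling the reader which encoding was used.
sample = "Grüße aus München\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample.encode("cp1252"))
try:
    text, used = read_text_with_fallback(tmp.name)
    print(used, text == sample)
finally:
    os.remove(tmp.name)
```

Order matters here: UTF-8 rejects most byte sequences produced by other encodings, so it is a safe first attempt, while latin-1 accepts any byte sequence and therefore only makes sense as a last resort. A successful decode with a late candidate does not prove the guess is right, so it is worth checking that the umlauts come out as expected.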