Python reading from a file and saving to UTF-8

Question

I'm having problems reading from a file, processing its string and saving to a UTF-8 file.

This is the code:

try:
    filehandle = open(filename,"r")
except:
    print("Could not open file " + filename)
    quit() 

text = filehandle.read()
filehandle.close()

I then do some processing on the variable text.

Then:

try:
    writer = open(output,"w")
except:
    print("Could not open file " + output)
    quit() 

#data = text.decode("iso 8859-15")    
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()

This outputs the file perfectly, but according to my editor it does so in ISO 8859-15. Since the same editor recognizes the input file (in the variable filename) as UTF-8, I don't know why this happens. As far as my research has shown, the commented lines should solve the problem. However, when I use those lines the resulting file has gibberish, mainly in the special characters (words with tildes, as the text is in Spanish). I would really appreciate any help, as I am stumped.

Recommended answer

Process text to and from Unicode at the I/O boundaries of your program using the codecs module:

import codecs
with codecs.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
with codecs.open(filename, 'w', encoding='utf8') as f:
    f.write(text)

The io module is now recommended instead of codecs, and io.open is compatible with Python 3's open syntax. If you are using Python 3 and don't require Python 2 compatibility, you can just use the built-in open.

import io
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
with io.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
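
For Python 3 only, the same pattern with the built-in open might look like the sketch below. The helper name reencode_to_utf8 and its parameters are hypothetical, introduced here for illustration; src_encoding has to match whatever encoding the input file actually uses.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding="utf-8"):
    """Read src_path as text in src_encoding and write it to dst_path as UTF-8.

    Hypothetical helper for illustration; returns the decoded text.
    """
    # Decode bytes to str at the input boundary.
    with open(src_path, "r", encoding=src_encoding) as f:
        text = f.read()

    # ... process the Unicode text here ...

    # Encode str to UTF-8 bytes at the output boundary.
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Passing encoding explicitly on both calls matters because, when it is omitted, open falls back to a platform-dependent default that may not be UTF-8.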
