Python reading from a file and saving to UTF-8


Question

I'm having problems reading from a file, processing its string, and saving to a UTF-8 file.

The code is as follows:

try:
    filehandle = open(filename,"r")
except:
    print("Could not open file " + filename)
    quit() 

text = filehandle.read()
filehandle.close()

I then do some processing on the variable text.

And then:

try:
    writer = open(output,"w")
except:
    print("Could not open file " + output)
    quit() 

#data = text.decode("iso 8859-15")    
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()

This writes the file without errors, but according to my editor the output is in ISO 8859-15. Since the same editor recognizes the input file (in the variable filename) as UTF-8, I don't know why this happens. As far as my research has shown, the commented lines should solve the problem. However, when I use those lines, the resulting file has gibberish, mainly in special characters: words with tildes, as the text is in Spanish. I would really appreciate any help, as I am stumped.

Recommended answer

Convert text to and from Unicode at the I/O boundaries of your program by using open with the encoding parameter. Make sure to use the (hopefully documented) encoding of the file being read. The default encoding varies by OS (specifically, locale.getpreferredencoding(False) is the encoding used), so I recommend always passing the encoding parameter explicitly for portability and clarity (Python 3 syntax below):

with open(filename, 'r', encoding='utf8') as f:
    text = f.read()

# process Unicode text

with open(filename, 'w', encoding='utf8') as f:
    f.write(text)
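
As a rough illustration of why the explicit parameter matters, the sketch below prints the locale default and then converts a small ISO 8859-15 file to UTF-8 inside a temporary directory. The helper name transcode, the file names, and the sample text are hypothetical, introduced only for this example; for a real file the source encoding has to be known or documented.

```python
import locale
import os
import tempfile


def transcode(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Read src_path as src_encoding and write it to dst_path as dst_encoding."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()  # decoded to a Unicode str at the input boundary
    with open(dst_path, "w", encoding=dst_encoding) as dst:
        dst.write(text)  # encoded to bytes at the output boundary
    return text


# The encoding open() falls back to when none is given; varies by platform.
print("locale default:", locale.getpreferredencoding(False))

sample = "mañana, canción"  # hypothetical sample text
with tempfile.TemporaryDirectory() as tmp:
    latin_path = os.path.join(tmp, "latin.txt")
    utf8_path = os.path.join(tmp, "utf8.txt")

    # Create an ISO 8859-15 input file to convert.
    with open(latin_path, "w", encoding="iso8859-15") as f:
        f.write(sample)

    result = transcode(latin_path, utf8_path, "iso8859-15")

    # Read the output back as raw bytes to inspect what is on disk.
    with open(utf8_path, "rb") as f:
        raw = f.read()

print(raw == sample.encode("utf-8"))
```

Because both open calls name their encoding, the result does not depend on the machine's locale settings.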

If you are still using Python 2, or need Python 2/3 compatibility, the io module implements open with the same semantics as Python 3's open and exists in both versions:

import io
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()

# process Unicode text

with io.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
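
The gibberish described in the question is consistent with double encoding: if the bytes read from the file were already UTF-8, decoding them as ISO 8859-15 and re-encoding the result as UTF-8 turns each accented character into two wrong ones. A minimal sketch, assuming that is what happened (the sample word is hypothetical):

```python
# Bytes as they would sit in a UTF-8 input file.
utf8_bytes = "mañana".encode("utf-8")

# Treating those bytes as ISO 8859-15 misreads the two-byte character.
misread = utf8_bytes.decode("iso8859-15")

# Re-encoding the misread text as UTF-8 bakes the damage into the output.
double_encoded = misread.encode("utf-8")

# Decoding with the encoding the file actually uses preserves the text.
correct = utf8_bytes.decode("utf-8")

print(repr(misread))   # 'maÃ±ana'
print(repr(correct))   # 'mañana'
print(len(utf8_bytes), len(double_encoded))  # 7 9
```

Decoding with the file's real encoding, as in the open calls above, avoids this.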
