How to open an ASCII-encoded file as UTF-8?


Problem description

My files are in US-ASCII, and a command like a = file('main.html') followed by a.read() loads them as ASCII text. How do I get them to load as UTF-8?

The problem I am trying to solve is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 38: ordinal not in range(128)

I was using the content of the files for templating, as in template_str.format(attrib=val). But the string to interpolate contains characters from a superset of ASCII.

Our team's version control and text editors do not care about the encoding. So how do I handle it in the code?

Recommended answer

You are opening the files without specifying an encoding, so read() returns a byte string, and any implicit conversion falls back to Python's default encoding (ASCII).

You need to decode the byte string explicitly, using the .decode() method:

 template_str = template_str.decode('utf8')
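An alternative to decoding after the read is to decode while reading. The following is a minimal sketch, not the original answer's code: it assumes io.open (available in Python 2.6+ and Python 3), and it uses a temporary file as a stand-in for main.html. The names template_path and result and the template content are hypothetical.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Create a stand-in for main.html containing only ASCII text (hypothetical content).
fd, template_path = tempfile.mkstemp(suffix='.html')
os.close(fd)
with io.open(template_path, 'w', encoding='ascii') as f:
    f.write(u'<p>Brand: {attrib}</p>')

# io.open decodes while reading, so read() returns unicode text, not bytes.
# ASCII is a subset of UTF-8, so an ASCII file decodes cleanly as UTF-8.
with io.open(template_path, encoding='utf-8') as f:
    template_str = f.read()

val = u'Acme\xae'  # contains a non-ASCII character (registered sign)
result = template_str.format(attrib=val)

os.remove(template_path)
```

Because the template is already unicode when format() is called, no implicit ASCII conversion is needed.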

The val variable you tried to interpolate into your template is itself a unicode value, while the template read from the file is a byte string. To combine the two, Python has to convert between the types implicitly, and it uses the default encoding (ASCII) to do so, which fails on a character such as u'\xae'.
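The failing step can be reproduced explicitly. This is a small illustrative sketch, assuming a hypothetical template and value: it shows that ASCII bytes decode to the same text under either codec, and that the ASCII codec cannot represent u'\xae' while UTF-8 can.

```python
# -*- coding: utf-8 -*-
# Pure-ASCII bytes decode to identical text as ASCII or as UTF-8.
ascii_bytes = b'<p>{attrib}</p>'
same_text = ascii_bytes.decode('ascii') == ascii_bytes.decode('utf-8')

# The ASCII codec only covers code points 0-127, so U+00AE cannot be encoded.
val = u'\xae'
try:
    val.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc  # same error class as in the question

# UTF-8 can encode the same character without error.
utf8_bytes = val.encode('utf-8')
```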

Did I mention already that you should read Joel Spolsky's article on Unicode and the Python Unicode HOWTO? They will help you understand what happened here.
