将 Unicode 文本写入文本文件? [英] Writing Unicode text to a text file?

查看:34
本文介绍了将 Unicode 文本写入文本文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从 Google 文档中提取数据,对其进行处理,然后将其写入文件(最终我将粘贴到 Wordpress 页面中).

I'm pulling data out of a Google doc, processing it, and writing it to a file (that eventually I will paste into a Wordpress page).

它有一些非 ASCII 符号.如何将这些安全地转换为可在 HTML 源代码中使用的符号?

It has some non-ASCII symbols. How can I convert these safely to symbols that can be used in HTML source?

目前我正在将所有内容转换为 Unicode,将它们全部连接到一个 Python 字符串中,然后执行:

Currently I'm converting everything to Unicode on the way in, joining it all together in a Python string, then doing:

import codecs
f = codecs.open('out.txt', mode="w", encoding="iso-8859-1")
f.write(all_html.encode("iso-8859-1", "replace"))

最后一行有编码错误:

UnicodeDecodeError: 'ascii' 编解码器无法解码字节 0xa0 的位置12286:序号不在范围内(128)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 12286: ordinal not in range(128)

部分解决方案:

这个 Python 运行没有错误:

This Python runs without an error:

row = [unicode(x.strip()) if x is not None else u'' for x in row]
all_html = row[0] + "<br/>" + row[1]
f = open('out.txt', 'w')
f.write(all_html.encode("utf-8"))

但是如果我打开实际的文本文件,我会看到很多符号,例如:

But then if I open the actual text file, I see lots of symbols like:

Qur’an 

也许我需要写入文本文件以外的内容?

Maybe I need to write to something other than a text file?

推荐答案

通过在第一次获取 unicode 对象时将其解码为 un​​icode 对象并在退出时根据需要对其进行编码,从而尽可能多地专门处理 unicode 对象.

Deal exclusively with unicode objects as much as possible by decoding things to unicode objects when you first get them and encoding them as necessary on the way out.

如果您的字符串实际上是一个 unicode 对象,则需要在将其写入文件之前将其转换为 unicode 编码的字符串对象:

If your string is actually a unicode object, you'll need to convert it to a unicode-encoded string object before writing it to a file:

foo = u'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
f = open('test', 'w')
f.write(foo.encode('utf8'))
f.close()

当您再次读取该文件时,您将获得一个 unicode 编码的字符串,您可以将其解码为一个 unicode 对象:

When you read that file again, you'll get a unicode-encoded string that you can decode to a unicode object:

f = file('test', 'r')
print f.read().decode('utf8')

这篇关于将 Unicode 文本写入文本文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆