将Unicode文本写入文本文件? [英] Writing Unicode text to a text file?

查看:166
本文介绍了将Unicode文本写入文本文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我从Google文档中提取数据,处理它,并将其写入文件(最终我将粘贴到Wordpress页面)。



它有一些非ASCII符号。如何将这些安全地转换为可以在HTML源代码中使用的符号?



目前我正在将所有内容转换为Unicode,在Python字符串中加入所有内容,然后执行:

  import codecs 
f = codecs.open('out.txt',mode =w,encoding =iso-8859-1)
f.write(all_html.encode(iso-8859-1,replace))

最后一行有一个编码错误:


UnicodeDecodeError:'ascii'编解码器无法解码字节0xa0的位置
12286:序数不在范围内(128)


部分解决方案: b

这个Python运行没有错误:

  row = [unicode(x.strip如果x不是无其他u''for row in row] 
all_html = row [0] +< br /> + row [1]
f = open('out.txt','w')
f.write(all_html.encode(utf-8)

但是如果我打开实际的文本文件,我看到很多像这样的符号:

 Qur,Äôan

也许我需要写信给别的比起文本文件?

解决方案

首先获取unicode对象时,通过解码unicode对象



如果你的字符串实际上是一个unicode对象,你需要在写入之前将它转换为一个unicode编码的字符串对象它到一个文件:

  foo =u'Δ,É,ק,和$。
f = open('test','w')
f.write(foo.encode('utf8'))
f.close / code>

当你再次读取该文件时,你会得到一个unicode编码的字符串,你可以解码为unicode对象: / p>

  f = file('test','r')
print f.read()。decode('utf8 ')


I'm pulling data out of a Google doc, processing it, and writing it to a file (that eventually I will paste into a Wordpress page).

It has some non-ASCII symbols. How can I convert these safely to symbols that can be used in HTML source?

Currently I'm converting everything to Unicode on the way in, joining it all together in a Python string, then doing:

import codecs
f = codecs.open('out.txt', mode="w", encoding="iso-8859-1")
f.write(all_html.encode("iso-8859-1", "replace"))

There is an encoding error on the last line:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 12286: ordinal not in range(128)

Partial solution:

This Python runs without an error:

row = [unicode(x.strip()) if x is not None else u'' for x in row]
all_html = row[0] + "<br/>" + row[1]
f = open('out.txt', 'w')
f.write(all_html.encode("utf-8")

But then if I open the actual text file, I see lots of symbols like:

Qur’an 

Maybe I need to write to something other than a text file?

解决方案

Deal exclusively with unicode objects as much as possible by decoding things to unicode objects when you first get them and encoding them as necessary on the way out.

If your string is actually a unicode object, you'll need to convert it to a unicode-encoded string object before writing it to a file:

foo = u'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
f = open('test', 'w')
f.write(foo.encode('utf8'))
f.close()

When you read that file again, you'll get a unicode-encoded string that you can decode to a unicode object:

f = file('test', 'r')
print f.read().decode('utf8')

这篇关于将Unicode文本写入文本文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆