Python, Encoding output to UTF-8


Question

I have a function (a def) that builds a string composed of UTF-8 encoded characters. The output files are opened with the arguments 'w+' and "utf-8".

However, when I call x.write(string), I get this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 1: ordinal not in range(128)

I assume this is because normally you would write something like print(u'something'). But I need to use a variable, and putting it inside the quotes of u'_' defeats that...

Any suggestions?

EDIT: Actual code here:

source = codecs.open("actionbreak/" + target + '.csv','r', "utf-8")
outTarget = codecs.open("actionbreak/" + newTarget, 'w+', "utf-8")
x = str(actionT(splitList[0], splitList[1]))
outTarget.write(x)

Essentially all this is supposed to be doing is building me a large amount of strings that look similar to this:

[日木曜 Deliverables]= CASE WHEN things = 11 THEN C ELSE 0 END

Solution

Are you using codecs.open()? Python 2.7's built-in open() does not accept an encoding argument, meaning you have to manually encode non-ASCII strings before writing them (as others have noted). codecs.open() does accept one and would probably be easier to drop in than manually encoding all the strings.
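
To make the two options concrete, here is a minimal sketch that writes the same text both ways and compares the bytes on disk. The sample string, the temporary directory, and the file names are assumptions for illustration only, and the code is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import shutil
import tempfile

# Hypothetical sample text containing non-ASCII characters.
text = u"[日木曜 Deliverables]= CASE WHEN things = 11 THEN C ELSE 0 END\n"

tmpdir = tempfile.mkdtemp()
manual_path = os.path.join(tmpdir, "manual.sql")
codecs_path = os.path.join(tmpdir, "codecs.sql")

try:
    # Option 1: open in binary mode and encode every string by hand.
    with open(manual_path, "wb") as f:
        f.write(text.encode("utf-8"))

    # Option 2: let codecs.open() encode on every write.
    with codecs.open(codecs_path, "w", "utf-8") as f:
        f.write(text)

    with open(manual_path, "rb") as f:
        manual_bytes = f.read()
    with open(codecs_path, "rb") as f:
        codecs_bytes = f.read()
finally:
    shutil.rmtree(tmpdir)

# Both files should contain the same UTF-8 bytes.
print(manual_bytes == codecs_bytes)
```

With many write calls, the second form avoids repeating .encode("utf-8") at every call site.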


As you are actually using codecs.open(), going by your added code, and after a bit of looking things up myself, I suggest trying to open the input and/or output file with the encoding "utf-8-sig", which automatically handles the BOM for UTF-8 (see http://docs.python.org/2/library/codecs.html#encodings-and-unicode, near the bottom of the section). I would think that only matters for the input file. If none of the combinations (utf-8-sig/utf-8, utf-8/utf-8-sig, utf-8-sig/utf-8-sig) work, then I believe the most likely situation is that your input file is encoded in a different Unicode format with a BOM: Python's default UTF-8 codec treats a BOM as a regular character, so reading the input would not raise an error, but writing the output could.
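
As a rough illustration of the difference between the two codecs, the sketch below writes a small UTF-8 file that starts with a BOM and reads it back with each encoding. The file contents and the temporary path are made up for the example; only codecs.open(), codecs.BOM_UTF8, and the two codec names come from the standard library.

```python
import codecs
import os
import tempfile

# Hypothetical sample data: a UTF-8 file that starts with a byte order mark.
raw = codecs.BOM_UTF8 + u"name,value\n".encode("utf-8")

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(raw)

    # Plain "utf-8" keeps the BOM as the first character of the decoded text.
    with codecs.open(path, "r", "utf-8") as f:
        with_bom = f.read()

    # "utf-8-sig" strips a leading BOM while decoding.
    with codecs.open(path, "r", "utf-8-sig") as f:
        without_bom = f.read()
finally:
    os.remove(path)

print(repr(with_bom[0]))      # the BOM character, U+FEFF
print(repr(without_bom[0]))   # the first real character of the data
```

If the first character read from your input turns out to be u'\ufeff', that stray character is the one the ASCII codec later refuses to encode.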


Just noticed this, but... when you use codecs.open(), it expects a Unicode string, not an encoded one; try x = unicode(actionT(splitList[0], splitList[1])).
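
Here is a small sketch of that point. action_t() is a hypothetical stand-in for your actionT(), assumed to return Unicode text, and text_type is a helper name introduced so the same code runs on Python 2.7 (where it is unicode) and Python 3 (where it is str). Encoding with the ASCII codec is done explicitly here to reproduce the failure; on Python 2, str() performs that same encoding implicitly.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

text_type = type(u"")  # unicode on Python 2, str on Python 3


def action_t(name, value):
    # Hypothetical stand-in for actionT(); assumed to return Unicode text.
    return u"[%s]= CASE WHEN things = %s THEN C ELSE 0 END" % (name, value)


# The name starts with a BOM, as if it came from a file read with plain "utf-8".
result = action_t(u"\ufeff日木曜 Deliverables", 11)

# Forcing the text through the ASCII codec fails on the first non-ASCII character.
try:
    result.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Keeping the value as Unicode lets the codecs writer do the UTF-8 encoding.
x = text_type(result)
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with codecs.open(path, "w", "utf-8") as out_target:
        out_target.write(x)
    with open(path, "rb") as f:
        written = f.read()
finally:
    os.remove(path)

print(ascii_error)
print(written == x.encode("utf-8"))
```

Note that the BOM is still written to the output in this sketch; stripping it on input with "utf-8-sig" is a separate step.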

Your error can also occur when attempting to decode a unicode string (see http://wiki.python.org/moin/UnicodeEncodeError), because Python 2 first implicitly encodes it with the ASCII codec. I don't think that should be happening, though, unless actionT() or your list-splitting does something to the Unicode strings that causes them to be treated as non-Unicode strings.
