UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 0: character maps to <undefined>

Problem description

I have a string that my IDE (the very old Boa Constructor) automatically converts to a byte string. Now I want to convert it to unicode so that I can print it using the encoding of the specific machine (cp1252 on Windows or UTF-8 on Linux).

I tried two different ways. One of them works and the other does not. But why?

Here is the working version:

#!/usr/bin/python
# vim: set fileencoding=cp1252 :

str = '\x80'
str = str.decode('cp1252') # to unicode
str = str.encode('cp1252') # to str
print str

Here is the version that does not work:

#!/usr/bin/python
# vim: set fileencoding=cp1252 :

str = u'\x80'
#str = str.decode('cp1252') # to unicode
str = str.encode('cp1252') # to str
print str

In version 1 I convert str to unicode with the decode method. In version 2 I make it unicode with the u prefix in front of the string literal. But I thought the two versions would do exactly the same thing?

Solution

str.decode is not just prepending u to the string literal. It translates the bytes of the input string into meaningful characters (i.e. Unicode). Which character each byte becomes depends on the codec you pass to it.
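A minimal sketch makes the difference between the question's two versions concrete. It is written in Python 3, which is an assumption: the question's code is Python 2, and in Python 3 `bytes` plays the role of Python 2's `str` while `str` plays the role of `unicode`. The variable names are illustrative only. The first route decodes the byte 0x80 and gets the euro sign, which cp1252 can encode. The second route starts from the code point U+0080, and encoding that raises the UnicodeEncodeError from the title.

```python
# Python 3 translation of the two versions in the question:
# Python 2 str ~ Python 3 bytes, Python 2 unicode ~ Python 3 str.

# Version 1: start from the byte 0x80 and decode it with cp1252.
raw = b'\x80'
text = raw.decode('cp1252')     # byte 0x80 means U+20AC (euro sign) in cp1252
out_v1 = text.encode('cp1252')  # U+20AC maps back to the byte 0x80
print(ascii(text), out_v1)

# Version 2: the literal '\x80' is already text, namely code point U+0080.
literal = '\x80'
try:
    out_v2 = literal.encode('cp1252')  # cp1252 has no byte for U+0080
except UnicodeEncodeError as exc:
    out_v2 = None  # the "character maps to <undefined>" failure
    print(exc)
```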

Then you call encode to convert these characters back into bytes. You need bytes because to "print" them, that is, to output them to the terminal or any other OS entity (such as a GUI window), you must supply bytes in that entity's encoding.

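So, for your specific task, I believe you want something like:

import locale
platform_encoding = locale.getpreferredencoding()  # encoding of the current system
s = '\x80'
print s.decode('cp1252').encode(platform_encoding)

where 'cp1252' is the encoding your IDE uses (as in your working version and your fileencoding declaration), and platform_encoding holds the encoding of the current system. locale.getpreferredencoding() is one way to obtain it.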
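A Python 3 variant of the same idea, wrapped in a small helper, might look like the sketch below. It assumes the IDE writes cp1252, as the question's fileencoding declaration suggests. The helper name `reencode` is hypothetical, and so is the choice of `errors='replace'`. With that setting, characters the target encoding cannot represent become '?' instead of raising UnicodeEncodeError.

`locale.getpreferredencoding(False)` is one way to get the system's preferred encoding. `sys.stdout.encoding` is another, and which one fits depends on where the output goes.

```python
import locale


def reencode(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode bytes written in source_encoding, then encode them for target_encoding.

    Hypothetical helper: errors='replace' turns characters the target
    encoding cannot represent into '?' instead of raising UnicodeEncodeError.
    """
    return raw.decode(source_encoding).encode(target_encoding, errors='replace')


# One way (not the only one) to obtain the current system's encoding.
platform_encoding = locale.getpreferredencoding(False)

print(platform_encoding, reencode(b'\x80', 'cp1252', platform_encoding))
```

Usage: `reencode(b'\x80', 'cp1252', 'utf-8')` yields the three UTF-8 bytes of the euro sign. `reencode(b'\x80', 'cp1252', 'ascii')` yields `b'?'` because ASCII has no euro sign.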


In reply to your comment:

But the str.decode should have used the source code encoding (from line 2 in the file) to decode. So there should not be a difference to the u

This is an incorrect assumption. From PEP 263, Defining Python Source Code Encodings:

The encoding information is then used by the Python parser to interpret the file using the given encoding.

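So set fileencoding=cp1252 only tells the interpreter how to interpret the characters in the source file [the ones you typed in the editor] when it parses a line such as str = '\x80'. This information is not used by str.decode calls, which rely solely on the codec you pass them. In fact, \x80 is an escape sequence written in plain ASCII, so the declaration does not even affect that particular literal.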
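You can check the PEP 263 point directly in Python 3 by compiling source files supplied as raw bytes. `compile()` honours the coding declaration in byte input, including the vim-style `fileencoding=` form used in the question. This is a sketch with a hypothetical helper, `run_source`.

It uses Python 3 str literals rather than Python 2 byte strings, so it shows how the declaration changes the resulting characters. A raw 0x80 byte in the file means U+20AC under cp1252 and U+0080 under latin-1. The ASCII escape `\x80` is unaffected by the declaration. `bytes.decode` ignores the declaration entirely.

```python
def run_source(source: bytes) -> str:
    """Hypothetical helper: compile and run a source file given as raw bytes, return its s."""
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['s']


header = b'#!/usr/bin/python\n# vim: set fileencoding=%s :\n'

# A raw 0x80 byte typed into the file: its meaning depends on the declaration.
raw_in_file = b"s = '\x80'\n"
as_cp1252 = run_source(header % b'cp1252' + raw_in_file)   # parsed as U+20AC
as_latin1 = run_source(header % b'latin-1' + raw_in_file)  # parsed as U+0080

# The escape sequence \x80 is plain ASCII text in the file,
# so the declaration does not change what it means.
escape_in_file = b"s = '\\x80'\n"
esc_cp1252 = run_source(header % b'cp1252' + escape_in_file)
esc_latin1 = run_source(header % b'latin-1' + escape_in_file)

# bytes.decode never looks at any declaration; it uses only the codec passed in.
decoded = b'\x80'.decode('cp1252')
print(ascii(as_cp1252), ascii(as_latin1), ascii(esc_cp1252), ascii(decoded))
```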

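You also ask what u'\x80' is. In a unicode literal, \x80 is simply interpreted as \u0080, the code point U+0080 (a C1 control character), and this is obviously not what you want. cp1252 has no byte for U+0080, so encode('cp1252') fails with exactly this "character maps to <undefined>" error. Decoding the byte '\x80' with cp1252 instead gives u'\u20ac' (the euro sign), which encodes back without trouble. Take a look at this question: Bytes in a unicode Python string.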
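Suppose you already have a unicode string whose code points are really byte values, as u'\x80' is here. One common repair, assuming every code point is below U+0100, works in two steps. First encode the string with latin-1, which maps U+0000..U+00FF one-to-one onto the bytes 0x00..0xFF. Then decode those bytes with the intended codec.

A Python 3 sketch of that technique, with illustrative variable names:

```python
# A str whose code points are really byte values (U+0080 standing for byte 0x80).
mistaken = '\x80'                 # U+0080, a C1 control character

# latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
# so it recovers the original bytes without loss.
raw = mistaken.encode('latin-1')  # the byte 0x80

# Reinterpret those bytes with the codec they were actually written in.
fixed = raw.decode('cp1252')      # U+20AC, the euro sign
print(ascii(fixed), fixed.encode('cp1252'))
```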
