(unicode error) 'unicodeescape' codec can't decode bytes - string with '\u'

Problem description

Writing my code for Python 2.6, but with Python 3 in mind, I thought it was a good idea to put

from __future__ import unicode_literals

at the top of some modules. In other words, I am asking for trouble now (to avoid it in the future), but I might be missing some important knowledge here. I want to be able to pass a string representing a file path and instantiate an object as simply as

MyObject('H:\unittests')

In Python 2.6, this works just fine, with no need for double backslashes or a raw string, even for a directory starting with '\u..', which is exactly what I want. In the __init__ method I make sure all single \ occurrences are interpreted as '\\', including those before special characters as in \a, \b, \f, \n, \r, \t and \v (only \x remains a problem). Decoding the given string into unicode using the (local) encoding also works as expected.
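The question does not show the __init__ code, so the following is only a minimal sketch of the idea described above. The helper name restore_backslashes and its lookup table are hypothetical. It assumes the path was typed with single backslashes, so that sequences such as \t and \n have already been turned into control characters by the time the string arrives, and it maps them back to a backslash plus a letter.

```python
# Hypothetical helper, not the asker's actual code: undo the control-character
# escapes that a non-raw literal applies to a Windows path typed with single
# backslashes.
_CONTROL_TO_ESCAPE = {
    '\a': '\\a', '\b': '\\b', '\f': '\\f', '\n': '\\n',
    '\r': '\\r', '\t': '\\t', '\v': '\\v',
}


def restore_backslashes(path):
    """Replace each control character with a backslash and its escape letter."""
    return ''.join(_CONTROL_TO_ESCAPE.get(ch, ch) for ch in path)


# 'H:\temp\new' really contains a TAB and a NEWLINE after parsing.
fixed = restore_backslashes('H:\temp\new')
```

A repair like this can only work for escapes that parse successfully. A malformed \x escape, or a \u escape in a unicode literal, is rejected before any code in __init__ runs.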

Preparing for Python 3.x, simulating my actual problem in an editor (starting with a clean console in Python 2.6), the following happens:

>>> '\u'
'\\u'
>>> r'\u'
'\\u'

(OK until here: '\u' is encoded by the console using the local encoding)

>>> from __future__ import unicode_literals
>>> '\u'
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: end of string in escape sequence

In other words, the (unicode) string is not interpreted as unicode at all, nor does it get decoded automatically with the local encoding. The same happens for a raw string:

>>> r'\u'
SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX

The same goes for u'\u':

>>> u'\u'
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: end of string in escape sequence
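The errors above are raised while the literal is being parsed, before any of the program's own code runs. A small sketch, assuming Python 3 (where every string literal is Unicode), makes that visible by compiling the literal from source text. The helper name try_compile is made up for illustration, and the exact error wording may differ between versions.

```python
def try_compile(literal_source):
    """Compile one literal expression; return (value, error_message)."""
    try:
        code = compile(literal_source, '<demo>', 'eval')
    except SyntaxError as exc:
        # The failure happens at compile time, so no runtime code can catch it
        # inside the module that contains the literal.
        return None, str(exc)
    return eval(code), None


# The source text passed in is a quoted backslash-u, without and with r prefix.
plain_value, plain_error = try_compile("'\\u'")
raw_value, raw_error = try_compile("r'\\u'")
```

Under Python 3 the plain literal fails and the raw literal succeeds. Under Python 2 with unicode_literals, both fail as shown in the session above.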

Also, I would expect isinstance(str(''), unicode) to return True (which it does not), because importing unicode_literals should make all string types unicode. (edit:) Because in Python 3 all strings are sequences of Unicode characters, I would expect str('') to return such a unicode string, and type(str('')) to be both <type 'unicode'> and <type 'str'> (because all strings are unicode), but I also realise that <type 'unicode'> is not <type 'str'>. Confusion all around...

Questions

  • How can I best pass strings containing '\u'? (without writing '\\u')
  • Does from __future__ import unicode_literals really implement all Python 3 related unicode changes, so that I get a complete Python 3 string environment?

edit: In Python 3, <type 'str'> is a Unicode object and <type 'unicode'> simply does not exist. In my case I want to write code for Python 2(.6) that will work in Python 3. But when I import unicode_literals, I cannot check if a string is of <type 'unicode'> because:

  • I assume unicode is not part of the namespace
  • if unicode is part of the namespace, a literal of <type 'str'> is still unicode when it is created in the same module
  • type(mystring) will always return <type 'str'> for unicode literals in Python 3
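One common way around the type-checking problem in the points above is to pick the text type once, based on the interpreter version, and test against that alias. This is a sketch rather than something from the original post, and the names text_type and is_text are illustrative only.

```python
import sys

# Assumption: only the major version matters for choosing the text type.
if sys.version_info[0] >= 3:
    text_type = str        # Python 3: str is the Unicode text type
else:
    text_type = unicode    # noqa: F821 (this name exists only in Python 2)


def is_text(value):
    """Return True if value is Unicode text on the running interpreter."""
    return isinstance(value, text_type)
```

With this alias, is_text(u'abc') holds on both Python 2 and Python 3.3+, while a byte string such as b'abc' is rejected on both.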

My modules used to be encoded in 'utf-8', declared by a # coding: UTF-8 comment at the top, while my locale.getdefaultlocale()[1] returns 'cp1252'. So if I call MyObject('çça') from my console, the string is encoded as 'cp1252' in Python 2, and as 'utf-8' when calling MyObject('çça') from the module. In Python 3, it will not be encoded at all; it is a unicode literal.
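The difference between the two encodings can be made concrete with a short sketch. It writes the text with \xe7 escapes so that the example does not depend on the encoding of the file it is saved in. The helper name to_text is hypothetical.

```python
# The text 'çça', spelled with escapes (U+00E7 is c with cedilla).
text = u'\xe7\xe7a'

cp1252_bytes = text.encode('cp1252')   # what a cp1252 console would pass on
utf8_bytes = text.encode('utf-8')      # what a UTF-8 encoded module contains


def to_text(raw, encoding):
    """Decode a byte string using an explicitly named encoding."""
    return raw.decode(encoding)


# Decoding with the wrong encoding does not fail here; it silently produces
# different characters.
mojibake = utf8_bytes.decode('cp1252')
```

Decoding each byte string with the encoding it was produced in returns the same text, which is why the receiving code needs to know where the bytes came from.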

I have given up hope of avoiding an extra '\' before a u (or an x, for that matter). I also understand the limitations of importing unicode_literals. However, the many possible combinations of passing a string from a module to the console and vice versa, each with a different encoding, and on top of that importing unicode_literals or not, and Python 2 vs Python 3, made me want to create an overview by actual testing. Hence the table below.

In other words, type(str('')) does not return <type 'str'> in Python 3 but <class 'str'>, and all of the Python 2 problems seem to be avoided.

Recommended answer

AFAIK, all that from __future__ import unicode_literals does is make all string literals of type unicode instead of type str. That is:

>>> type('')
<type 'str'>
>>> from __future__ import unicode_literals
>>> type('')
<type 'unicode'>

But str and unicode are still different types, and they behave just like before.

>>> type(str(''))
<type 'str'>

is always of type str.

About your r'\u' issue: it is by design, because with unicode_literals in effect r'\u' is equivalent to ur'\u' without unicode_literals. From the docs:

When an 'r' or 'R' prefix is used in conjunction with a 'u' or 'U' prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed while all other backslashes are left in the string.

This probably stems from the way the lexical analyzer worked in the Python 2 series. In Python 3 it works as you (and I) would expect.

You can type the backslash twice, and then the \u will not be interpreted, but you'll get two backslashes!

Backslashes can be escaped with a preceding backslash; however, both remain in the string

>>> ur'\\u'
u'\\\\u'

So IMHO, you have two simple options:

  • Do not use raw strings, and escape your backslashes (compatible with python3):

'H:\\unittests'

  • Be too smart and take advantage of unicode code points (not compatible with Python 3):

r'H:\u005cunittests'
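As a rough illustration of the first option, the sketch below builds the same path in a few ways that never place a single backslash directly before a u in the source. Using ntpath here is an addition for illustration, not part of the original answer. It is the standard-library module that implements Windows path rules and can be imported on any platform.

```python
import ntpath  # Windows path rules, importable on any platform

# Option 1 from the answer: escape the backslash explicitly.
escaped = 'H:\\unittests'

# Let the library insert the separator instead of typing it before the u.
joined = ntpath.join('H:\\', 'unittests')

# Forward slashes need no escaping and can be normalized afterwards.
normalized = ntpath.normpath('H:/unittests')
```

All three names end up holding the same twelve-character path, and none of these literals depends on whether unicode_literals is imported.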
