How do I .decode('string-escape') in Python3?

Question

I have some escaped strings that need to be unescaped. I'd like to do this in Python.

For example, in Python 2.7 I can do this:

>>> "\123omething special".decode('string-escape')
'Something special'
>>> 

How do I do it in Python3? This doesn't work:

>>> b"\123omething special".decode('string-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: string-escape
>>> 

My goal is to be able to take a string like this:

s\000u\000p\000p\000o\000r\000t\000@\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000

And turn it into:

"support@psiloc.com"

After I do the conversion, I'll probe to see if the string I have is encoded in UTF-8 or UTF-16.

Solution

You'll have to use unicode_escape instead:

>>> b"\\123omething special".decode('unicode_escape')
'Something special'

If you start with a str object instead (equivalent to the Python 2.7 unicode type), you'll need to encode it to bytes first, then decode with unicode_escape.
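The helper below is a minimal sketch that accepts either a str or a bytes value. The name `unescape` is hypothetical, and it is not part of the original answer. It also assumes a particular way of getting from str to bytes: encoding with `latin1` plus the `backslashreplace` error handler. That choice mirrors how unicode_escape reads unescaped bytes back as Latin-1. A plain `.encode('ascii')` would also work if the input is guaranteed to be pure ASCII.

```python
from typing import Union

def unescape(value: Union[str, bytes]) -> str:
    """Interpret Python-style backslash escapes in a str or bytes value."""
    if isinstance(value, str):
        # unicode_escape only decodes bytes, so encode first (assumed choice):
        # latin-1 keeps U+0000..U+00FF as single bytes, which unicode_escape
        # reads back as latin-1; 'backslashreplace' turns any higher character
        # into a \uXXXX escape that the decode step then restores.
        value = value.encode("latin1", "backslashreplace")
    return value.decode("unicode_escape")

print(unescape(b"\\123omething special"))  # Something special
print(unescape("caf\\xe9 \\u20ac"))         # café €
```

Encoding a str with UTF-8 instead would mangle literal non-ASCII characters (é would come back as Ã©), because unicode_escape treats every byte that is not an escape as Latin-1.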

If you need bytes as the end result, you'll have to encode again to a suitable encoding. For example, use .encode('latin1') if you need to preserve literal byte values: the first 256 Unicode code points (U+0000 to U+00FF) map 1-to-1 onto the byte values 0x00 to 0xFF.
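As a sketch, the two steps can be wrapped in one function that returns raw bytes, for example so you can inspect the data before deciding how to decode it. The name `unescape_to_bytes` is hypothetical. The `assert` line checks the 256-value one-to-one mapping that makes `latin1` lossless here.

```python
def unescape_to_bytes(escaped: bytes) -> bytes:
    """Interpret backslash escapes in `escaped` and return the raw bytes."""
    # Step 1: unicode_escape turns sequences such as \000, \x41 or \n into code points.
    text = escaped.decode("unicode_escape")
    # Step 2: latin-1 maps U+0000..U+00FF to bytes 0x00..0xFF one-to-one, so each
    # escaped byte value comes back unchanged. Escapes above U+00FF (e.g. \u20ac)
    # have no single-byte form and raise UnicodeEncodeError here.
    return text.encode("latin1")

# The 1-to-1 property that makes the round trip lossless:
assert all(chr(i).encode("latin1") == bytes([i]) for i in range(256))

print(unescape_to_bytes(b"a\\000b\\xff"))  # b'a\x00b\xff'
```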

Your example is actually UTF-16 data with escapes. Decode with unicode_escape, encode back to latin1 to recover the raw bytes, then decode as utf-16-le (UTF-16 little-endian without a BOM):

>>> value = b's\\000u\\000p\\000p\\000o\\000r\\000t\\000@\\000p\\000s\\000i\\000l\\000o\\000c\\000.\\000c\\000o\\000m\\000'
>>> value.decode('unicode_escape').encode('latin1')  # convert to bytes
b's\x00u\x00p\x00p\x00o\x00r\x00t\x00@\x00p\x00s\x00i\x00l\x00o\x00c\x00.\x00c\x00o\x00m\x00'
>>> _.decode('utf-16-le') # decode from UTF-16-LE
'support@psiloc.com'
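The question also mentions probing whether the unescaped data is UTF-8 or UTF-16, a step the answer does not cover. Below is one possible heuristic, offered as an assumption rather than a reliable detector; `guess_decode` is a hypothetical name. It does three things in order:

- It honours a UTF-16 byte-order mark if one is present.
- It treats even-length data whose odd-indexed bytes are all NUL as UTF-16-LE (or even-indexed all NUL as UTF-16-BE).
- Otherwise, it falls back to UTF-8.

This only suits mostly-ASCII text like the email address in the example. Non-Latin UTF-16 text will defeat it.

```python
import codecs

def guess_decode(raw: bytes) -> str:
    """Hypothetical heuristic: pick UTF-16 or UTF-8 for mostly-ASCII data."""
    # A UTF-16 byte-order mark settles it; the 'utf-16' codec reads and strips it.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    if raw and len(raw) % 2 == 0:
        # ASCII text in UTF-16-LE has a NUL byte in every odd position...
        if all(b == 0 for b in raw[1::2]):
            return raw.decode("utf-16-le")
        # ...and in UTF-16-BE a NUL byte in every even position.
        if all(b == 0 for b in raw[0::2]):
            return raw.decode("utf-16-be")
    # Otherwise assume UTF-8 (raises UnicodeDecodeError if that is wrong too).
    return raw.decode("utf-8")

value = b's\\000u\\000p\\000p\\000o\\000r\\000t\\000@\\000p\\000s\\000i\\000l\\000o\\000c\\000.\\000c\\000o\\000m\\000'
raw = value.decode("unicode_escape").encode("latin1")  # same unescape step as above
print(guess_decode(raw))  # support@psiloc.com
```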
