How do I .decode('string-escape') in Python3?
Problem description
I have some escaped strings that need to be unescaped. I'd like to do this in Python.
For example, in python2.7 I can do this:
>>> "\123omething special".decode('string-escape')
'Something special'
>>>
How do I do it in Python3? This doesn't work:
>>> b"\123omething special".decode('string-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: string-escape
>>>
My goal is to be able to take a string like this:
s\000u\000p\000p\000o\000r\000t\000@\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000
And turn it into:
"support@psiloc.com"
After I do the conversion, I'll probe to see if the string I have is encoded in UTF-8 or UTF-16.
Solution
You'll have to use unicode_escape instead:
>>> b"\\123omething special".decode('unicode_escape')
'Something special'
If you start with a str object instead (the equivalent of the Python 2.7 unicode type), you'll need to encode it to bytes first, then decode with unicode_escape.
If you need bytes as the end result, you'll have to encode again to a suitable encoding (.encode('latin1'), for example, if you need to preserve literal byte values; the first 256 Unicode code points map one-to-one to byte values).
Your example is actually UTF-16 data with escapes. Decode from unicode_escape, encode back to latin1 to preserve the bytes, then decode from utf-16-le
(UTF-16 little-endian without a BOM):
>>> value = b's\\000u\\000p\\000p\\000o\\000r\\000t\\000@\\000p\\000s\\000i\\000l\\000o\\000c\\000.\\000c\\000o\\000m\\000'
>>> value.decode('unicode_escape').encode('latin1')  # convert to bytes
b's\x00u\x00p\x00p\x00o\x00r\x00t\x00@\x00p\x00s\x00i\x00l\x00o\x00c\x00.\x00c\x00o\x00m\x00'
>>> _.decode('utf-16-le')  # decode from UTF-16-LE
'support@psiloc.com'
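The three steps can be wrapped in a small helper. This is only a sketch: the name unescape_utf16 and its encoding parameter are made up for illustration, and the str branch assumes the input contains nothing above code point 255 (otherwise .encode('latin1') raises UnicodeEncodeError).

```python
def unescape_utf16(value, encoding="utf-16-le"):
    """Turn backslash-escaped UTF-16 data into readable text.

    Hypothetical helper; 'encoding' defaults to the utf-16-le case
    from the example and may need changing for other data.
    """
    if isinstance(value, str):
        # A str has to become bytes before unicode_escape can decode it.
        # Assumption: every character fits in latin1 (code points 0-255).
        value = value.encode("latin1")
    # Step 1: interpret the backslash escapes such as \000 or \123.
    unescaped = value.decode("unicode_escape")
    # Step 2: map code points 0-255 back to the identical byte values.
    raw = unescaped.encode("latin1")
    # Step 3: decode the recovered bytes with the real encoding.
    return raw.decode(encoding)


escaped = (b"s\\000u\\000p\\000p\\000o\\000r\\000t\\000@\\000"
           b"p\\000s\\000i\\000l\\000o\\000c\\000.\\000c\\000o\\000m\\000")

print(unescape_utf16(escaped))                  # bytes input
print(unescape_utf16(escaped.decode("ascii")))  # str input
```

Both calls should print support@psiloc.com. If the data turns out to be UTF-8 rather than UTF-16, passing encoding="utf-8" reuses the same first two steps.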