How to combine 'utf-8' and 'unicode_escape' to correctly decode b'\xc3\xa4\\n-\\t-\\"foo\\"'?


Question

I have a library that gives me encoded and escaped byte sequences like this one:

a=b'\xc3\xa4\\n-\\t-\\"foo\\"'

I want to translate it back to the following:

ä
-   -"foo"

I tried simply calling a.decode(), which decodes the UTF-8 bytes as wanted:

>>> a.decode()
'ä\\n-\\t-\\"foo\\"'

But it does not un-escape. Then I found 'unicode_escape', which un-escapes the sequence but reads the UTF-8 bytes as Latin-1, and I got

>>> print(a.decode('unicode_escape'))
Ã¤
-   -"foo"

Is there a way to decode and un-escape the given sequence with a built-in method (i.e. without having to chain .replace('\\n', '\n').replace(...))?

It would also be interesting to know how to revert this operation (i.e. get the same byte sequence back from the translated result).

Recommended answer

There is a way to do roughly what I want, and I can almost go the other way too, but in my eyes it is ugly and incomplete, so I hope it is not the best option I have:

>>> import codecs
>>> decoded = codecs.escape_decode(a)[0].decode()
>>> print(decoded)
ä
-   -"foo"
>>> reencoded = codecs.escape_encode(decoded.encode())
>>> print(reencoded)
(b'\\xc3\\xa4\\n-\\t-"foo"', 11)      <--- quotes are not escaped (and the UTF-8 bytes come back hex-escaped)
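
An alternative that stays within documented codecs is sketched below. It decodes UTF-8 first, then re-encodes to Latin-1 with the 'backslashreplace' error handler so that 'unicode_escape' (which reads raw bytes as Latin-1) sees only bytes it can map back correctly. The helper names unescape_utf8 and escape_utf8 are hypothetical, and the reverse direction assumes the library escapes only backslash, newline, tab and double quote; that set is a guess based on the sample and may need adjusting. Malformed input, such as a stray backslash directly before a non-Latin-1 character, is not handled.

```python
# Assumed escape set (not confirmed by the library): backslash, newline,
# tab and double quote.
_ESCAPES = str.maketrans({
    "\\": "\\\\",
    "\n": "\\n",
    "\t": "\\t",
    '"': '\\"',
})


def unescape_utf8(raw: bytes) -> str:
    """Decode UTF-8 first, then resolve the backslash escapes."""
    text = raw.decode("utf-8")
    # Characters outside Latin-1 become \uXXXX or \UXXXXXXXX escapes here,
    # which 'unicode_escape' turns back into the same characters.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


def escape_utf8(text: str) -> bytes:
    """Hypothetical inverse: re-escape the assumed characters, then encode."""
    return text.translate(_ESCAPES).encode("utf-8")


a = b'\xc3\xa4\\n-\\t-\\"foo\\"'
decoded = unescape_utf8(a)
print(decoded)

# Round trip back to the original byte sequence.
assert escape_utf8(decoded) == a
```

Unlike codecs.escape_encode, this keeps the UTF-8 bytes intact and escapes the quotes, so the sample value round-trips to the same byte sequence.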
