Converting double slash utf-8 encoding

Views: 369 Published: 2020/7/13 2:48:39 Tags: python unicode encoding utf-8

Problem description

I cannot get this to work! I have a text file from a save game file parser with a bunch of UTF-8 Chinese names in it in byte form, like this in the source.txt:

\xe6\x89\x8e\xe5\x8a\xa0\xe6\x8b\x89

But, no matter how I import it into Python (3 or 2), I get this string, at best:

\\xe6\\x89\\x8e\\xe5\\x8a\\xa0\\xe6\\x8b\\x89

I have tried, like other threads have suggested, to re-encode the string as UTF-8 and then decode it with unicode escape, like so:

stringName.encode("utf-8").decode("unicode_escape")

But then it messes up the original encoding, and gives this as the string:

'æ\x89\x8eå\x8a\xa0æ\x8b\x89' (printing this string results in: æå æ )
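A likely reason for this result, sketched under the assumption that the string holds literal backslash sequences exactly as shown: unicode_escape turns each \xNN into the code point U+00NN rather than the byte 0xNN, so the output is a string of Latin-1 characters. Re-encoding that string as Latin-1 should recover the raw bytes, which can then be decoded as UTF-8. The variable names are illustrative only.

```python
# Text as read from the file: literal backslash, 'x', and two hex digits.
raw = r"\xe6\x89\x8e\xe5\x8a\xa0\xe6\x8b\x89"

# unicode_escape maps each \xNN to the code point U+00NN, not to a byte,
# which is where the 'æ...' string comes from.
mojibake = raw.encode("utf-8").decode("unicode_escape")

# Latin-1 maps code points 0-255 back to the identical byte values.
recovered_bytes = mojibake.encode("latin-1")

# Those bytes are the real UTF-8 encoding of the name.
name = recovered_bytes.decode("utf-8")
print(name)
```

This only works while every character in the intermediate string is below U+0100; anything higher would make the Latin-1 step raise UnicodeEncodeError.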

Now, if I manually copy and paste b + the original string and decode this, I get the correct text. For example:

b'\xe6\x89\x8e\xe5\x8a\xa0\xe6\x8b\x89'.decode("utf-8")

Results in: '扎加拉'

But, I can't do this programmatically. I can't even get rid of the double slashes.

To be clear, source.txt contains single backslashes. I have tried importing it in many ways, but this is the most common:

with open('source.txt','r',encoding='utf-8') as f_open:
    source = f_open.read()
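The doubled backslashes are probably only a display effect: if the file holds single backslashes, the string read from it does too, and repr() (used by the interactive prompt and when printing containers) escapes each one. A minimal check, assuming the file content matches the sample above:

```python
# One escape sequence as it sits in the file: 4 characters, one backslash.
source = r"\xe6"

print(len(source))    # number of characters actually stored
print(source)         # str() form shows the single backslash
print(repr(source))   # repr() form escapes the backslash, so it looks doubled
```

If len() reports four characters per sequence, there is nothing to strip; the remaining work is converting the escape text into bytes.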

Okay, so I clicked the answer below (I think), but here is what works:

from ast import literal_eval
decodedString = literal_eval("b'{}'".format(stringVariable)).decode('utf-8')

I can't use it on the whole file because of other encoding issues, but extracting each name as a string (stringVariable) and then doing that works! Thank you!

To be more clear, the original file is not just these messed up utf encodings. It only uses them for certain fields. For example, here is the beginning of the file:

{'m_cacheHandles': ['s2ma\x00\x00CN\x1f\x1b"\x8d\xdb\x1fr \\\xbf\xd4D\x05R\x87\x10\x0b\x0f9\x95\x9b\xe8\x16T\x81b\xe4\x08\x1e\xa8U\x11',
                's2ma\x00\x00CN\x1a\xd9L\x12n\xb9\x8aL\x1d\xe7\xb8\xe6\xf8\xaa\xa1S\xdb\xa5+\t\xd3\x82^\x0c\x89\xdb\xc5\x82\x8d\xb7\x0fv',
                's2ma\x00\x00CN\x92\xd8\x17D\xc1D\x1b\xf6(\xedj\xb7\xe9\xd1\x94\x85\xc8`\x91M\x8btZ\x91\xf65\x1f\xf9\xdc\xd4\xe6\xbb',
                's2ma\x00\x00CN\xa1\xe9\xab\xcd?\xd2PS\xc9\x03\xab\x13R\xa6\x85u7(K2\x9d\x08\xb8k+\xe2\xdeI\xc3\xab\x7fC',
                's2ma\x00\x00CNN\xa5\xe7\xaf\xa0\x84\xe5\xbc\xe9HX\xb93S*sj\xe3\xf8\xe7\x84`\xf1Ye\x15~\xb93\x1f\xc90',
                's2ma\x00\x00CN8\xc6\x13F\x19\x1f\x97AH\xfa\x81m\xac\xc9\xa6\xa8\x90s\xfdd\x06\rL]z\xbb\x15\xdcI\x93\xd3V'],
'm_campaignIndex': 0,
'm_defaultDifficulty': 7,
'm_description': '',
'm_difficulty': '',
'm_gameSpeed': 4,
'm_imageFilePath': '',
'm_isBlizzardMap': True,
'm_mapFileName': '',
'm_miniSave': False,
'm_modPaths': None,
'm_playerList': [{'m_color': {'m_a': 255, 'm_b': 255, 'm_g': 92,   'm_r': 36},
               'm_control': 2,
               'm_handicap': 0,
               'm_hero': '\xe6\x89\x8e\xe5\x8a\xa0\xe6\x8b\x89',

All of the information before the 'm_hero': field is not utf-8. So using ShadowRanger's solution works if the file is only made up of these fake utf-encodings, but it doesn't work when I have already parsed m_hero as a string and try to convert that. Karin's solution does work for that.
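Since only some fields carry escaped UTF-8, one way to apply the literal_eval conversion is field by field. The sketch below is an illustration under assumptions, not code from either answer: the player record, the TEXT_FIELDS set, and the decode_field helper are hypothetical, and the values are assumed to contain literal backslash escapes with no unescaped single quote.

```python
from ast import literal_eval

def decode_field(escaped):
    # Parse the text as the body of a bytes literal, then decode as UTF-8.
    return literal_eval("b'{}'".format(escaped)).decode("utf-8")

# Hypothetical record: the hero name is stored with literal backslash escapes.
player = {"m_control": 2, "m_handicap": 0,
          "m_hero": r"\xe6\x89\x8e\xe5\x8a\xa0\xe6\x8b\x89"}

# Only these keys are assumed to hold escaped UTF-8; the rest stay untouched.
TEXT_FIELDS = {"m_hero"}

fixed = {key: decode_field(value) if key in TEXT_FIELDS else value
         for key, value in player.items()}
print(fixed["m_hero"])
```

Keeping the list of text fields explicit avoids running the conversion on binary values such as the m_cacheHandles entries, which are not valid UTF-8.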
