如何在Python中替换字符串中的无效unicode字符? [英] How to replace invalid unicode characters in a string in Python?

查看:164
本文介绍了如何在Python中替换字符串中的无效unicode字符?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

据我所知,python的概念是在字符串中仅包含有效字符,但就我而言,操作系统将在我必须处理的路径名中提供带有无效编码的字符串。因此,我最终得到的字符串包含非Unicode字符。

As far as I know it is the concept of python to have only valid characters in a string, but in my case the OS will deliver strings with invalid encodings in path names I have to deal with. So I end up with strings that contain characters that are non-unicode.

为了纠正这些问题,我需要以某种方式显示这些字符串。不幸的是,我无法打印它们,因为它们包含非Unicode字符。是否有一种优雅的方法可以某种方式替换这些字符,至少可以对字符串的内容有所了解?

In order to correct these problems I need to display these strings somehow. Unfortunately I can not print them because they contain non-unicode characters. Is there an elegant way to replace these characters somehow to at least get some idea of the content of the string?

我的想法是逐字符处理这些字符串,然后检查存储的字符是否实际上是有效的unicode。如果字符无效,我想使用某个unicode符号。但是我该怎么办呢?使用 codecs 似乎不适合该目的:我已经有一个字符串,由操作系统返回,而不是字节数组。将字符串转换为字节数组似乎涉及解码,这在我的情况下当然会失败。

My idea would be to process these strings character by character and check if the character stored is actually valid unicode. In case of an invalid character I would like to use a certain unicode symbol. But how can I do this? Using codecs seems not to be suitable for that purpose: I already have a string, returned by the operating system, and not a byte array. Converting a string to byte array seems to involve decoding which will fail in my case of course. So it seems that I'm stuck.

对我来说,如何创建这样的替换字符串有技巧吗?

Do you have an tips for me how to be able to create such a replacement string?

推荐答案

感谢您的评论。这样,我就能实施更好的解决方案:

Thanks to you for your comments. This way I was able to implement a better solution:

    try:
        s2 = codecs.encode(s, "utf-8")
        return (True, s, None)
    except Exception as e:
        ret = codecs.decode(codecs.encode(s, "utf-8", "replace"), "utf-8")
        return (False, ret, e)

请分享对该解决方案的任何改进。谢谢!

Please share any improvements on that solution. Thank you!

这篇关于如何在Python中替换字符串中的无效unicode字符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆