Comprehensive character replacement module in Python for non-Unicode and non-ASCII characters (for HTML)

Question

Is there a comprehensive character replacement module for Python that finds all non-ASCII or non-Unicode characters in a string and replaces them with ASCII or Unicode equivalents? Settling for the "ignore" argument during encoding or decoding is insane, but so is getting a '?' everywhere a character could not be translated.
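For reference, both behaviours complained about here come from the error handlers of Python 3's str.encode. A small illustration (the sample string is arbitrary) that also shows 'xmlcharrefreplace', which keeps every character intact as an HTML numeric character reference:

```python
# Compare str.encode's built-in error handlers on characters
# that have no ASCII equivalent.
text = "café \u2013 naïve"  # \u2013 is an en dash

print(text.encode("ascii", "ignore"))             # b'caf  nave'  (dropped silently)
print(text.encode("ascii", "replace"))            # b'caf? ? na?ve'
print(text.encode("ascii", "xmlcharrefreplace"))  # b'caf&#233; &#8211; na&#239;ve'
```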

I'm looking for one module that finds irksome characters and conforms them to whatever standard is requested. I realize that the number of extant alphabets and encodings makes this somewhat impossible, but surely someone has taken a stab at it? Even a rudimentary solution would be better than the status quo.

The simplification this would bring to data transfer would be enormous.

Answer

I don't think what you want is really possible, but I think there is a decent option.

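The unicodedata module has a 'normalize' function that can gracefully degrade text for you. The NFKD form splits accented characters into a base letter plus combining marks (and replaces compatibility characters such as ligatures with their plain equivalents), so encoding the result to ASCII with 'ignore' mostly drops just the marks and keeps the base letters: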

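import unicodedata

def gracefully_degrade_to_ascii(text):
    # NFKD splits accented letters into base letter + combining mark;
    # 'ignore' then drops whatever is still outside ASCII.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')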
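A quick check of what this does in practice, as a sketch assuming Python 3 (the sample strings are arbitrary): accented Latin letters and compatibility characters degrade nicely, but letters with no decomposition, such as 'ß' or 'Ł', are still silently dropped by 'ignore', so the result is best-effort rather than a true transliteration.

```python
import unicodedata

def gracefully_degrade_to_ascii(text):
    # The answer's function, returning str instead of bytes.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

# Accented letters decompose into base letter + combining mark; only the mark is lost.
print(gracefully_degrade_to_ascii("Crème brûlée à la française"))  # Creme brulee a la francaise

# Compatibility characters: the "fi" ligature (U+FB01) and superscript two.
print(gracefully_degrade_to_ascii("\ufb01le x\u00b2"))  # file x2

# Letters with no decomposition are dropped entirely.
print(gracefully_degrade_to_ascii("Straße"))  # Strae
print(gracefully_degrade_to_ascii("Łódź"))    # odz
```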

Assuming the charset you're using is already mapped into Unicode (or at least can be mapped into Unicode), you should be able to degrade the Unicode version of that text down to ASCII with this module, which is also part of the standard library. UTF-8 needs no such degradation, since it can encode every Unicode character.
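If the text starts out as bytes in a legacy charset, it has to be decoded into Unicode before normalizing. A minimal sketch, assuming the source encoding is known (Windows-1252 is used here only as an example); degrade_bytes_to_ascii is a hypothetical helper name, not part of unicodedata:

```python
import unicodedata

def degrade_bytes_to_ascii(raw, source_encoding):
    # Hypothetical helper. Step 1: map the legacy charset into Unicode
    # (raises UnicodeDecodeError if raw is not valid in that encoding).
    text = raw.decode(source_encoding)
    # Step 2: same NFKD + ASCII 'ignore' fallback as the answer.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

# In Windows-1252, 0xE9 is 'é', 0x96 is an en dash and 0xEF is 'ï'.
legacy = b"caf\xe9 \x96 na\xefve"
print(degrade_bytes_to_ascii(legacy, "cp1252"))  # cafe  naive (the dash has no decomposition)
```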

Full documentation: http://docs.python.org/library/unicodedata.html
