Python and character normalization

Views: 22 Published: 2021/12/28 16:41:23 Tags: python django utf-8 diacritics transliteration


Question

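Hello, I retrieve text-based UTF-8 data from a foreign source that contains special characters such as u"ıöüç", and I want to normalize them to English, for example "ıöüç" -> "iouc". What would be the best way to achieve this?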

Answer

I recommend using the Unidecode module:

>>> from unidecode import unidecode
>>> unidecode(u'ıöüç')
'iouc'

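Note that you pass it a Unicode string and get back a plain ASCII string (a byte string on Python 2, a str on Python 3). By default the output is guaranteed to be ASCII.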
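If installing Unidecode is not an option, a rough approximation using only the standard library might look like the sketch below. This is a different technique from the one recommended above: it decomposes each character with unicodedata.normalize and drops the combining marks. Letters that have no Unicode decomposition, such as the dotless "ı", pass through unchanged, so the sketch adds a small manual table for them. The names strip_diacritics and FALLBACKS are hypothetical and chosen only for illustration, and the table covers just the character from the question.

```python
import unicodedata

# Hypothetical fallback table for letters with no Unicode decomposition.
# Only the dotless i from the question is covered; extend as needed.
FALLBACKS = {"ı": "i"}


def strip_diacritics(text):
    """Approximate ASCII folding with the standard library only."""
    # NFKD splits accented letters into base letter + combining mark,
    # e.g. "ö" becomes "o" followed by U+0308.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Apply manual replacements for letters that did not decompose.
    return "".join(FALLBACKS.get(ch, ch) for ch in stripped)


print(strip_diacritics("ıöüç"))  # iouc
```

Unlike Unidecode, this sketch does not guarantee ASCII output: any character that neither decomposes nor appears in the fallback table is returned as-is.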

