Convert full-width Unicode characters into ASCII characters

Problem description

I have some string text in Unicode that contains full-width digits, as below:

txt = '３６fsdfdsf１４'

However, int(txt[:2]) does not recognize the characters as a number. How can I change the characters so that they are recognized as a number?

Recommended answer

If you actually have a Unicode string (or decode your byte string to Unicode), you can normalize the data with Unicode compatibility normalization (NFKC), which replaces full-width characters with their ASCII equivalents:

>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
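To get a feel for what NFKC touches besides full-width digits, here is a minimal Python 3 sketch. The sample characters in `samples` are illustrative assumptions, not taken from the question, and the input is written with escapes so that the example does not depend on the file's encoding.

```python
import unicodedata

# Full-width digits as in the question (U+FF13 U+FF16 ... U+FF11 U+FF14).
s = "\uff13\uff16fsdfdsf\uff11\uff14"
normalized = unicodedata.normalize("NFKC", s)
print(normalized)           # 36fsdfdsf14
print(int(normalized[:2]))  # 36

# NFKC also folds other compatibility characters, which may be more than you want:
# "fi" ligature, circled digit one, superscript two, ideographic space.
samples = ["\ufb01", "\u2460", "\u00b2", "\u3000"]
folded = [unicodedata.normalize("NFKC", ch) for ch in samples]
for before, after in zip(samples, folded):
    print(repr(before), "->", repr(after))
```

If folding ligatures, circled digits, or superscripts is unacceptable for your data, a narrower mapping is the safer choice.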

If NFKC normalization changes too much for you, you can make a translation table containing just the replacements you want:

#coding:utf8

repl = u'0123456789'

# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))

s = u'３６fsdfdsf１４'

print(s.translate(xlat))

Output:

36fsdfdsf14
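The same table idea can be widened to every full-width ASCII variant, not only the digits. The sketch below is one possible Python 3 version and is not part of the original answer: it relies on the fact that the full-width forms U+FF01 to U+FF5E mirror ASCII U+0021 to U+007E at a fixed offset of 0xFEE0. The helper name `fullwidth_to_ascii` and the decision to also map the ideographic space U+3000 are assumptions made for illustration.

```python
# Full-width forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at a fixed offset.
FULLWIDTH_OFFSET = 0xFEE0

# Map each full-width code point to its ASCII counterpart.
_table = {cp: cp - FULLWIDTH_OFFSET for cp in range(0xFF01, 0xFF5F)}
_table[0x3000] = 0x20  # assumption: also turn the ideographic space into an ASCII space


def fullwidth_to_ascii(text: str) -> str:
    """Replace full-width ASCII variants; leave every other character untouched."""
    return text.translate(_table)


converted = fullwidth_to_ascii("\uff13\uff16fsdfdsf\uff11\uff14")
print(converted)  # 36fsdfdsf14
```

Unlike NFKC, this leaves ligatures, circled digits, and other compatibility characters as they are, so it only changes what the table explicitly lists.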
