Convert full-width Unicode characters into ASCII characters
Question
I have some Unicode string text containing full-width digits, as below:
txt = '３６fsdfdsf１４'
However, int(txt[:2]) does not recognize the characters as a number. How can I change the characters so that they are recognized as a number?
Recommended answer
If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with the NFKC compatibility normalization form, which replaces each full-width character with its ordinary equivalent:
>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
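The session above is Python 2. As a rough sketch of the same idea in Python 3, where every str is already Unicode, the normalized text can be passed straight to int(). The helper name normalize_fullwidth is hypothetical and used only for illustration:

```python
import unicodedata

def normalize_fullwidth(text):
    """Fold compatibility characters (including full-width forms) via NFKC."""
    return unicodedata.normalize('NFKC', text)

# Full-width digits 3, 6 ... 1, 4 written as escapes so the source stays ASCII.
txt = '\uff13\uff16fsdfdsf\uff11\uff14'
clean = normalize_fullwidth(txt)
print(clean)           # 36fsdfdsf14
print(int(clean[:2]))  # 36
```

Writing the input with \u escapes avoids depending on the source file's encoding.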
If NFKC normalization changes too much for you (it also folds other compatibility characters, such as ligatures and circled digits), you can make a translation table of just the replacements you want:
#coding:utf8
repl = u'0123456789'
# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))
s = u'３６fsdfdsf１４'
print(s.translate(xlat))
Output:
36fsdfdsf14
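The same table approach can be extended beyond digits. The full-width forms U+FF01 to U+FF5E mirror ASCII U+0021 to U+007E at a fixed offset of 0xFEE0, so a Python 3 sketch covering the whole block might look like the following. The names OFFSET, FULLWIDTH_TO_ASCII and fullwidth_to_ascii are assumptions made for this example, and mapping the ideographic space to an ASCII space is an optional extra rather than part of the original answer:

```python
# Full-width forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E.
OFFSET = 0xFEE0
FULLWIDTH_TO_ASCII = {cp: cp - OFFSET for cp in range(0xFF01, 0xFF5F)}
# Optional: also map the ideographic space U+3000 to an ASCII space.
FULLWIDTH_TO_ASCII[0x3000] = 0x0020

def fullwidth_to_ascii(text):
    """Replace full-width characters only; everything else is left untouched."""
    return text.translate(FULLWIDTH_TO_ASCII)

print(fullwidth_to_ascii('\uff13\uff16fsdfdsf\uff11\uff14'))  # 36fsdfdsf14
```

Unlike NFKC, this leaves characters such as circled digits or ligatures as they are, which is the point of building a narrow table.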