Convert unicode small capitals to their ASCII equivalents


Problem description

I have the following dataset

'Fʀɪᴇɴᴅ',
 'ᴍᴏᴍ',
 'ᴍᴀᴋᴇs',
 'ʜᴏᴜʀʟʏ',
 'ᴛʜᴇ',
 'ᴄᴏᴍᴘᴜᴛᴇʀ',
 'ʙᴇᴇɴ',
 'ᴏᴜᴛ',
 'ᴀ',
 'ᴊᴏʙ',
 'ғᴏʀ',
 'ᴍᴏɴᴛʜs',
 'ʙᴜᴛ',
 'ʟᴀsᴛ',
 'ᴍᴏɴᴛʜ',
 'ʜᴇʀ',
 'ᴄʜᴇᴄᴋ',
 'ᴊᴜsᴛ',
 'ᴡᴏʀᴋɪɴɢ',
 'ғᴇᴡ',
 'ʜᴏᴜʀs',
 'sᴏᴜʀᴄᴇ',

I want to convert them to ASCII using a Python script, for example:

Fʀɪᴇɴᴅ - FRIEND
ᴍᴏᴍ - MOM

I have tried encoding and decoding the strings, but that doesn't work. I have also tried this solution, but it doesn't solve my problem.

Solution

Python doesn't provide a way to directly convert small caps characters to their ASCII equivalents. However, it's possible to do this using str.translate.
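The question doesn't show exactly what was tried, so the following is only a sketch of two common attempts, assuming the input is the small-capital string 'ᴍᴏᴍ' (written with escapes so the code points are unambiguous). It suggests why neither encoding nor compatibility normalization helps: these characters are not ASCII and have no decomposition to ASCII letters.

```python
import unicodedata as ud

# 'ᴍᴏᴍ' spelled out as LATIN LETTER SMALL CAPITAL M, O, M
s = '\u1d0d\u1d0f\u1d0d'

# Encoding to ASCII can only drop (or replace) the characters
dropped = s.encode('ascii', 'ignore')

# NFKD normalization leaves them unchanged, because these
# small capitals have no compatibility decomposition
normalized = ud.normalize('NFKD', s)

print(dropped)
print(normalized == s)
```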

To use str.translate, we need to create a mapping of small caps characters' ordinal values to ASCII characters.

To get the ordinal values, we can construct the name of each character, then get the character from the unicodedata database and call ord on it. Note that there is no small caps 'X' character, and in Python versions before 3.7 small caps 'Q' is not present, because it was only added in Unicode 11.0, the version bundled with Python 3.7.

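>>> from string import ascii_uppercase
>>> import unicodedata as ud

>>> # Filter out unsupported characters.
>>> # Use only the line that matches your Python version.
>>> # Python < 3.7 (no small caps 'Q' or 'X'):
>>> # letters = (x for x in ascii_uppercase if x not in ('Q', 'X'))
>>> # Python >= 3.7 (no small caps 'X'):
>>> letters = (x for x in ascii_uppercase if x != 'X')

>>> # Look up each small caps character by name and map its ordinal to the ASCII letter
>>> mapping = {ord(ud.lookup('LATIN LETTER SMALL CAPITAL ' + x)): x for x in letters}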
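As an alternative to choosing a filter by Python version, the mapping could be built by catching the KeyError that unicodedata.lookup raises for an unknown name. This is a minimal sketch, not the only way to do it; the function name build_small_caps_mapping is hypothetical.

```python
import unicodedata as ud
from string import ascii_uppercase


def build_small_caps_mapping():
    """Map small-capital code points to ASCII capitals (hypothetical helper)."""
    mapping = {}
    for letter in ascii_uppercase:
        try:
            small_cap = ud.lookup('LATIN LETTER SMALL CAPITAL ' + letter)
        except KeyError:
            # This interpreter's Unicode database has no such character
            # (for example 'X', or 'Q' before Python 3.7), so skip it.
            continue
        mapping[ord(small_cap)] = letter
    return mapping


mapping = build_small_caps_mapping()
```

The same code then runs unchanged on old and new interpreters, and picks up any small capitals that the installed Unicode database happens to know about.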

Once we have the mapping we can use it to make a translation table for str.translate, using str.maketrans, then perform the conversions.

>>> # Make a translation table
>>> tt = str.maketrans(mapping)
>>> # Use the table to "translate" strings to their ASCII equivalent.
>>> s = 'ᴍᴏɴᴛʜ'
>>> s.translate(tt)
'MONTH'
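Applied to the whole dataset, the conversion might look like the sketch below. Two things here are assumptions rather than part of the answer above: that the 'ғ' in the data is the Cyrillic look-alike U+0493 rather than the small capital F (U+A730), and that plain lowercase letters such as the 's' in 'ᴍᴏɴᴛʜs' should be uppercased too. The helper name to_ascii_caps is hypothetical. The leftovers set is a quick way to spot any characters the table still does not cover.

```python
import unicodedata as ud
from string import ascii_uppercase

# Rebuild the mapping, skipping letters with no small-capital form
mapping = {}
for letter in ascii_uppercase:
    try:
        mapping[ord(ud.lookup('LATIN LETTER SMALL CAPITAL ' + letter))] = letter
    except KeyError:
        pass

# Assumption: 'ғ' in the dataset is U+0493 (Cyrillic), used as a stand-in for F
mapping[0x0493] = 'F'

tt = str.maketrans(mapping)


def to_ascii_caps(word):
    """Translate small capitals, then uppercase any remaining plain letters."""
    return word.translate(tt).upper()


words = ['Fʀɪᴇɴᴅ', 'ᴍᴏᴍ', 'ғᴏʀ', 'ᴍᴏɴᴛʜs']
converted = [to_ascii_caps(w) for w in words]

# Any non-ASCII characters still present were not covered by the mapping
leftovers = {ch for w in converted for ch in w if ord(ch) > 127}

print(converted)
print(leftovers)
```

If leftovers is not empty, the characters it contains can be inspected with unicodedata.name and added to the mapping by hand.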
