Convert Unicode small capitals to their ASCII equivalents
Problem description
I have the following dataset:

'Fʀɪᴇɴᴅ',
'ᴍᴏᴍ',
'ᴍᴀᴋᴇs',
'ʜᴏᴜʀʟʏ',
'ᴛʜᴇ',
'ᴄᴏᴍᴘᴜᴛᴇʀ',
'ʙᴇᴇɴ',
'ᴏᴜᴛ',
'ᴀ',
'ᴊᴏʙ',
'ғᴏʀ',
'ᴍᴏɴᴛʜs',
'ʙᴜᴛ',
'ʟᴀsᴛ',
'ᴍᴏɴᴛʜ',
'ʜᴇʀ',
'ᴄʜᴇᴄᴋ',
'ᴊᴜsᴛ',
'ᴡᴏʀᴋɪɴɢ',
'ғᴇᴡ',
'ʜᴏᴜʀs',
'sᴏᴜʀᴄᴇ',

I want to convert it into ASCII format using a Python script, for example:

Fʀɪᴇɴᴅ - FRIEND
ᴍᴏᴍ - MOM

I have tried encoding and decoding, but that doesn't work. I have also tried this solution, but that doesn't solve my problem.

Python doesn't provide a way to directly convert small caps characters to their ASCII equivalents. However, it's possible to do this using str.translate.

To use str.translate, we need to create a mapping of small caps characters' ordinal values to ASCII characters.

To get the ordinal values, we can construct the name of each character, then get the character from the unicodedata database and call ord on it. Note that there is no small caps 'X' character, and in Python versions before 3.7 small caps 'Q' is not present.

>>> from string import ascii_uppercase
>>> import unicodedata as ud
>>> # Filter out unsupported characters; run only the line that matches your Python version
>>> # Python < 3.7
>>> letters = (x for x in ascii_uppercase if x not in ('Q', 'X'))
>>> # Python >= 3.7
>>> letters = (x for x in ascii_uppercase if x != 'X')
>>> mapping = {ord(ud.lookup('LATIN LETTER SMALL CAPITAL ' + x)): x for x in letters}

Once we have the mapping, we can use it to make a translation table for str.translate, using str.maketrans, then perform the conversions.

>>> # Make a translation table
>>> tt = str.maketrans(mapping)
>>> # Use the table to "translate" strings to their ASCII equivalent.
>>> s = 'ᴍᴏɴᴛʜ'
>>> s.translate(tt)
'MONTH'
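
The steps above could be collected into a small script along the following lines. This is only a sketch, not part of the original answer: the names build_table, to_ascii_upper and LOOKALIKES are hypothetical. Instead of filtering letters by Python version, it skips any letter for which unicodedata.lookup raises KeyError. It also assumes that the 'ғ' in the dataset is U+0493 (CYRILLIC SMALL LETTER GHE WITH STROKE), a look-alike rather than a real small capital, and that the plain lowercase 's' should simply be upper-cased.

```python
import unicodedata as ud
from string import ascii_uppercase

# Assumption (not from the original answer): look-alike characters that are
# not real small capitals. U+0493 is CYRILLIC SMALL LETTER GHE WITH STROKE.
LOOKALIKES = {0x0493: 'F'}


def build_table(extra=None):
    """Build a str.translate table from small capitals to ASCII letters."""
    mapping = {}
    for letter in ascii_uppercase:
        try:
            small_cap = ud.lookup('LATIN LETTER SMALL CAPITAL ' + letter)
        except KeyError:
            # This Python's Unicode database has no such character
            # (for example 'X', or 'Q' before Python 3.7), so skip it.
            continue
        mapping[ord(small_cap)] = letter
    if extra:
        # Add caller-supplied code point -> replacement pairs.
        mapping.update(extra)
    return str.maketrans(mapping)


def to_ascii_upper(text, table):
    """Translate small capitals, then upper-case any remaining letters."""
    return text.translate(table).upper()


table = build_table(LOOKALIKES)
# Small-caps "months" (with a plain 's') and "for" (with U+0493 standing in for F).
words = ['ᴍᴏɴᴛʜs', '\u0493ᴏʀ']
print([to_ascii_upper(word, table) for word in words])
```

Calling upper() after translate also changes any other non-ASCII letters in the input, so it may be better to leave it out if the data mixes in text that should stay as it is.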