Python - Unicode to ASCII conversion

Problem description

I am unable to convert the following Unicode to ASCII without losing data:

u'ABRA\xc3O JOS\xc9'

I tried encode and decode and they won’t do it.

Does anyone have a suggestion?

Solution

The Unicode characters u'\xc3' and u'\xc9' do not have any corresponding ASCII values. So, if you don't want to lose data, you have to encode that data in some way that's valid as ASCII. Options include:

>>> s = u'ABRA\xc3O JOS\xc9'
>>> print s.encode('ascii', errors='backslashreplace')
ABRA\xc3O JOS\xc9
>>> print s.encode('ascii', errors='xmlcharrefreplace')
ABRA&#195;O JOS&#201;
>>> print s.encode('unicode-escape')
ABRA\xc3O JOS\xc9
>>> print s.encode('punycode')
ABRAO JOS-jta5e

All of these are ASCII strings, and contain all of the information from your original Unicode string (so they can all be reversed without loss of data), but none of them are all that pretty for an end-user (and none of them can be reversed just by decode('ascii')).
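To make the "reversible, but not by decode('ascii')" point concrete, here is a minimal round-trip sketch. It assumes Python 3 (the session above is Python 2), and the names `encoded` and `reversers` are illustrative, not from the original answer. Note that the backslashreplace and xmlcharrefreplace forms only round-trip cleanly when the original text contains no literal backslashes or ampersands, as is the case here.

```python
import html

s = 'ABRA\xc3O JOS\xc9'

# Four ASCII-only representations of the same text.
encoded = {
    'backslashreplace': s.encode('ascii', errors='backslashreplace'),
    'xmlcharrefreplace': s.encode('ascii', errors='xmlcharrefreplace'),
    'unicode-escape': s.encode('unicode-escape'),
    'punycode': s.encode('punycode'),
}

# Each representation needs its own matching decoder to get the text back.
reversers = {
    'backslashreplace': lambda b: b.decode('unicode-escape'),
    'xmlcharrefreplace': lambda b: html.unescape(b.decode('ascii')),
    'unicode-escape': lambda b: b.decode('unicode-escape'),
    'punycode': lambda b: b.decode('punycode'),
}

for name, data in encoded.items():
    restored = reversers[name](data)
    print(name, data, restored == s)
```

A plain `data.decode('ascii')` succeeds on every one of these byte strings, but it returns the escaped form, not the original text.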

See str.encode, Python Specific Encodings, and Unicode HOWTO for more info.


As a side note, when some people say "ASCII", they really don't mean "ASCII" but rather "any 8-bit character set that's a superset of ASCII" or "some particular 8-bit character set that I have in mind". If that's what you meant, the solution is to encode to the right 8-bit character set:

>>> s.encode('utf-8')
'ABRA\xc3\x83O JOS\xc3\x89'
>>> s.encode('cp1252')
'ABRA\xc3O JOS\xc9'
>>> s.encode('iso-8859-15')
'ABRA\xc3O JOS\xc9'

The hard part is knowing which character set you meant. If you're writing both the code that produces the 8-bit strings and the code that consumes it, and you don't know any better, you meant UTF-8. If the code that consumes the 8-bit strings is, say, the open function or a web browser that you're serving a page to or something else, things are more complicated, and there's no easy answer without a lot more information.
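As a rough illustration of why the character set matters, the sketch below (Python 3 assumed; variable names are illustrative) encodes the same text two ways and then decodes each result with the other codec. One mismatch silently produces garbage, the other fails outright.

```python
s = 'ABRA\xc3O JOS\xc9'

utf8_bytes = s.encode('utf-8')      # two bytes per accented character
cp1252_bytes = s.encode('cp1252')   # one byte per accented character

# UTF-8 bytes read as cp1252: no error, but the text is silently corrupted.
mojibake = utf8_bytes.decode('cp1252')
print(repr(mojibake))

# cp1252 bytes read as UTF-8: the byte 0xc3 is not followed by a valid
# continuation byte, so decoding raises an error.
try:
    cp1252_bytes.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)
```

Either way the consumer has to be told, or has to agree in advance on, which encoding the producer used; the bytes alone do not say.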
