How do I convert a file's format from Unicode to ASCII using Python?
Problem description
I use a 3rd party tool that outputs a file in Unicode format. However, I prefer it to be in ASCII. The tool does not have settings to change the file format.
What is the best way to convert the entire file format using Python?
You can convert the file easily enough just using the unicode function, but you'll run into problems with Unicode characters without a straight ASCII equivalent.
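To make the problem concrete, here is a minimal sketch of what happens when text containing a non-ASCII character is encoded to ASCII. It assumes Python 3, where str is already Unicode and you call .encode directly (the unicode function mentioned above exists only in Python 2). The sample string and variable names are illustrative, not from the original answer.

```python
text = "caf\u00e9"  # "café"; the final character has no ASCII equivalent

# The default error handler ('strict') refuses to encode non-ASCII characters.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'ignore' silently drops such characters; 'replace' substitutes '?'.
ignored = text.encode("ascii", "ignore")
replaced = text.encode("ascii", "replace")

print(strict_failed)  # True
print(ignored)        # b'caf'
print(replaced)       # b'caf?'
```

None of the three handlers preserves the letter itself, which is why a plain encode step gives poor results on accented text.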
This blog recommends the unicodedata module, which seems to take care of roughly converting characters without direct corresponding ASCII values, e.g.
>>> title = u"Klüft skräms inför på fédéral électoral große"
is typically converted to
Klft skrms infr p fdral lectoral groe
which is pretty wrong. However, using the unicodedata module, the result can be much closer to the original text:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', title).encode('ascii','ignore')
'Kluft skrams infor pa federal electoral groe'
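Since the question asks about converting a whole file, the same normalize-then-encode step could be applied line by line. The following is only a sketch, assuming Python 3; the helper names to_ascii and convert_file are hypothetical, and the default source encoding of UTF-16 is a guess at what the tool means by "Unicode format", so adjust src_encoding to match the actual file.

```python
import unicodedata


def to_ascii(text):
    """Approximate text in ASCII: decompose accented characters (NFKD),
    then drop whatever still has no ASCII form."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def convert_file(src_path, dst_path, src_encoding="utf-16"):
    """Read src_path in src_encoding and write an ASCII-only copy to dst_path.

    src_encoding is an assumption; pass "utf-8" or another codec as needed.
    newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:
            dst.write(to_ascii(line))
```

A call might look like convert_file("report.txt", "report_ascii.txt"), where both file names are placeholders. Characters that have no decomposition, such as ß, are still dropped, which matches the 'groe' in the output above.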