How to normalize unicode encoding for iso-8859-15 conversion in python?

Views: 224 Posted: 2017/8/16 23:56:02 Tags: python unicode encoding utf-8 iso-8859-15


Problem description

I want to convert unicode strings to iso-8859-15. These strings include the u"\u2019" character (RIGHT SINGLE QUOTATION MARK, see http://www.fileformat.info/info/unicode/char/2019/index.htm), which is not part of the iso-8859-15 character set.

In Python, how can I normalize the unicode characters so that they match the iso-8859-15 encoding?

I have looked at the unicodedata module without success. I managed to do the job with

s.replace(u"\u2019", "'").encode('iso-8859-15')

but I would like to find a more general and cleaner way.

Thanks for your help

Solution

Use the unicode version of the translate function, assuming s is a unicode string:

s.translate({ord(u"\u2019"):ord(u"'")})

The argument of the unicode version of translate is a dict mapping unicode ordinals to unicode ordinals. Add to this dict any other characters that cannot be encoded in your target encoding.
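
As a minimal sketch of the full round trip, the translated string can be passed straight to encode. This assumes Python 3, where every str is a unicode string; the sample text in s is made up for illustration.

```python
# Hypothetical sample text: contains U+2019 (not in iso-8859-15)
# and U+20AC, the euro sign, which iso-8859-15 does contain.
s = "It\u2019s a test \u20ac"

# Map the code point of the curly quote to the code point of a plain apostrophe.
table = {ord("\u2019"): ord("'")}

# translate() returns a new string; encode() then succeeds without errors.
encoded = s.translate(table).encode("iso-8859-15")
print(encoded)
```

Without the translate step, the encode call on this sample string would raise UnicodeEncodeError because of the U+2019 character.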

You can build your mapping table in a little more readable form and create your mapping dict from it, for instance:

char_mappings = [(u"\u2019", u"'"),
                 (u"`", u"'")]
translate_mapping = {ord(k):ord(v) for k,v in char_mappings}
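
To decide which characters belong in the mapping, it can help to list the characters that survive translation but still cannot be encoded. The sketch below assumes Python 3; the helper name unencodable, the extra quote mappings, and the sample text are illustrative additions, not part of the original answer.

```python
# Mapping pairs: the first comes from the answer, the rest are
# assumed extras added for this illustration.
char_mappings = [("\u2019", "'"),
                 ("\u2018", "'"),
                 ("\u201c", '"'),
                 ("\u201d", '"')]
translate_mapping = {ord(k): ord(v) for k, v in char_mappings}


def unencodable(text, encoding="iso-8859-15"):
    """Return a sorted list of the characters in text that encoding cannot represent."""
    bad = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return sorted(bad)


# Hypothetical input with curly quotes and a horizontal ellipsis (U+2026).
text = "\u201cIt\u2019s\u201d \u2026 done"
cleaned = text.translate(translate_mapping)

# Anything listed here still needs an entry in char_mappings.
leftover = unencodable(cleaned)
print(cleaned, leftover)
```

Here the quotes are replaced, and the ellipsis is reported as the one character that still needs a mapping.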

From the translate documentation:

For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example).
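
Two points from this passage can be sketched in Python 3. First, translation values may be strings or None, which allows one-to-many replacements and deletions. Second, as a lighter alternative to the custom character mapping codec the documentation mentions (this is a different technique, not the one it describes), a fallback error handler can be registered with codecs.register_error. The table contents, the handler name latin9_fallback, and the sample strings are assumptions made for illustration.

```python
import codecs

# 1. Values in the translation table may be strings or None.
table = {
    0x2026: "...",   # HORIZONTAL ELLIPSIS -> three ASCII dots
    0x200B: None,    # ZERO WIDTH SPACE -> deleted
}
translated = "a\u200bb\u2026".translate(table)

# 2. An error handler that substitutes known characters at encode time.
FALLBACKS = {"\u2019": "'", "\u2018": "'", "\u2026": "..."}


def latin9_fallback(exc):
    """Replace unencodable characters using FALLBACKS, or '?' if unknown."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(FALLBACKS.get(ch, "?") for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end


codecs.register_error("latin9_fallback", latin9_fallback)

encoded = "It\u2019s done\u2026 \u2603".encode("iso-8859-15", errors="latin9_fallback")
print(translated, encoded)
```

The error handler only runs for characters the codec rejects, so text that is already encodable passes through untouched. The replacement strings it returns must themselves be encodable in the target encoding.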
