How do I get str.translate to work with Unicode strings?


Question

I have the following code:

import string
def translate_non_alphanumerics(to_translate, translate_to='_'):
    not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
    translate_table = string.maketrans(not_letters_or_digits,
                                       translate_to * len(not_letters_or_digits))
    return to_translate.translate(translate_table)

This works for non-Unicode strings:

>>> translate_non_alphanumerics('<foo>!')
'_foo__'

But it fails for Unicode strings:

>>> translate_non_alphanumerics(u'<foo>!')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in translate_non_alphanumerics
TypeError: character mapping must return integer, None or unicode

I can't make any sense of the paragraph on "Unicode objects" in the Python 2.6.2 docs for the str.translate() method.

How do I make this work for Unicode strings?

Solution

The Unicode version of translate requires a mapping from Unicode ordinals (which you can retrieve for a single character with ord) to Unicode ordinals. If you want to delete characters, you map to None.
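To make those rules concrete, here is a minimal sketch. It assumes Python 3, where every `str` is a Unicode string and `str.translate` follows the same rules as Python 2's `unicode.translate`; the table and the sample string are illustrative only.

```python
# Python 3 sketch: str.translate here behaves like Python 2's unicode.translate.
# Keys must be Unicode ordinals (ints), not characters.
table = {
    ord(u'<'): ord(u'['),  # map '<' to the ordinal of '['
    ord(u'>'): ord(u']'),  # map '>' to the ordinal of ']'
    ord(u'!'): None,       # None deletes the character
}
result = u'<foo>!'.translate(table)
print(result)  # [foo]

# A common mistake: character keys never match, because translate looks each
# character up by its ordinal. The string comes back unchanged.
unchanged = u'<foo>!'.translate({u'<': u'['})
print(unchanged)  # <foo>!
```

Characters missing from the table (here `f` and `o`) are copied through untouched.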

I changed your function to build a dict mapping the ordinal of every character to the Unicode string you want to translate it to:

def translate_non_alphanumerics(to_translate, translate_to=u'_'):
    not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~'
    translate_table = dict((ord(char), translate_to) for char in not_letters_or_digits)
    return to_translate.translate(translate_table)

>>> translate_non_alphanumerics(u'<foo>!')
u'_foo__'
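For comparison, here is a sketch of the same function in Python 3 (an assumption; the question targets Python 2.6). There all strings are Unicode, and `str.maketrans` accepts a single dict whose one-character keys it converts to ordinals, so the explicit `ord` calls can be dropped. The character set is kept identical to the answer's.

```python
# Python 3 port (sketch): every str is Unicode, so no u'' prefix is needed.
def translate_non_alphanumerics(to_translate, translate_to='_'):
    # Same characters as the answer above; '\\' is a literal backslash,
    # matching what '\]' produced in the Python 2 original.
    not_letters_or_digits = '!"#%\'()*+,-./:;<=>?@[\\]^_`{|}~'
    # With one dict argument, str.maketrans turns single-character keys into
    # ordinals; values may be strings of any length or None.
    translate_table = str.maketrans(dict.fromkeys(not_letters_or_digits, translate_to))
    return to_translate.translate(translate_table)

print(translate_non_alphanumerics('<foo>!'))         # _foo__
print(translate_non_alphanumerics('<foo>!', 'bad'))  # badfoobadbad
```

`string.maketrans` no longer exists in Python 3; `str.maketrans` and `bytes.maketrans` replace it.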

edit: It turns out that the translation mapping must map from the Unicode ordinal (via ord) to either another Unicode ordinal, a Unicode string, or None (to delete). I have thus changed the default value for translate_to to be a Unicode literal. For example:

>>> translate_non_alphanumerics(u'<foo>!', u'bad')
u'badfoobadbad'
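A further sketch, again assuming Python 3, puts the three allowed value types (ordinal, string, `None`) in one table, then uses the three-argument form of `str.maketrans`, which pairs two equal-length strings one-to-one and deletes every character in the third argument.

```python
# One table using every allowed value type (Python 3).
mixed_table = {
    ord('<'): ord('('),  # an ordinal: one-to-one replacement
    ord('>'): '->',      # a string: may be longer than one character
    ord('!'): None,      # None: delete the character
}
mixed = '<foo>!'.translate(mixed_table)
print(mixed)  # (foo->

# str.maketrans(x, y, z): x[i] maps to y[i]; characters in z are deleted.
paired_table = str.maketrans('<>', '()', '!')
paired = '<foo>!'.translate(paired_table)
print(paired)  # (foo)
```

The three-argument form is convenient for simple one-to-one swaps; multi-character replacements such as `'->'` need the dict form.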
