Replace accented chars with unaccented ones
Hi,
I would like to replace accented chars (like "é", "è" or "à") with
non-accented ones ("é" -> "e", "è" -> "e", "à" -> "a").
I have tried the string.replace method, but it seems to dislike non-ASCII chars...
Can you help me, please?
Thanks.
Thank you both for your answers. They both work very well.
At first I believed it didn't work, because the mistake I made was
forgetting the "u" prefix on the string: u"é". Because my file was already utf-8
encoded (# -*- coding: UTF-8 -*-), I thought the "u" was not necessary...
I was wrong.
Bye.
You have two options. First, convert the string to Unicode and use code
like the following:
    replacements = [(u'\xe9', 'e'), ...]
    def remove_accents(u):
        for a, b in replacements:
            u = u.replace(a, b)
        return u

    >>> remove_accents(u'\xe9')
    u'e'
Second, if you are using a single-byte encoding (iso8859-1, for
instance), then work with byte strings:
    import string
    replacement_map = string.maketrans('\xe9...', 'e...')
    def remove_accents(s):
        return s.translate(replacement_map)

    >>> remove_accents('\xe9')
    'e'
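For readers on Python 3, where string.maketrans no longer exists and text strings are Unicode by default, the same table-driven idea can be sketched with str.maketrans and str.translate. The name ACCENT_TABLE and its three entries are illustrative assumptions, not part of the original post.

```python
# Python 3 sketch; assumes text (str) input rather than byte strings.
# ACCENT_TABLE is a hypothetical, deliberately tiny mapping:
# é (U+00E9) -> e, è (U+00E8) -> e, à (U+00E0) -> a.
ACCENT_TABLE = str.maketrans("\xe9\xe8\xe0", "eea")


def remove_accents(s):
    """Return s with the characters listed in ACCENT_TABLE replaced."""
    return s.translate(ACCENT_TABLE)


print(remove_accents("d\xe9j\xe0 tr\xe8s bien"))  # deja tres bien
```

Characters that are not in the table pass through unchanged, so a real mapping would need an entry for every accented character you expect to see.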
If you want to have strings like u'é' in your programs, you have to
include a line at the top of the source file that tells Python the
encoding, like the following line does:
    # -*- coding: utf-8 -*-
(except you have to name the encoding your editor uses, if it's not
utf-8). See http://python.org/peps/pep-0263.html
Once you've done that, you can write
    replacements = [(u'é', 'e'), ...]
instead of using the \xXX escape for it.
Jeff
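The encoding declaration and the u prefix solve two different problems, which is easy to miss: the declaration tells Python how to read the source file, while the prefix decides whether a literal is text or raw bytes. The sketch below uses Python 3, where that distinction is spelled str versus bytes rather than u'...' versus '...'; that choice of interpreter is an assumption, not something from the thread. It shows that "é" encoded as UTF-8 occupies two bytes, so byte-level code never sees it as a single character.

```python
# Python 3 sketch: text (str) vs. UTF-8 encoded bytes.
text = "\xe9"                 # one code point: é (U+00E9)
raw = text.encode("utf-8")    # the same character as UTF-8 bytes

print(len(text))  # 1
print(len(raw))   # 2
print(raw)        # b'\xc3\xa9'

# Replacing on text works one character at a time.
print(text.replace("\xe9", "e"))  # e

# The single Latin-1 byte for é does not occur in the UTF-8 data,
# so a byte-level replacement with it leaves the data unchanged.
print(raw.replace(b"\xe9", b"e") == raw)  # True
```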
Translating the replacement pairs into a dictionary would result in a
significant speedup for large numbers of replacements.
    mapping = dict(replacement_pairs)
    def multi_replace(inp, mapping=mapping):
        return u''.join([mapping.get(i, i) for i in inp])
One pass through the file gives an O(len(inp)) algorithm, much better
(running-time wise) than the string.replace method, which runs in
O(len(inp) * len(replacement_pairs)) time as given.
- Josiah
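As a rough check of the dictionary approach, here is a Python 3 sketch that runs both strategies on the same input and confirms they agree. The contents of replacement_pairs, the helper name replace_loop, and the sample text are made up for illustration.

```python
# Python 3 sketch; the pairs and sample text are illustrative assumptions.
replacement_pairs = [("\xe9", "e"), ("\xe8", "e"), ("\xe0", "a")]
mapping = dict(replacement_pairs)


def replace_loop(inp, pairs=replacement_pairs):
    """One str.replace pass over the input per pair."""
    for a, b in pairs:
        inp = inp.replace(a, b)
    return inp


def multi_replace(inp, mapping=mapping):
    """Single pass: look each character up, keeping unmapped ones."""
    return "".join(mapping.get(ch, ch) for ch in inp)


sample = "d\xe9j\xe0 tr\xe8s"
print(multi_replace(sample))                          # deja tres
print(multi_replace(sample) == replace_loop(sample))  # True
```

One design difference worth keeping in mind: the per-character lookup only handles single-character keys, whereas the replace loop also accepts longer substrings as keys.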