将重音字符替换为不重音字符 [英] Replace accented chars with unaccented ones

查看：96 发布时间：2019/6/5 11:01:24 python

本文介绍了将重音字符替换为不重音字符的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

嗨

我想用非
accetued ones（?? - >" e"，"？¨" - >" e"，"？*" - >" a" ）。

我尝试过string.replace方法，但似乎不喜欢非ascii字符...

你能帮帮我吗？

谢谢。

解决方案

谢谢你们的回答。他们的工作非常好。

首先，我相信我不工作，因为我所犯的错误是

忘记了" U" for string：u"é"。因为我的文件已经是utf-8

编码（＃ - * - 编码：UTF-8 - * - ），我认为u没有必要......

i错了。

再见。

你有两个选择。首先，将字符串转换为Unicode并使用代码

，如下所示：

替换= [（u''\xe9''，''e ''），...]

def remove_accents（u）：

代替a，b代替：

u = u.replace（ a，b）

返回u

remove_accents（u''\xe9 ''）
u'''

其次，如果您使用的是单字节编码（iso8859-1，用于

实例），然后使用字节字符串：

replacement_map = string.maketrans（''\ xe9 ...''，''e ...''）

def remove_accents（s）：

返回s.translate（replacement_map）

remove_accents（''\ xe9''）

''e''

如果你想在程序中使用u''é''这样的字符串，你必须

在t的顶部包含一行他的源文件告诉Python

编码，如下所示：

＃ - * - 编码：utf-8 - * -

（除非您必须命名编辑器使用的编码，如果它不是

utf-8）请参阅 http://python.org/peps/pep-0263.html

一旦你有了完成后，你可以写

替换= [（u''é''，''e''），...]

而不是使用\ xXX为它逃脱。

Jeff

Jeff Epler写道：

你有两个选择。首先，将字符串转换为Unicode并使用如下代码：

替换= [（u''\xe9''，''e''），...]
def remove_accents（u）：
for a，b in replacements：
u = u.replace（a，b）
return u

< blockquote class =post_quotes>
remove_accents（u''\xe9''）
u''''

其次，如果你正在使用单字节编码（iso8859-1，用于
实例），然后使用字节字符串：
replacement_map = string.maketrans（''\ xe9 ...''，''e .. 。''）
def remove_accents（s）：
返回s.translate（replacement_map）

remove_accents（''\ xe9''）

''''

如果你想在节目中加入像你这样的字符串，你必须在在源文件的顶部告诉Python
编码，喜欢以下几行：
＃ - * - 编码：utf-8 - * -
（除非你必须命名你的编辑器使用的编码，如果它不是
utf- 8）参见 http://python.org/peps/pep-0263.html

一旦你完成了，你可以写
替换= [（u''é''，''e''），...] <而不是使用\ xXX转义。

将替换对转换为字典会导致

显着大量替换的加速。

mapping = dict（replacement_pairs）

def multi_replace（inp，mapping = mapping）：

返回u''''。join（[mapping.get（i，i）for in in inp]）

一次通过文件给出一个O（ len（inp））算法，比运行在

O（len（inp）* len（replacement_）中的string.replace方法好得多b $ b（运行时间明智）对））给出的时间。

- Josiah

Hi

I would like to replace accentuel chars (like "??", "?¨" or "?*") with non
accetued ones ("??" -> "e", "?¨" -> "e", "?*" -> "a").

I have tried string.replace method, but it seems dislike non ascii chars...

Can you help me please ?
Thanks.

解决方案

Thank you both for your answer. They works well both very good.

First, i believe i doesn''t work, because the error i''ve made is to
forgot the "u" for string : u"é". Because my file was already utf-8
encoded (# -*- coding: UTF-8 -*-), i thinks the "u" is not necessary...
i was wrong.

Bye.

You have two options. First, convert the string to Unicode and use code
like the following:

replacements = [(u''\xe9'', ''e''), ...]
def remove_accents(u):
for a, b in replacements:
u = u.replace(a, b)
return u

remove_accents(u''\xe9'') u''e''

Second, if you are using a single-byte encoding (iso8859-1, for
instance), then work with byte string:
replacement_map = string.maketrans(''\xe9...'', ''e...'')
def remove_accents(s):
return s.translate(replacement_map)
remove_accents(''\xe9'')

''e''

If you want to have strings like u''é'' in your programs, you have to
include a line at the top of the source file that tells Python the
encoding, like the following line does:
# -*- coding: utf-8 -*-
(except you have to name the encoding your editor uses, if it''s not
utf-8) See http://python.org/peps/pep-0263.html

Once you''ve done that, you can write
replacements = [(u''é'', ''e''), ...]
instead of using the \xXX escape for it.

Jeff

Jeff Epler wrote:

You have two options. First, convert the string to Unicode and use code
like the following:

replacements = [(u''\xe9'', ''e''), ...]
def remove_accents(u):
for a, b in replacements:
u = u.replace(a, b)
return u

remove_accents(u''\xe9'')
u''e''

Second, if you are using a single-byte encoding (iso8859-1, for
instance), then work with byte string:
replacement_map = string.maketrans(''\xe9...'', ''e...'')
def remove_accents(s):
return s.translate(replacement_map)

remove_accents(''\xe9'')

''e''

If you want to have strings like u''é'' in your programs, you have to
include a line at the top of the source file that tells Python the
encoding, like the following line does:
# -*- coding: utf-8 -*-
(except you have to name the encoding your editor uses, if it''s not
utf-8) See http://python.org/peps/pep-0263.html

Once you''ve done that, you can write
replacements = [(u''é'', ''e''), ...]
instead of using the \xXX escape for it.

Translating the replacements pairs into a dictionary would result in a
significant speedup for large numbers of replacements.

mapping = dict(replacement_pairs)

def multi_replace(inp, mapping=mapping):
return u''''.join([mapping.get(i, i) for i in inp])

One pass through the file gives an O(len(inp)) algorithm, much better
(running-time wise) than the string.replace method that runs in
O(len(inp) * len(replacement_pairs)) time as given.

- Josiah

这篇关于将重音字符替换为不重音字符的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

将重音字符替换为不重音字符 [英] Replace accented chars with unaccented ones

问题描述

相关文章

Python最新文章

热门教程

热门工具

登录关闭

将重音字符替换为不重音字符 [英] Replace accented chars with unaccented ones

问题描述

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭