How to replace accents in a column of a pandas dataframe

Posted: 2021/4/29 20:40:40 Tags: python string pandas unicode decode

Problem description

I have a dataframe dataSwiss which contains information about Swiss municipalities. I want to replace the accented letters with their plain, unaccented equivalents.

This is what I am doing:

dataSwiss['Municipality'] = dataSwiss['Municipality'].str.encode('utf-8')
dataSwiss['Municipality'] = dataSwiss['Municipality'].str.replace(u"é", "e")

But I get the following error:

----> 2 dataSwiss['Municipality'] = dataSwiss['Municipality'].str.replace(u"é", "e")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

The data looks like this:

dataSwiss.Municipality
0               Zürich
1               Zürich
2               Zürich
3               Zürich
4               Zürich
5               Zürich
6               Zürich
7               Zürich

I found a workaround:

s = dataSwiss['Municipality']
res = s.str.decode('utf-8')
res = res.str.replace(u"é", "e")

Recommended answer

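This is one way. Normalize the strings to NFKD so that each accented letter is split into a base letter plus a combining mark, encode to ASCII bytes while ignoring the characters that cannot be encoded (the combining marks), then decode the bytes back to strings.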

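import pandas as pd

s = pd.Series(['hello', 'héllo', 'Zürich', 'Zurich'])

# NFKD splits 'é' into 'e' + combining accent; the ASCII encode drops the accent
res = s.str.normalize('NFKD')\
       .str.encode('ascii', errors='ignore')\
       .str.decode('utf-8')

print(res)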

0     hello
1     hello
2    Zurich
3    Zurich
dtype: object
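
Applied to a DataFrame column like the one in the question, the same chain might look like the sketch below. The sample rows and the helper name strip_accents are made up for illustration; they are not part of the original question or answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataSwiss
dataSwiss = pd.DataFrame({'Municipality': ['Zürich', 'Genève', 'Neuchâtel', 'Bern']})

def strip_accents(col):
    """Illustrative helper: return a string Series with accents removed."""
    return (col.str.normalize('NFKD')                # split letters from accents
               .str.encode('ascii', errors='ignore')  # drop the non-ASCII accents
               .str.decode('utf-8'))                  # bytes back to str

dataSwiss['Municipality'] = strip_accents(dataSwiss['Municipality'])
print(dataSwiss['Municipality'].tolist())
# ['Zurich', 'Geneve', 'Neuchatel', 'Bern']
```

Assigning the result back to the column overwrites the accented names, so keep a copy of the original column if the accented spelling is still needed.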

pd.Series.str.normalize uses the unicodedata module. According to the docs:

The normal form KD (NFKD) will apply the compatibility decomposition, i.e. replace all compatibility characters with their equivalents.
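
To see what the decomposition does on a single string, here is a small sketch using unicodedata directly. The helper name remove_combining_marks is hypothetical. It also shows a caveat worth checking against your own data: a letter with no decomposition, such as ß, is kept when only combining marks are filtered out, but is dropped entirely by an ASCII encode with errors='ignore'.

```python
import unicodedata

def remove_combining_marks(text):
    """Illustrative helper: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

# One precomposed character becomes base letter + combining accent
print(len('\u00e9'), len(unicodedata.normalize('NFKD', '\u00e9')))  # 1 2

print(remove_combining_marks('Zürich'))   # Zurich
print(remove_combining_marks('Straße'))   # Straße (ß has no decomposition)

# The encode/decode route from the answer silently drops ß instead
ascii_only = (unicodedata.normalize('NFKD', 'Straße')
              .encode('ascii', errors='ignore')
              .decode('utf-8'))
print(ascii_only)  # Strae
```

Which behavior is preferable depends on the data; for names that contain letters like ß or ø, an explicit replacement table may be needed in addition to normalization.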
