Cannot replace special characters in a Python pandas dataframe

Question

I'm working with Python 3.5 on Windows. I have a dataframe with a 'titles' column of type str that contains headline titles, some of which contain special characters such as â, € and ˜.
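
The three characters â, € and ˜ are exactly what the UTF-8 bytes of a typographic left quote (‘, U+2018) turn into when they are decoded as Windows-1252. Whether that is what happened to these titles is an assumption (the question does not say how the data was loaded), but the following sketch shows the effect and, if the assumption holds, how reversing the wrong decode would recover the quote:

```python
# Assumption: the titles were UTF-8 text that got decoded as Windows-1252 (cp1252).
left_quote = '\u2018'  # the typographic left single quote
garbled = left_quote.encode('utf-8').decode('cp1252')
print(garbled)  # â€˜

# Re-encoding with the wrong codec and decoding as UTF-8 undoes the damage.
repaired = garbled.encode('cp1252').decode('utf-8')
print(repaired == left_quote)  # True
```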

I am trying to replace these with an empty string '' using pandas' replace. I have tried various approaches and none of them works. I can replace regular characters, but these special characters are never replaced.

The code runs without errors, but no replacement occurs: the original title is returned unchanged. Below is what I have tried already. Any advice would be much appreciated.

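import re

# Attempt 1: Series.replace with regex=True (substitutes inside each string)
df['clean_titles'] = df['titles'].replace('€', '', regex=True)
# Attempt 2: Series.replace without regex (only replaces cells equal to '€')
df['clean_titles'] = df['titles'].replace('€', '')
# Attempt 3: vectorized string method
df['clean_titles'] = df['titles'].str.replace('€', '')

# Attempt 4: row-wise function
def clean_text(row):
    return re.sub('€', '', str(row))
    # Also tried instead: return str(row).replace('€', '')

df['clean_titles'] = df['titles'].apply(clean_text)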
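
The attempts above do not all behave the same way. This minimal sketch, on hypothetical sample data, contrasts `Series.replace` with a plain string (which only matches whole cell values) and `Series.str.replace` (which matches substrings). If the substring versions also leave the data unchanged, one possible explanation, not confirmed by the question, is that the characters stored in the column are not the same code points as the ones typed in the source code.

```python
import pandas as pd

# Hypothetical sample data: one title containing the characters from the question.
df = pd.DataFrame({'titles': ['â€˜Breaking news', 'Plain title']})

# Series.replace with a plain string compares WHOLE cell values, so nothing
# matches here: no cell is exactly '€'.
whole_value = df['titles'].replace('€', '')

# Series.str.replace searches inside each string. regex=False is spelled out
# because the default for this parameter changed in pandas 2.0.
substring = df['titles'].str.replace('€', '', regex=False)

print(whole_value.tolist())  # ['â€˜Breaking news', 'Plain title']
print(substring.tolist())    # ['â˜Breaking news', 'Plain title']
```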

Recommended answer

We can only assume that by "special" characters you mean non-ASCII characters.

To remove all non-ASCII characters from a pandas dataframe column, do the following:

df['clean_titles'] = df['titles'].str.replace(r'[^\x00-\x7f]', '', regex=True)

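Note that this solution scales: it removes every non-ASCII character, not just the ones listed in the question. The flip side is that it also removes legitimate non-ASCII text, such as accented letters like é.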
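
A minimal sketch of the answer's approach on hypothetical sample data. The character class `[^\x00-\x7f]` matches any code point outside the ASCII range; `regex=True` is passed explicitly because `str.replace` treats the pattern as a literal string by default in pandas 2.0 and later. The second title shows that accented letters are removed along with the garbled characters.

```python
import pandas as pd

# Hypothetical sample data; 'Café opens' shows that legitimate accented letters go too.
df = pd.DataFrame({'titles': ['â€˜Breaking news', 'Café opens']})

# Remove every character whose code point is above 0x7F (i.e. non-ASCII).
df['clean_titles'] = df['titles'].str.replace(r'[^\x00-\x7f]', '', regex=True)

print(df['clean_titles'].tolist())  # ['Breaking news', 'Caf opens']
```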