Pandas 数据框通过查找子字符串替换多列中的字符串 [英] Pandas dataframe replace string in multiple columns by finding substring

查看:111
本文介绍了Pandas 数据框通过查找子字符串替换多列中的字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个非常大的 Pandas 数据框,其中包含字符串和整数列.我想在整个数据框中搜索特定子字符串,如果找到,则用其他内容替换完整字符串.

I have a very large pandas data frame containing both string and integer columns. I'd like to search the whole data frame for a specific substring, and if found, replace the full string with something else.

我发现了一些示例通过指定要搜索的列来执行此操作,如下所示:

I've found some examples that do this by specifying the column(s) to search, like this:

df = pd.DataFrame([[1,'A'], [2,'(B,D,E)'], [3,'C']],columns=['Question','Answer'])
df.loc[df['Answer'].str.contains(','), 'Answer'] = 'X'

但是因为我的数据框有几十个没有特定顺序的字符串列,所以我不想全部指定它们.据我所知,使用 df.replace 将不起作用,因为我只是在搜索子字符串.感谢您的帮助!

But because my data frame has dozens of string columns in no particular order, I don't want to specify them all. As far as I can tell using df.replace will not work since I'm only searching for a substring. Thanks for your help!

推荐答案

您可以使用数据框replace方法和regex=True,并使用.*,.* 匹配包含逗号的字符串(您可以将 comma 替换为您想要检测的其他任何其他子字符串):

You can use data frame replace method with regex=True, and use .*,.* to match strings that contain a comma (you can replace comma with other any other substring you want to detect):

str_cols = ['Answer']    # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
df
#Question   Answer
#0      1       A
#1      2       X
#2      3       C

或者如果您想替换所有字符串列:

or if you want to replace all string columns:

str_cols = df.select_dtypes(['object']).columns

这篇关于Pandas 数据框通过查找子字符串替换多列中的字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆