Pandas 数据框通过查找子字符串替换多列中的字符串 [英] Pandas dataframe replace string in multiple columns by finding substring
问题描述
我有一个非常大的 Pandas 数据框,其中包含字符串和整数列.我想在整个数据框中搜索特定子字符串,如果找到,则用其他内容替换完整字符串.
I have a very large pandas data frame containing both string and integer columns. I'd like to search the whole data frame for a specific substring, and if found, replace the full string with something else.
我发现了一些示例通过指定要搜索的列来执行此操作,如下所示:
I've found some examples that do this by specifying the column(s) to search, like this:
df = pd.DataFrame([[1,'A'], [2,'(B,D,E)'], [3,'C']],columns=['Question','Answer'])
df.loc[df['Answer'].str.contains(','), 'Answer'] = 'X'
但是因为我的数据框有几十个没有特定顺序的字符串列,所以我不想全部指定它们.据我所知,使用 df.replace
将不起作用,因为我只是在搜索子字符串.感谢您的帮助!
But because my data frame has dozens of string columns in no particular order, I don't want to specify them all. As far as I can tell using df.replace
will not work since I'm only searching for a substring. Thanks for your help!
推荐答案
您可以使用数据框replace
方法和regex=True
,并使用.*,.*
匹配包含逗号的字符串(您可以将 comma 替换为您想要检测的其他任何其他子字符串):
You can use data frame replace
method with regex=True
, and use .*,.*
to match strings that contain a comma (you can replace comma with other any other substring you want to detect):
str_cols = ['Answer'] # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
df
#Question Answer
#0 1 A
#1 2 X
#2 3 C
或者如果您想替换所有字符串列:
or if you want to replace all string columns:
str_cols = df.select_dtypes(['object']).columns
这篇关于Pandas 数据框通过查找子字符串替换多列中的字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!