Select columns if any of their rows contain a certain string
Problem description
I am trying to get a list of the columns in a DataFrame in which any value contains a given string. For example, in the DataFrame below I would like a list of the columns that have a % in any of their strings. I can do this with a for loop and the Series.str.contains method, but that doesn't seem optimal, especially with a larger dataset. Is there a more efficient way to do this?
import pandas as pd
df = pd.DataFrame({'A': {0: '2019-06-01', 1: '2019-06-01', 2: '2019-06-01'},
'B': {0: '10', 1: '20', 2: '30'},
'C': {0: '10', 1: '20%', 2: '30%'},
'D': {0: '10%', 1: '20%', 2: '30'},
})
DataFrame
            A   B    C    D
0  2019-06-01  10   10  10%
1  2019-06-01  20  20%  20%
2  2019-06-01  30  30%   30
Current approach
col_list = []
for col in df.columns:
    # Keep the column if at least one of its values contains '%'
    if df[col].str.contains('%').any():
        col_list.append(col)
Output
['C', 'D']
Recommended answer
First use DataFrame.select_dtypes to keep only the object columns, which are normally the string columns.
Then use DataFrame.applymap to check each value elementwise, and DataFrame.any to return True for every column that has at least one match, so the columns can be filtered. (DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map.)
c = df.columns[df.select_dtypes(object).applymap(lambda x: '%' in str(x)).any()].tolist()
print(c)
['C', 'D']
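As a minimal sketch of the same elementwise check on recent pandas versions: the snippet below picks DataFrame.map when it exists and falls back to DataFrame.applymap otherwise. It skips the select_dtypes step on the assumption that matching against str(x) for every value is acceptable; the names elementwise, has_percent and cols are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['2019-06-01', '2019-06-01', '2019-06-01'],
    'B': ['10', '20', '30'],
    'C': ['10', '20%', '30%'],
    'D': ['10%', '20%', '30'],
})

# DataFrame.map replaces the deprecated DataFrame.applymap from pandas 2.1 on;
# fall back to applymap on older versions.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# str(x) works for any value, so no dtype filtering is done here (assumption).
has_percent = elementwise(lambda x: '%' in str(x)).any()

# has_percent is a boolean Series indexed by column name.
cols = df.columns[has_percent.to_numpy()].tolist()
print(cols)
```

For the sample data this prints ['C', 'D'], the same result as above.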
Alternatively, apply Series.str.contains to each column. The na parameter can be omitted if every selected column contains only strings:
f = lambda x: x.str.contains('%', na=False)
c = df.columns[df.select_dtypes(object).apply(f).any()].tolist()
print(c)
['C', 'D']
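One thing to watch with Series.str.contains is that it treats the pattern as a regular expression by default. That makes no difference for '%', but it does for characters such as '.' or '$'. The sketch below wraps the per-column check in a hypothetical helper, columns_containing, which is not part of pandas; it casts every column to str first (an assumption, so that .str is available on all columns) and passes regex=False for literal matching.

```python
import pandas as pd


def columns_containing(df, needle, literal=True):
    """Return the names of columns in which any value contains `needle`.

    Hypothetical helper for illustration; values are compared by their
    string form, which is an assumption of this sketch.
    """
    text = df.astype(str)
    mask = text.apply(
        # regex=False searches for `needle` as a plain substring.
        lambda s: s.str.contains(needle, regex=not literal, na=False)
    ).any()
    return mask[mask].index.tolist()


df = pd.DataFrame({
    'A': ['2019-06-01', '2019-06-01', '2019-06-01'],
    'B': ['10', '20', '30'],
    'C': ['10', '20%', '30%'],
    'D': ['10%', '20%', '30'],
})
# Extra sample data, not from the original question.
df2 = pd.DataFrame({'X': ['1.5', '2'], 'Y': ['15', '20']})

percent_cols = columns_containing(df, '%')
dot_literal = columns_containing(df2, '.')
dot_regex = columns_containing(df2, '.', literal=False)
print(percent_cols, dot_literal, dot_regex)
```

With literal matching only X is reported for '.', while in regex mode '.' matches any character, so both X and Y are returned.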