How can one filter a dataframe based on rows containing a specific value (in any of the columns)?
Problem description
I need to limit a dataset so that it returns only the rows that contain a specific string. However, that string can appear in any of several (8) columns.
How can I do this? I've seen the isin methods, but they return a single Series for a single column. How can I drop every row that does not contain the string in ANY of the columns?
Example code: suppose I have a dataframe created by
import pandas as pd
data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015],
        'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
        'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df
   year  year2  reports
a  2011   2012       52
b  2012   2016       20
c  2013   2015       43
d  2014   2015       33
e  2014   2012       41
f  2011   2013       11
g  2012   2019       43
h  2015   2016       72
I want the code to remove all rows that do not contain the value 2012. Note that in my actual dataset the value is a string, not an int (it is people's names).
So in the example above it would remove rows c, d, f, and h.
Recommended answer
You can compare every cell with df.eq and then use df.any on axis=1:
df[df.eq('2012').any(axis=1)]  # if the years are stored as strings
Or:
df[df.eq(2012).any(axis=1)]  # if the years are stored as ints
   year  year2  reports
a  2011   2012       52
b  2012   2016       20
e  2014   2012       41
g  2012   2019       43
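Since the real dataset holds people's names rather than years, here is a minimal sketch of the same idea applied to strings. The column labels and names below are hypothetical and only serve as an illustration. It also shows two possible variations that are not part of the original answer: matching any of several names with df.isin, and matching a substring by casting every column to str first.

```python
import pandas as pd

# Hypothetical data: these names and column labels are invented for the example.
df = pd.DataFrame(
    {
        "author": ["Alice", "Bob", "Carol", "Dave"],
        "reviewer": ["Bob", "Erin", "Alice", "Frank"],
        "reports": [52, 20, 43, 33],
    },
    index=["a", "b", "c", "d"],
)

# Keep rows where any column is exactly equal to one name.
exact = df[df.eq("Alice").any(axis=1)]

# Keep rows where any column is exactly equal to one of several names.
several = df[df.isin(["Alice", "Erin"]).any(axis=1)]

# Keep rows where any column contains a substring.
# Casting to str lets the .str accessor work on the numeric column too.
as_text = df.astype(str)
partial = df[as_text.apply(lambda col: col.str.contains("Al", regex=False)).any(axis=1)]

print(exact)
print(several)
print(partial)
```

In each case the boolean mask has one entry per row, so the same pattern should extend to eight columns without changes. To search only some columns, select them before building the mask, for example df[["author", "reviewer"]].eq("Alice").any(axis=1).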