pandas dataframe: search for a string in the entire row
Question
I have a pandas dataframe like the one below. I want to search each row of the dataframe for a piece of text and flag whether that text appears anywhere in the row.
For example, I want to search each row for "jones", ignoring the case of the search word. In the case below, I would like to add a new column called "jones" with the values 1, 1, 0, because the word was found in the 1st and 2nd rows.
I found this post, which shows how to find a text in a single column, but how can I find a text when I have many columns, say 50+? I thought about concatenating all the columns into a new column, but I didn't see any function that concatenates all columns of a dataframe without having to type each column name.
I would like to do this for multiple keywords. For example, I have a list of keywords (LLC, Co, Blue, alpha) and many more (30+).
import pandas as pd

sales = [{'account': 'Jones LLC', 'Jan': '150', 'Feb': '200', 'Mar': '140'},
         {'account': 'Alpha Co', 'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
         {'account': 'Blue Inc', 'Jan': '50', 'Feb': '90', 'Mar': '95'}]
df = pd.DataFrame(sales)
Source DF:
   Feb    Jan  Mar    account
0  200    150  140  Jones LLC
1  210  Jones  215   Alpha Co
2   90     50   95   Blue Inc
Desired DF:
   Feb    Jan  Mar    account  jones  llc  co  blue  alpha
0  200    150  140  Jones LLC      1    1   0     0      0
1  210  Jones  215   Alpha Co      1    0   1     0      1
2   90     50   95   Blue Inc      0    0   0     1      0
Recommended answer
Here we use the pandas built-in str function contains, along with apply, and then bring it all together with any, as follows:
search_string = 'Jones'
df[search_string] = (df.apply(lambda x: x.str.contains(search_string))
                       .any(axis=1).astype(int))
df
Out[2]:
   Feb    Jan  Mar    account  Jones
0  200    150  140  Jones LLC      1
1  210  Jones  215   Alpha Co      1
2   90     50   95   Blue Inc      0
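One caveat worth illustrating: the sample data stores every value as a string, which is why x.str works on each column. If some columns hold numbers, the .str accessor raises an AttributeError on them. The sketch below assumes a hypothetical variant of the data (df_num, not from the question) with integer month columns, and casts everything to strings before searching.

```python
import pandas as pd

# Hypothetical variant of the sample data with numeric month columns
df_num = pd.DataFrame({
    'account': ['Jones LLC', 'Alpha Co', 'Blue Inc'],
    'Jan': [150, 200, 50],
    'Feb': [200, 210, 90],
})

search_string = 'Jones'

# Cast every column to str first so the .str accessor is available everywhere
found = (df_num.astype(str)
               .apply(lambda col: col.str.contains(search_string))
               .any(axis=1)
               .astype(int))
```

The cast is applied to a temporary copy only, so the numeric columns in df_num keep their original dtype.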
This can be easily extended, as contains uses regular expressions to do the matching. It also has a case argument, so you can make the search case-insensitive and match both Jones and jones.
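As a minimal sketch of the case-insensitive variant, the snippet below passes case=False to contains and searches for the lowercase word the question asked about. The regex=False and na=False arguments are additions not in the original answer: they treat the keyword as a literal substring and count missing values as non-matches.

```python
import pandas as pd

sales = [{'account': 'Jones LLC', 'Jan': '150', 'Feb': '200', 'Mar': '140'},
         {'account': 'Alpha Co', 'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
         {'account': 'Blue Inc', 'Jan': '50', 'Feb': '90', 'Mar': '95'}]
df = pd.DataFrame(sales)

search_string = 'jones'

# case=False ignores case; regex=False matches the keyword literally
matches = df.apply(
    lambda col: col.str.contains(search_string, case=False, regex=False, na=False)
)
df[search_string] = matches.any(axis=1).astype(int)
```

Note that this is substring matching, so a short keyword can also match inside a longer word.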
In order to loop over a list of search words, we need to make the following changes. By storing each search result (a Series) in a list, we can then use the list to join the series to the DataFrame. We do this because we don't want the newly added columns to be searched for the next search_string.
df_list = []
for search_string in ['Jones', 'Co', 'Alpha']:
    # use the method above, but rename the series instead of assigning it
    # to a column, then append it to a list
    df_list.append(df.apply(lambda x: x.str.contains(search_string))
                     .any(axis=1)
                     .astype(int)
                     .rename(search_string))
# concatenate the list of series with the original df into one DataFrame
df = pd.concat([df] + df_list, axis=1)
df
Out[5]:
   Feb    Jan  Mar    account  Jones  Co  Alpha
0  200    150  140  Jones LLC      1   0      0
1  210  Jones  215   Alpha Co      1   1      1
2   90     50   95   Blue Inc      0   0      0
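Putting the pieces together, one possible way to reach the "Desired DF" from the question is sketched below. It assumes lowercase keywords as column names and literal, case-insensitive substring matching; the names keywords, flags and result are illustrative and not part of the original answer.

```python
import pandas as pd

sales = [{'account': 'Jones LLC', 'Jan': '150', 'Feb': '200', 'Mar': '140'},
         {'account': 'Alpha Co', 'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
         {'account': 'Blue Inc', 'Jan': '50', 'Feb': '90', 'Mar': '95'}]
df = pd.DataFrame(sales)

keywords = ['jones', 'llc', 'co', 'blue', 'alpha']

# Build one 0/1 Series per keyword, always searching the original columns only
flags = {
    kw: df.apply(
            lambda col, kw=kw: col.str.contains(kw, case=False, regex=False)
        ).any(axis=1).astype(int)
    for kw in keywords
}

# Attach all flag columns at once, so no flag column is ever searched
result = pd.concat([df, pd.DataFrame(flags)], axis=1)
```

Collecting the flags in a dict and concatenating once follows the same idea as the list of Series above; it only changes how the new columns are named.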