Python - Searching a string within a dataframe from a list
Question
I have the following list:
search_list = ['STEEL','IRON','GOLD','SILVER']
which I need to search within a dataframe (df):
   a    b
0  123  'Blah Blah Steel'
1  456  'Blah Blah Blah'
2  789  'Blah Blah Gold'
and insert the matching rows into a new dataframe (newdf), adding a new column with the matching word from the list:
   a    b                  c
0  123  'Blah Blah Steel'  'STEEL'
1  789  'Blah Blah Gold'   'GOLD'
I can use the following code to extract the matching rows:
newdf=df[df['b'].str.upper().str.contains('|'.join(search_list),na=False)]
but I can't figure out how to add the matching word from the list into column c.
I'm thinking that the match somehow needs to capture the index of the matching word in the list and then pull the value using that index, but I can't figure out how to do this.
Any help or pointers would be greatly appreciated.
Thanks
Recommended answer
You could use str.extract and filter out the rows that are NaN (i.e. no match):
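import re
import pandas as pd

search_list = ['STEEL','IRON','GOLD','SILVER']
# Capture the first search term found in each row, ignoring case (NaN when nothing matches)
df['c'] = df.b.str.extract('({0})'.format('|'.join(search_list)), flags=re.IGNORECASE, expand=False)
# Keep only the rows where a term was found
result = df[~pd.isna(df.c)]
print(result)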
Output
a    b                  c
123  'Blah Blah Steel'  Steel
789  'Blah Blah Gold'   Gold
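Note that the extracted value keeps the casing found in column b (Steel), whereas the question asked for the spelling used in the list (STEEL). The following is a minimal end-to-end sketch of one way to get there, not the answerer's own code. It assumes column b holds plain strings without the surrounding quotes, and the sample frame, the re.escape call, and the final reset_index are additions made for illustration.

```python
import re
import pandas as pd

# Sample data reconstructed from the question (assumed: plain strings, no quotes)
df = pd.DataFrame({
    "a": [123, 456, 789],
    "b": ["Blah Blah Steel", "Blah Blah Blah", "Blah Blah Gold"],
})
search_list = ["STEEL", "IRON", "GOLD", "SILVER"]

# re.escape is an added precaution in case a term contains regex metacharacters
pattern = "({0})".format("|".join(re.escape(term) for term in search_list))

# One capture group with expand=False returns a Series; rows without a match are NaN
matched = df["b"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Upper-case the match so it uses the list's spelling, then drop non-matching rows
newdf = (
    df.assign(c=matched.str.upper())
      .dropna(subset=["c"])
      .reset_index(drop=True)
)
print(newdf)
```

Upper-casing works here only because every entry in search_list is already upper case; a list with mixed casing would need a lookup from the lower-cased match back to the original term.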
Note that you have to import the re module in order to use the re.IGNORECASE flag. As an alternative, you could pass 2 directly, which is the value of the re.IGNORECASE flag.
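A quick way to check the claim about the flag's value, as a small sketch using only the standard library. The sample string is taken from the question; the variable names are illustrative.

```python
import re

# re.IGNORECASE is an integer-valued flag; its numeric value is 2
flag_value = int(re.IGNORECASE)
print(flag_value)

# Passing the literal 2 behaves the same as passing the named constant
with_name = re.search("steel", "Blah Blah Steel", flags=re.IGNORECASE)
with_literal = re.search("steel", "Blah Blah Steel", flags=2)
print(with_name.group(0), with_literal.group(0))
```

The named constant is usually the more readable choice, since a bare 2 gives the reader no hint about what it switches on.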
Update
As mentioned by @user3483203, you can save the import by using an inline flag:
df['c'] = df.b.str.extract('(?i)({0})'.format('|'.join(search_list)))
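The (?i) prefix is an inline flag that makes the whole pattern case-insensitive, so it should give the same result as passing flags=re.IGNORECASE. The sketch below compares the two on the sample data from the question; the Series b and the variable names are assumptions made for illustration. The inline flag is placed at the very start of the pattern, which is where Python expects global inline flags.

```python
import re
import pandas as pd

# Column b from the question, assumed to hold plain strings
b = pd.Series(["Blah Blah Steel", "Blah Blah Blah", "Blah Blah Gold"])
search_list = ["STEEL", "IRON", "GOLD", "SILVER"]
alternation = "|".join(search_list)

# Inline flag: no flags argument needed for the extract call itself
with_inline = b.str.extract("(?i)({0})".format(alternation), expand=False)

# Explicit flag: same pattern without the (?i) prefix
with_flag = b.str.extract("({0})".format(alternation), flags=re.IGNORECASE, expand=False)

# Series.equals treats NaN values in the same positions as equal
print(with_inline.equals(with_flag))
```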