Python: searching a data frame for words in a list and keeping track of words found AND frequency
Problem description
I have referenced the following post and it was extremely helpful, but I need to take it a step further. Python - Searching a string within a dataframe from a list
I would like to not only search my data frame for a list of words, but also keep track of which words are found in each row and how many there are. So, using the example from the above post:
If this is my search list
search_list = ['STEEL','IRON','GOLD','SILVER']
and this is the data frame I am searching in
a b
0 123 'Blah Blah Steel'
1 456 'Blah Blah Blah Steel Gold'
2 789 'Blah Blah Gold'
3 790 'Blah Blah blah'
I want my output to be
a b c d
0 123 'Blah Blah Steel' 'STEEL' 1
1 456 'Blah Blah Blah Steel Gold' 'STEEL','GOLD' 2
2 789 'Blah Blah Gold' 'GOLD' 1
3 790 'Blah Blah blah'
How may I expand on the awesome solutions in the above mentioned post to get this desired output? I am currently utilizing the top voted answer as a starting place.
I am more concerned with being able to tag multiple words from the list. I have not found any way to do this yet. I can apply string counting functions to the data frame to create a frequency column if there is no way to do that in this step. If there is a way to do it all in one step, though, that would be good to know as well.
Thanks in advance!
You can use str.findall() instead of str.extract() to do what you need. It returns every match in each row as a list, rather than only the first match.
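import re
search_list = ['STEEL','IRON','GOLD','SILVER']
# Column c: list of every search word found in b, matched case-insensitively
df['c'] = df.b.str.findall('({0})'.format('|'.join(search_list)), flags=re.IGNORECASE)
# Column d: number of matches in each row (the length of the list in c)
df['d'] = df['c'].str.len()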
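In the result, column c holds the list of matched words for each row, in the case they appear in the text (for example ['Steel', 'Gold'] for row 1), and column d holds the number of matches, which is 0 for rows with no match.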
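For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption based on the table in the question (with the literal quotes around the strings dropped). The \b word boundaries, the re.escape call and the upper-casing step are additions that are not part of the answer above; they are there only to bring the result closer to the 'STEEL','GOLD' form shown in the desired output.

```python
import re

import pandas as pd

search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame({
    'a': [123, 456, 789, 790],
    'b': ['Blah Blah Steel', 'Blah Blah Blah Steel Gold',
          'Blah Blah Gold', 'Blah Blah blah'],
})

# \b restricts matches to whole words; re.escape protects against search
# terms that happen to contain regex metacharacters. Both are additions.
pattern = r'\b(?:{0})\b'.format('|'.join(re.escape(w) for w in search_list))

# c: list of matches per row, upper-cased so they mirror search_list.
df['c'] = df['b'].str.findall(pattern, flags=re.IGNORECASE).apply(
    lambda words: [w.upper() for w in words])

# d: number of matches per row (a word that appears twice counts twice).
df['d'] = df['c'].str.len()

print(df)
```

With the sample data, c comes out as ['STEEL'], ['STEEL', 'GOLD'], ['GOLD'] and [] for the four rows, and d as 1, 2, 1 and 0. If you only want distinct words per row, one option is to deduplicate each list (for example with sorted(set(words))) before taking the length.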