Python: searching a data frame for words in a list and keeping track of the words found and their frequency

Problem description

I have referenced the following post, "Python - Searching a string within a dataframe from a list", and it was extremely helpful, but I need to take it a step further.

I would like to not only search my data frame for a list of words, but also keep track of which words are found in each row (there may be several) and how many. So, using the example from the above post:

If this is my search list

search_list = ['STEEL','IRON','GOLD','SILVER']

and this is the data frame I am searching in

      a    b             
0    123   'Blah Blah Steel'
1    456   'Blah Blah Blah Steel Gold'
2    789   'Blah Blah Gold'
3    790   'Blah Blah blah'

I want my output to be

      a    b                             c                d
0    123   'Blah Blah Steel'             'STEEL'          1
1    456   'Blah Blah Blah Steel Gold'   'STEEL','GOLD'   2
2    789   'Blah Blah Gold'              'GOLD'           1
3    790   'Blah Blah blah'

How may I expand on the awesome solutions in the above mentioned post to get this desired output? I am currently utilizing the top voted answer as a starting place.

I am more concerned with being able to tag multiple words from the list. I have not found any way to do this yet. I can apply string counting functions to the data frame to create a frequency column if there is no way to do that in this step. If there is a way to do it all in one step, though, that would be good to know as well.

Thanks in advance!

Solution

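You can use str.findall() instead of str.extract() to do what you need. Unlike extract(), which keeps only the first match, findall() returns every match in each row as a list.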

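import re

search_list = ['STEEL','IRON','GOLD','SILVER']

# Build one alternation pattern, e.g. (STEEL|IRON|GOLD|SILVER)
pattern = '({0})'.format('|'.join(search_list))

# c: list of every match found in column b (case-insensitive)
df['c'] = df.b.str.findall(pattern, flags=re.IGNORECASE)
# d: number of matches in each row
df['d'] = df['c'].str.len()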

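Column c then holds the list of matches for each row (an empty list when nothing is found), and column d holds the number of matches.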
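For reference, here is a self-contained sketch of the same approach run against the sample data from the question. A few things in it are my own assumptions rather than part of the answer above: the words are passed through re.escape() in case a search term contains regex metacharacters, \b word boundaries are added so that a search word is not matched inside a longer word, and the matches are upper-cased so they look like the entries in search_list (findall() otherwise returns the text as it appears in the column, e.g. 'Steel').

```python
import re
import pandas as pd

search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']

# Sample data from the question
df = pd.DataFrame({
    'a': [123, 456, 789, 790],
    'b': ['Blah Blah Steel',
          'Blah Blah Blah Steel Gold',
          'Blah Blah Gold',
          'Blah Blah blah'],
})

# Assumption: escape each word and require whole-word matches.
# (?:...) is a non-capturing group, so findall returns the full match.
pattern = r'\b(?:{0})\b'.format('|'.join(re.escape(w) for w in search_list))

# Every case-insensitive match per row, as a list
found = df['b'].str.findall(pattern, flags=re.IGNORECASE)

# Assumption: normalise matches to upper case to mirror search_list
df['c'] = found.apply(lambda words: [w.upper() for w in words])

# Number of matches per row
df['d'] = df['c'].str.len()

print(df)
```

With this data, c comes out as ['STEEL'], ['STEEL', 'GOLD'], ['GOLD'] and [] for the four rows, and d as 1, 2, 1 and 0. Note that d counts every occurrence, so a row containing the same word twice would be counted twice.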
