Retrieving matching word count on a data column using pandas in Python
Problem description
I have a DataFrame, df:
Name Description
Ram Ram is one of the good cricketer
Sri Sri is one of the member
Kumar Kumar is a keeper
and a list, my_list = ["one", "good", "ravi", "ball"]
I am trying to get the rows that contain at least one keyword from my_list.
I tried:
mask=df["Description"].str.contains("|".join(my_list),na=False)
I am getting the output_df:
Name Description
Ram Ram is one of ONe crickete
Sri Sri is one of the member
Ravi Ravi is a player, ravi is playing
Kumar there is a BALL
I also want to add the keywords present in "Description" and their counts in separate columns.
My desired output is:
Name Description pre-keys keys count
Ram Ram is one of ONe crickete one,good,ONe one,good 2
Sri Sri is one of the member one one 1
Ravi Ravi is a player, ravi is playing Ravi,ravi ravi 1
Kumar there is a BALL ball ball 1
Recommended answer
Use str.findall together with str.join and str.len:
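import re

my_list = ["ONE", "good"]
# Collect every case-insensitive match of any keyword in each description
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
# Comma-separated matches, and how many matches were found
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)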
Name Description keys count
0 Ram Ram is one of the good cricketer one,good 2
1 Sri Sri is one of the member one 1
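The answer above yields the raw matches and their total. The question also asks for a "pre-keys" column, a de-duplicated lowercase "keys" column, and a count of distinct keywords. The following is a minimal sketch of one way to get there, not the answerer's method. The sample rows are combined from the question's tables. The helper name unique_lower, the use of re.escape, and the \b word boundaries (so that "one" does not match inside a longer word) are assumptions added for illustration.

```python
import re
import pandas as pd

# Sample rows taken from the question, plus one row without any keyword.
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Ravi", "Kumar", "Kumar"],
    "Description": [
        "Ram is one of ONe crickete",
        "Sri is one of the member",
        "Ravi is a player, ravi is playing",
        "there is a BALL",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# Assumption: keywords should match whole words only, hence the \b anchors.
# re.escape guards against keywords that contain regex metacharacters.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in my_list) + r")\b"

# Every case-insensitive match, exactly as it appears in the text.
matches = df["Description"].str.findall(pattern, flags=re.IGNORECASE)


def unique_lower(words):
    """Hypothetical helper: lowercase and de-duplicate, keeping first-seen order."""
    return list(dict.fromkeys(w.lower() for w in words))


unique = matches.apply(unique_lower)

df["pre-keys"] = matches.str.join(",")   # raw matches, original casing
df["keys"] = unique.str.join(",")        # distinct keywords, lowercased
df["count"] = unique.str.len()           # number of distinct keywords

# Keep only the rows that contain at least one keyword.
output_df = df[df["count"] > 0]
print(output_df)
```

Filtering on count > 0 replaces the separate str.contains mask from the question, so the pattern is only evaluated once. If substring matches are wanted instead, as with the original str.contains call, dropping the \b anchors should give that behaviour.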