Retrieving matching word count on a data column using pandas in Python


Problem description

I have a df,

Name    Description
Ram     Ram is one of the good cricketer
Sri     Sri is one of the member
Kumar   Kumar is a keeper

and a list, my_list = ["one", "good", "ravi", "ball"]

I am trying to get the rows that contain at least one keyword from my_list.

I tried

  mask=df["Description"].str.contains("|".join(my_list),na=False)
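
For reference, the mask above is a boolean Series, so selecting the matching rows takes one more indexing step. The following is a minimal sketch that assumes the sample df from the question; `case=False` is an addition here (it is not in the original attempt) so that differently cased text such as "ONe" or "BALL" would also match.

```python
import pandas as pd

# Sample frame rebuilt from the question's first table
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# True where Description contains any keyword (regex alternation), ignoring case;
# na=False treats missing descriptions as non-matching
mask = df["Description"].str.contains("|".join(my_list), case=False, na=False)

# Boolean indexing keeps only the rows where the mask is True
output_df = df[mask]
print(output_df)
```

With this sample data only the Ram and Sri rows are kept, because "Kumar is a keeper" contains none of the keywords.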

I am getting the output_df,

Name    Description
Ram     Ram is one of ONe crickete
Sri     Sri is one of the member
Ravi    Ravi is a player, ravi is playing
Kumar   there is a BALL

I also want to add the keywords present in the "Description" column and their counts in separate columns,

My desired output is

Name    Description                      pre-keys          keys     count
Ram     Ram is one of ONe crickete         one,good,ONe   one,good    2
Sri     Sri is one of the member           one            one         1
Ravi    Ravi is a player, ravi is playing  Ravi,ravi      ravi        1
Kumar   there is a BALL                    ball           ball        1

Recommended answer

Use str.findall + str.join + str.len:

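import re
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'Name': ['Ram', 'Sri', 'Kumar'],
                   'Description': ['Ram is one of the good cricketer',
                                   'Sri is one of the member',
                                   'Kumar is a keeper']})
my_list = ["ONE", "good"]

# Collect every case-insensitive keyword match in each row as a list
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df['keys'] = extracted.str.join(',')   # matches joined into one string
df['count'] = extracted.str.len()      # number of matches per row
df = df[df['count'] > 0]               # keep only rows with at least one match
print(df)
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1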
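
The answer above counts every match, whereas the desired output in the question appears to count distinct keywords (three matches for Ram but a count of 2). One possible way to get the separate pre-keys, keys and count columns is sketched below. The sample rows are illustrative rather than the asker's exact data, the helper name `distinct_lower` is hypothetical, and the word-boundary and `re.escape` handling are assumptions added so that "one" does not match inside a longer word such as "someone".

```python
import re
import pandas as pd

# Illustrative rows (assumed for this sketch, not the asker's exact data)
df = pd.DataFrame({
    "Name": ["Ram", "Ravi", "Kumar"],
    "Description": [
        "Ram is one of the good ONe",
        "Ravi is a player, ravi is playing",
        "there is a BALL",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# Escape each keyword and require word boundaries around the alternation
pattern = r"\b(?:" + "|".join(map(re.escape, my_list)) + r")\b"
found = df["Description"].str.findall(pattern, flags=re.IGNORECASE)


def distinct_lower(words):
    """Lowercase the matches and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(w.lower() for w in words))


distinct = found.apply(distinct_lower)
df["pre-keys"] = found.str.join(",")   # matches exactly as they appear in the text
df["keys"] = distinct.str.join(",")    # normalized, de-duplicated keywords
df["count"] = distinct.str.len()       # number of distinct keywords per row
print(df[["Name", "pre-keys", "keys", "count"]])
```

Here pre-keys keeps the original casing of each match, so the last row shows "BALL" rather than "ball"; lowercase that column too if the normalized form is wanted.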
