Mapping matching word count on a column using pandas in Python
Question
I have a df,
Name Step Description
Ram 1 Ram is oNe of the good cricketer
Ram 2 gopal one
Sri 1 Sri is one of the member
Sri 2 ravi good
Kumar 1 Kumar is a keeper
Madhu 1 good boy
Vignesh 1 oNe little
Pechi 1 one book
mario 1 good randokm
Roger 1 one milita good
bala 1 looks good
raj 1 more one
venk 1 likes good
and a list,
my_list=["one","good"]
I am trying to get the rows which have at least one keyword from my_list.
I tried mask=df["Description"].str.contains("|".join(my_list),na=False) and I am getting the output_df,
Name Description
Ram Ram is one of the good cricketer
Sri Sri is one of the member
I also want to add the keywords present in "Description" and their count in separate columns.
Even if "Description" contains a keyword, when the df["Name"] is not a first-time occurrence the keyword should not be copied into the keys column.
My desired output is,
Name Step Description keys count
Ram 1 Ram is one of the good cricketer one,good 2
Ram 2 gopal one
Sri 1 Sri is one of the member one 1
Sri 2 ravi good
Kumar 1 Kumar is a keeper
Madhu 1 good boy good 1
Vignesh 1 oNe little oNe 1
Pechi 1 one book one 1
mario 1 good randokm good good 1
Roger 1 one milita good one,good 2
bala 1 looks good good 1
raj 1 more one one 1
venk 1 likes good good 1
Recommended answer
Create a new mask and apply it:
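import re
my_list=["one","good"]
# True only where Description contains a keyword (case-insensitive)
# AND the row is the first one for its Name
mask=df["Description"].str.contains("|".join(my_list),na=False,flags=re.IGNORECASE ) & \
(df.groupby('Name').cumcount() == 0)
print (mask)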
0 True
1 False
2 True
3 False
4 False
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
dtype: bool
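The second half of the mask relies on groupby(...).cumcount(), which numbers the rows inside each group starting at 0, so comparing with 0 flags the first row of every Name. A minimal sketch on a small frame (toy, position and is_first are illustrative names, not part of the original answer):

```python
import pandas as pd

# Hypothetical toy frame, not the asker's data
toy = pd.DataFrame({"Name": ["Ram", "Ram", "Sri", "Sri", "Kumar"]})

# cumcount() gives each row its 0-based position within its Name group
position = toy.groupby("Name").cumcount()

# the first occurrence of every Name has position 0
is_first = position == 0

print(position.tolist())  # [0, 1, 0, 1, 0]
print(is_first.tolist())  # [True, False, True, False, True]
```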
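# findall returns the list of matched keywords for every row;
# assigning through mask fills only the first-occurrence rows, the rest stay NaN
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print (df)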
Name Step Description keys count
0 Ram 1 Ram is oNe of the good cricketer oNe,good 2.0
1 Ram 2 gopal one NaN NaN
2 Sri 1 Sri is one of the member one 1.0
3 Sri 2 ravi good NaN NaN
4 Kumar 1 Kumar is a keeper NaN NaN
5 Madhu 1 good boy good 1.0
6 Vignesh 1 oNe little oNe 1.0
7 Pechi 1 one book one 1.0
8 mario 1 good randokm good 1.0
9 Roger 1 one milita good one,good 2.0
10 bala 1 looks good good 1.0
11 raj 1 more one one 1.0
12 venk 1 likes good good 1.0
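Two details of the output above may matter in practice: the plain "one|good" pattern also matches inside longer words, and count comes out as float because of the NaN rows. The following is only a sketch of one possible variation, assuming whole-word matching and integer counts are wanted; df_demo and its "someone" row are made up for illustration and are not the asker's data.

```python
import re
import pandas as pd

# Hypothetical demo frame; the "someone" row is invented to show the word-boundary effect
df_demo = pd.DataFrame({
    "Name": ["Ram", "Ram", "Kumar", "Pechi"],
    "Description": [
        "Ram is oNe of the good cricketer",
        "gopal one",
        "someone is a keeper",
        "one book",
    ],
})
my_list = ["one", "good"]

# \b keeps "one" from matching inside "someone";
# re.escape protects keywords that contain regex metacharacters
pattern = r"\b(?:" + "|".join(re.escape(w) for w in my_list) + r")\b"

found = df_demo["Description"].str.findall(pattern, flags=re.IGNORECASE)
first = df_demo.groupby("Name").cumcount() == 0
mask = first & (found.str.len() > 0)

df_demo.loc[mask, "keys"] = found[mask].str.join(",")
# nullable Int64 keeps the counts as integers while still allowing missing values
df_demo["count"] = found.str.len().where(mask).astype("Int64")

print(df_demo)
```

Here only the first Ram row and the Pechi row receive keys; the Kumar row stays empty because "someone" is not a whole-word match.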
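# If the keywords from every row of a Name should be collected in its first row,
# join the descriptions per group; transform keeps the same length as the original df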
s = df.groupby('Name')['Description'].transform(','.join)
print (s)
0 Ram is oNe of the good cricketer,gopal one
1 Ram is oNe of the good cricketer,gopal one
2 Sri is one of the member,ravi good
3 Sri is one of the member,ravi good
4 Kumar is a keeper
5 good boy
6 oNe little
7 one book
8 good randokm good
9 one milita good
10 looks good
11 more one
12 likes good
Name: Description, dtype: object
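The reason for transform rather than agg here is that transform broadcasts the per-group result back to every row, so the result lines up with df's index and can be combined with the cumcount mask. A small sketch of the difference (toy, per_group and per_row are illustrative names):

```python
import pandas as pd

# Hypothetical toy frame, not the asker's data
toy = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri"],
    "Description": ["a", "b", "c"],
})

# agg collapses to one value per group, indexed by Name
per_group = toy.groupby("Name")["Description"].agg(",".join)

# transform repeats the group value on every original row
per_row = toy.groupby("Name")["Description"].transform(",".join)

print(per_group.tolist())  # ['a,b', 'c']
print(per_row.tolist())    # ['a,b', 'a,b', 'c']
```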
# build the mask from the new Series s
mask=s.str.contains("|".join(my_list),na=False,flags=re.IGNORECASE ) & \
(df.groupby('Name').cumcount() == 0)
print (mask)
0 True
1 False
2 True
3 False
4 False
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
dtype: bool
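# extract from the new Series s; set() removes exact repeats only,
# so "oNe" and "one" are both kept and the order of the keys is not guaranteed
extracted = s.str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE).apply(set)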
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print (df)
Name Step Description keys count
0 Ram 1 Ram is oNe of the good cricketer good,oNe,one 3.0
1 Ram 2 gopal one NaN NaN
2 Sri 1 Sri is one of the member good,one 2.0
3 Sri 2 ravi good NaN NaN
4 Kumar 1 Kumar is a keeper NaN NaN
5 Madhu 1 good boy good 1.0
6 Vignesh 1 oNe little oNe 1.0
7 Pechi 1 one book one 1.0
8 mario 1 good randokm good good 1.0
9 Roger 1 one milita good good,one 2.0
10 bala 1 looks good good 1.0
11 raj 1 more one one 1.0
12 venk 1 likes good good 1.0
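In this last output the first row counts "oNe" and "one" as two different keys, because set() compares the matched strings exactly. If the spellings should count as one keyword, a small helper could be used in place of set. This is a sketch under that assumption; unique_keywords and s_demo are hypothetical names, not part of the original answer.

```python
import re
import pandas as pd

def unique_keywords(matches):
    """Drop repeats case-insensitively, keeping the first spelling and the original order."""
    seen = {}
    for word in matches:
        seen.setdefault(word.lower(), word)
    return list(seen.values())

# Two joined descriptions in the same shape as the Series s above
s_demo = pd.Series([
    "Ram is oNe of the good cricketer,gopal one",
    "good randokm good",
])

extracted = s_demo.str.findall("one|good", flags=re.IGNORECASE).apply(unique_keywords)
keys = extracted.str.join(",")
counts = extracted.str.len()

print(keys.tolist())    # ['oNe,good', 'good']
print(counts.tolist())  # [2, 1]
```

Because a dict preserves insertion order, the keys come out in the order they appear in the text, which a set does not guarantee.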