Use keywords from dataframe to detect if any present in another dataframe or string
Problem description
I have two problems: First is...
I have one dataframe with category and keywords like this:
   Category  Keywords
0  Fruit     ['apple', 'pear', 'plum', 'grape']
1  Color     ['red', 'purple', 'green']
Another dataframe like this:
   Summary
0  This is a basket of red apples. They are sour.
1  We found a bushel of fruit. They are red.
2  There is a peck of pears that taste sweet.
3  We have a box of plums.
I want the end result like this:
   Category      Summary
0  Fruit, Color  This is a basket of red apples. They are sour.
1  Color         We found a bushel of fruit. They are red.
2  Fruit, Color  There is a peck of green pears that taste sweet.
3  Fruit         We have a box of plums.
Second is...
I should be able to check if a string contains any of the keywords, and if true then output a list of appropriate categories.
Example: sample_sentence = "This line contains a red plum?"
output:
result_list = ['Color', 'Fruit']
EDIT: It's kind of similar, but not the same. Use this for reference: How do I assign categories in a dataframe if they contain any element from another dataframe?
EDIT2:
I also have another version of first dataframe like this:
   Category  Filters
0  Fruit     apple, pear, plum, grape
1  Color     red, purple, green
You can use a list comprehension to achieve this:
Dataframe set-up:
import pandas as pd

df1 = pd.DataFrame({'Category': {0: 'Fruit', 1: 'Color'},
                    'Keywords': {0: 'apple,pear,plum,grape', 1: 'red,purple,green'}})
df2 = pd.DataFrame({'Summary': {0: 'This is a basket of red apples. They are sour.',
                                1: 'We found a bushel of fruit. They are red.',
                                2: 'There is a peck of pears that taste sweet.',
                                3: 'We have a box of plums.'}})
# Turn each comma-separated string into a list of keywords.
df1['Keywords'] = df1['Keywords'].str.split(',')
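The set-up above assumes keywords separated by bare commas. With the EDIT2 layout from the question ('apple, pear, plum, grape'), a plain split(',') leaves a leading space on every keyword after the first, and a keyword such as ' pear' can never match a word obtained by splitting the summary on spaces. A minimal sketch of one way to normalise that column, assuming it is named Filters as in the question:

```python
import pandas as pd

# Filters column in the comma-and-space form shown in EDIT2.
df1 = pd.DataFrame({
    'Category': ['Fruit', 'Color'],
    'Filters': ['apple, pear, plum, grape', 'red, purple, green'],
})

# Split on commas, strip surrounding whitespace, and drop empty entries.
df1['Keywords'] = df1['Filters'].str.split(',').apply(
    lambda parts: [p.strip() for p in parts if p.strip()]
)
print(df1['Keywords'].tolist())
```

After this step the Keywords column holds clean lists, so the rest of the answer can be used unchanged.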
Code:
df2['Category'] = (
    df2['Summary'].str.split(' ').apply(
        lambda x: list(set(
            str(a)
            for y in x  # each word in the summary
            for a, b in zip(df1['Category'], df1['Keywords'])  # each category and its keyword list
            for c in b  # each keyword
            # Substring test. For exact matches use "str(c) == str(y)",
            # or "str(c).lower() == str(y).lower()" to ignore case.
            if str(c) in str(y)
        ))
    ).str.join(', ')
)
df2
Output:
Out[1]:
   Summary                                         Category
0  This is a basket of red apples. They are sour.  Fruit, Color
1  We found a bushel of fruit. They are red.       Color
2  There is a peck of pears that taste sweet.      Fruit
3  We have a box of plums.                         Fruit
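Because the categories are collected in a set, the order inside a cell (for example "Fruit, Color" versus "Color, Fruit") is not guaranteed. As an alternative, and not the answer's own method, one could build a regular expression per category and test each summary with Series.str.contains, which keeps the categories in the order they appear in df1. The helper name tag_categories is hypothetical, and case-insensitive substring matching is an assumption:

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'Category': ['Fruit', 'Color'],
    'Keywords': [['apple', 'pear', 'plum', 'grape'], ['red', 'purple', 'green']],
})
df2 = pd.DataFrame({'Summary': [
    'This is a basket of red apples. They are sour.',
    'We found a bushel of fruit. They are red.',
    'There is a peck of pears that taste sweet.',
    'We have a box of plums.',
]})

def tag_categories(summaries, categories):
    """Hypothetical helper: comma-joined categories matched in each summary."""
    masks = {}
    for category, keywords in zip(categories['Category'], categories['Keywords']):
        # One alternation per category; re.escape guards against regex metacharacters.
        pattern = '|'.join(re.escape(str(k)) for k in keywords)
        masks[category] = summaries.str.contains(pattern, case=False, regex=True)
    # Join the matching categories row by row, in the order they appear in df1.
    return pd.Series(
        [', '.join(cat for cat, mask in masks.items() if mask.iloc[i])
         for i in range(len(summaries))],
        index=summaries.index,
    )

df2['Category'] = tag_categories(df2['Summary'], df1)
print(df2)
```

Like the comprehension, this matches substrings, so 'pear' also matches 'pears'.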
a, b and x iterate through rows (vertically): x is the list of words from one row of df2, and a and b are the category and keyword list from one row of df1. c and y iterate through the lists within rows (horizontally): c is a single keyword in b, and y is a single word in x. To iterate through the lists horizontally, you first need to iterate through the rows vertically, which is why all of these variables are needed. You can use zip to simultaneously iterate through two or more columns of the first dataframe.
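The second part of the question, checking a single string and returning a list of categories, is not covered by the code above. A minimal sketch of how the same zip-based iteration could be applied to one string follows; the function name categories_in is hypothetical, and the case-insensitive substring test is an assumption about the desired matching:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Category': ['Fruit', 'Color'],
    'Keywords': [['apple', 'pear', 'plum', 'grape'], ['red', 'purple', 'green']],
})

def categories_in(text, categories):
    """Hypothetical helper: categories whose keywords occur in `text`."""
    lowered = text.lower()
    found = []
    for category, keywords in zip(categories['Category'], categories['Keywords']):
        # Substring test, so 'plum' also matches 'plums'.
        if any(str(keyword).lower() in lowered for keyword in keywords):
            found.append(category)
    return found

sample_sentence = "This line contains a red plum?"
result_list = categories_in(sample_sentence, df1)
print(result_list)  # ['Fruit', 'Color']
```

The categories come back in the order they are listed in df1. The same function could also be applied to a whole column, for example with df2['Summary'].apply(categories_in, categories=df1).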