Use keywords from dataframe to detect if any present in another dataframe or string


Problem description

I have two problems: First is...

I have one dataframe with category and keywords like this:

  Category                   Keywords
0    Fruit            ['apple', 'pear', 'plum', 'grape']
1    Color            ['red', 'purple', 'green']

Another dataframe like this:

              Summary
0        This is a basket of red apples. They are sour.
1        We found a bushel of fruit. They are red.
2        There is a peck of pears that taste sweet.
3        We have a box of plums.

I want the end result like this:

      Category                                            Summary
0    Fruit, Color     This is a basket of red apples. They are sour.
1           Color     We found a bushel of fruit. They are red.
2    Fruit, Color     There is a peck of green pears that taste sweet.
3           Fruit     We have a box of plums.

Second is...

I should be able to check if a string contains any of the keywords, and if true then output a list of appropriate categories.

Example: sample_sentence = "This line contains a red plum?"

output:

result_list = ['color','Fruit']

EDIT: It's kind of similar but not the same. Use this for reference: How do I assign categories in a dataframe if they contain any element from another dataframe?

EDIT2:

I also have another version of the first dataframe like this:

  Category                   Filters
0    Fruit  apple, pear, plum, grape
1    Color        red, purple, green

Solution

You can use a list comprehension to achieve this:

Dataframe set-up:

import pandas as pd

df1 = pd.DataFrame({'Category': {0: 'Fruit', 1: 'Color'},
 'Keywords': {0: 'apple,pear,plum,grape', 1: 'red,purple,green'}})
df2 = pd.DataFrame({'Summary': {0: 'This is a basket of red apples. They are sour.',
  1: 'We found a bushel of fruit. They are red.',
  2: 'There is a peck of pears that taste sweet.',
  3: 'We have a box of plums.'}})
# Turn each comma-separated keyword string into a list of keywords
df1['Keywords'] = df1['Keywords'].str.split(',')
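
The set-up above has no spaces after the commas. The second form shown in the question (the `Filters` column, e.g. `apple, pear, plum, grape`) does, so splitting on `','` alone would leave keywords such as `' pear'`, which never match a word. A minimal sketch of one way to normalize such strings follows; the helper name `parse_filters` and the plain dict standing in for the dataframe are assumptions made for illustration, not part of the original answer.

```python
def parse_filters(raw):
    """Split a comma-separated string into keywords, dropping spaces and empty parts."""
    return [part.strip() for part in raw.split(",") if part.strip()]


# Hypothetical stand-in for the Category/Filters table from the question.
filters = {"Fruit": "apple, pear, plum, grape", "Color": "red, purple, green"}
keywords = {category: parse_filters(raw) for category, raw in filters.items()}

print(keywords["Fruit"])  # ['apple', 'pear', 'plum', 'grape']
```

With pandas, the same helper could be applied to the column with `df1['Keywords'] = df1['Filters'].apply(parse_filters)`.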

Code:

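# For each summary (x = its list of words), collect the categories (a) whose
# keyword list (b) contains a keyword (c) that is a substring of some word (y).
# Stricter alternatives to the substring test: str(c) == str(y), or
# str(c).lower() == str(y).lower(). Note that set() does not preserve order,
# so 'Fruit, Color' may also come out as 'Color, Fruit'.
df2['Category'] = (df2['Summary'].str.split(' ').apply(
    lambda x: list(set([str(a) for y in
                        x for a, b in
                        zip(df1['Category'], df1['Keywords']) for c in
                        b if str(c) in str(y)]))).str.join(', '))
df2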

Output:

Out[1]: 
                                          Summary      Category
0  This is a basket of red apples. They are sour.  Fruit, Color
1       We found a bushel of fruit. They are red.         Color
2      There is a peck of pears that taste sweet.         Fruit
3                         We have a box of plums.         Fruit

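x is one row of df2 (the list of words in a single summary), and y is each word in that list. a and b come from one row of df1 (the category and its keyword list), and c is each keyword in b. In other words, a, b and x iterate through rows (vertically), while c and y iterate through the lists within those rows (horizontally). You have to iterate through the rows before you can iterate through the lists inside them, which is why all of these variables are needed. zip lets you iterate through two or more columns of the first dataframe simultaneously.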
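
The code above handles the dataframe case; the second part of the question (getting a list of categories for a single string) can be approached with the same substring test. The following is only a sketch under a few assumptions: the function name `find_categories` and the plain dict `keywords_by_category` are made up for illustration, and matching is done case-insensitively against the whole string rather than word by word.

```python
def find_categories(text, keywords_by_category):
    """Return the categories that have at least one keyword occurring in text.

    Matching is a case-insensitive substring test, so 'plum' matches 'plums',
    but 'red' would also match 'bored'.
    """
    lowered = text.lower()
    return [
        category
        for category, keywords in keywords_by_category.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]


# Hypothetical plain-dict copy of the category/keyword table from the question.
keywords_by_category = {
    "Fruit": ["apple", "pear", "plum", "grape"],
    "Color": ["red", "purple", "green"],
}

sample_sentence = "This line contains a red plum?"
result_list = find_categories(sample_sentence, keywords_by_category)
print(result_list)  # ['Fruit', 'Color']
```

With the dataframes from the answer, the mapping could be built as `dict(zip(df1['Category'], df1['Keywords']))`, and `df2['Summary'].apply(lambda s: ', '.join(find_categories(s, mapping)))` would fill the Category column with categories listed in the row order of df1 instead of the arbitrary order that set() produces.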
