How do I assign categories in a dataframe if they contain any element from another dataframe?


Problem description

I have two Excel sheets. One contains summaries and the other contains categories, each with a list of potential filter words. I need to assign a category to a row of the first dataframe whenever its summary matches any filter word for that category in the second dataframe.

I have attempted to expand the list in the second dataframe and map the categories by matching the terms against any words in the first dataframe.

import pandas as pd

data1 = {'Bucket': ['basket', 'bushel', 'peck', 'box'],
         'Summary': ['This is a basket of red apples. They are sour.',
                     'We found a bushel of fruit. They are red and sweet.',
                     'There is a peck of pears that taste sweet. They are very green.',
                     'We have a box of plums. They are sour and have a great color.']}

data2 = {'Category': ['Fruit', 'Color'],
         'Filters': ['apple, pear, plum, grape', 'red, purple, green']}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)


print(df1)

   Bucket                                            Summary
0  basket     This is a basket of red apples. They are sour.
1  bushel  We found a bushel of fruit. They are red and s...
2    peck  There is a peck of pears that taste sweet. The...
3     box  We have a box of plums. They are sour and have...


print(df2)

  Category                   Filters
0    Fruit  apple, pear, plum, grape
1    Color        red, purple, green


This line of the script converts the Category column of the table into a list for later use.

category_list = df2['Category'].values
category_list = list(set(category_list))

My attempt at matching the text:

for item in category_list:
    item = df2.loc[df2['Category'] == item]
    filter_list = item['Filters'].values
    filter_list = list(set(filter_list))
    df1 = df1[df1['Summary'].isin(filter_list)]

I want the first dataframe to have the matching categories assigned to it, separated by commas.

Expected result:

   Bucket      Category                                            Summary
0  basket  Fruit, Color     This is a basket of red apples. They are sour.
1  bushel         Color  We found a bushel of fruit. They are red and s...
2    peck  Fruit, Color  There is a peck of pears that taste sweet. The...
3     box         Fruit  We have a box of plums. They are sour and have...

I hope this is clear. I have been banging my head against it for a week now.

Thanks in advance.

Recommended answer

Use pandas.Series.str.contains in a loop over each category's Filters:

# Strip spaces, then split each comma-separated filter string into a list of keywords
df2['Filters'] = [key.replace(' ', '') for key in df2['Filters']]
df2['Filters'] = df2['Filters'].apply(lambda x: x.split(','))
# A row is True for a category if its Summary contains any of that category's keywords
Fruit = pd.DataFrame([df1['Summary'].str.contains(key) for key in df2.set_index('Category')['Filters']['Fruit']]).any()
Color = pd.DataFrame([df1['Summary'].str.contains(key) for key in df2.set_index('Category')['Filters']['Color']]).any()
print(Fruit)
print(Color)

0     True
1    False
2     True
3     True
dtype: bool 

0     True
1     True
2     True
3    False
dtype: bool
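The per-keyword loop above can also be collapsed into a single regular expression per category. The following is only a sketch of that idea, not part of the original answer: the names `fruit_words`, `fruit_pattern`, and `fruit_mask` are made up for illustration, and `re.escape` plus `case=False` are added assumptions so that keywords are matched literally and regardless of case.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Summary': [
    'This is a basket of red apples. They are sour.',
    'We found a bushel of fruit. They are red and sweet.',
    'There is a peck of pears that taste sweet. They are very green.',
    'We have a box of plums. They are sour and have a great color.',
]})

# Hypothetical keyword list for the Fruit category
fruit_words = ['apple', 'pear', 'plum', 'grape']

# Escape each keyword so it is matched literally, then join with the regex "or" operator
fruit_pattern = '|'.join(re.escape(word) for word in fruit_words)

# One vectorized call replaces the loop over keywords
fruit_mask = df1['Summary'].str.contains(fruit_pattern, case=False, regex=True)
print(fruit_mask.tolist())
# [True, False, True, True]
```

Note that this is still substring matching, so a keyword such as red would also match inside a longer word like hundred.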


Then use np.where with Series.str.cat to get the dataframe output:

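import numpy as np

# Label each row with the category name where the mask is True, otherwise an empty string
df1['Fruit'] = np.where(Fruit, 'Fruit', '')
df1['Color'] = np.where(Color, 'Color', '')
# Concatenate the two label columns into one comma-separated Category column
df1['Category'] = df1['Fruit'].str.cat(df1['Color'], sep=', ')
df1 = df1[['Bucket', 'Category', 'Summary']]
print(df1)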


   Bucket      Category                                            Summary
0  basket  Fruit, Color     This is a basket of red apples. They are sour.
1  bushel       , Color  We found a bushel of fruit. They are red and s...
2    peck  Fruit, Color  There is a peck of pears that taste sweet. The...
3     box       Fruit,   We have a box of plums. They are sour and have...
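In the output above, rows that match only one category keep a stray separator (", Color" and "Fruit, "), which differs from the result asked for in the question. One possible way around this, shown here as a sketch rather than as the answer's own method, is to join only the non-empty labels in each row. The `labels` frame is a hypothetical stand-in for the two label columns built with np.where.

```python
import pandas as pd

# Hypothetical stand-in for the label columns produced by np.where above
labels = pd.DataFrame({
    'Fruit': ['Fruit', '', 'Fruit', 'Fruit'],
    'Color': ['Color', 'Color', 'Color', ''],
})

# Join only the non-empty labels of each row, so no stray separators remain
category = labels.apply(lambda row: ', '.join(label for label in row if label), axis=1)
print(category.tolist())
# ['Fruit, Color', 'Color', 'Fruit, Color', 'Fruit']
```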


To generalize to n category filters:

import numpy as np
# Start from the original df1 and df2 (Filters still holds comma-separated strings)
df2['Filters'] = [key.replace(' ', '') for key in df2['Filters']]
df2['Filters'] = df2['Filters'].apply(lambda x: x.split(','))
filters = df2.set_index('Category')['Filters']
# One Series per category: the category name where any keyword matches, otherwise ''
Categories = [pd.Series(np.where(pd.DataFrame([df1['Summary'].str.contains(key) for key in filters[category_filter]]).any(), category_filter, '')) for category_filter in df2['Category']]
df1['Category'] = Categories[0].str.cat(Categories[1:], sep=', ')
df1 = df1.reindex(columns=['Bucket', 'Category', 'Summary'])
print(df1)

   Bucket      Category                                            Summary
0  basket  Fruit, Color     This is a basket of red apples. They are sour.
1  bushel       , Color  We found a bushel of fruit. They are red and s...
2    peck  Fruit, Color  There is a peck of pears that taste sweet. The...
3     box       Fruit,   We have a box of plums. They are sour and have...
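For any number of categories, the same idea can be written end to end starting from the raw df2 strings. This is a minimal sketch under stated assumptions: the helper name `assign_categories` is hypothetical, matching is case-insensitive substring matching, and only the names of matching categories are joined, so the result has no empty slots between separators.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    'Bucket': ['basket', 'bushel', 'peck', 'box'],
    'Summary': ['This is a basket of red apples. They are sour.',
                'We found a bushel of fruit. They are red and sweet.',
                'There is a peck of pears that taste sweet. They are very green.',
                'We have a box of plums. They are sour and have a great color.'],
})
df2 = pd.DataFrame({
    'Category': ['Fruit', 'Color'],
    'Filters': ['apple, pear, plum, grape', 'red, purple, green'],
})


def assign_categories(summaries, categories):
    """Hypothetical helper: return one comma-separated category string per summary."""
    flags = {}
    for name, filters in zip(categories['Category'], categories['Filters']):
        # Split the comma-separated filter string and drop surrounding whitespace
        words = [word.strip() for word in filters.split(',') if word.strip()]
        pattern = '|'.join(re.escape(word) for word in words)
        # Boolean mask: True where the summary contains any keyword of this category
        flags[name] = summaries.str.contains(pattern, case=False, regex=True)
    flags = pd.DataFrame(flags)
    # Keep only the names of the categories that matched in each row
    return flags.apply(
        lambda row: ', '.join(name for name, hit in row.items() if hit), axis=1)


df1['Category'] = assign_categories(df1['Summary'], df2)
print(df1[['Bucket', 'Category']])
```

Rows that match no category end up with an empty string in the Category column.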
