根据描述场景和Python Pandas中的状态过滤产品 [英] Filtering products based on description scenarios and status in Python Pandas

查看:83
本文介绍了根据描述场景和Python Pandas中的状态过滤产品的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

比方说,我在Pandas DataFrame中具有以下产品描述.我希望保留满足以下条件的产品的所有产品描述:

Let's say I have the following product descriptions in a Pandas DataFrame. I would like to keep all product descriptions of products that satisfy the following condition:

对于product_descriptions中的每个id,请检查其中是否包含所有descriptions scenario_descriptions中至少有1种情况.如果是这样,请检查所有说明 这些情况有status 4、5或6.

For every id in product_descriptions, check if it contains all descriptions from at least 1 scenario in scenario_descriptions. If so, check if the descriptions of all these scenarios have status 4, 5 or 6.

输入

scenario_descriptions = [
    ['world1', 'world2', 'world3'],  #scenario1
    ['world4', 'world2'],            #scenario2
    ['world5'],                      #scenario3
    ['world6', 'world7'],            #scenario4
    ['world6', 'world2']             #scenario5
]

product_descriptions = 

id | description | status
-------------------------
1  | world1      | 1
1  | world2      | 4
1  | world3      | 1
1  | world4      | 4
1  | world5      | 4
1  | world6      | 4
1  | world7      | 1
1  | world8      | 4
1  | world9      | 4
1  | world10     | 4
1  | world11     | 4
1  | world12     | 4
1  | world13     | 4
1  | world14     | 4
1  | world15     | 1
2  | world1      | 1
2  | world2      | 1
2  | world3      | 5
2  | world15     | 6
2  | world8      | 6
2  | world4      | 5
2  | world7      | 5

ID == 1的产品不被排除,原因是:

Product with id == 1 is not excluded because:

  • 它具有world2 and world4`(场景2),其状态为4
  • 具有状态为4的world5(方案3)
  • 具有状态为4的world6world2(方案5)
  • It has world2 andworld4` (scenario2) and their statuses are 4
  • It has world5 (scenario3) with status 4
  • It has world6 and world2 (scenario5) with status 4

id == 2的产品被排除在外是因为:

Product with id == 2 is excluded because:

  • world1(方案1)的状态为4、5或6
  • world2(方案2)的状态为4、5或6
  • 它没有world5(场景3)
  • 它没有world6(场景4)
  • 它没有world6(场景5),并且world2的状态不是4、5或6
  • world1 (scenario1) doesn't have status 4, 5 or 6
  • world2 (scenario2) doesn't have status 4, 5 or 6
  • It doesn't have world5 (scenario3)
  • It doesn't have world6 (scenario4)
  • It doesn't have world6 (scenario5) and status of world2 is not 4, 5 or 6

输出

filtered_product_descriptions = 

id | description | status
-------------------------
1  | world1      | 1
1  | world2      | 4
1  | world3      | 1
1  | world4      | 4
1  | world5      | 4
1  | world6      | 4
1  | world7      | 1
1  | world8      | 4
1  | world9      | 4
1  | world10     | 4
1  | world11     | 4
1  | world12     | 4
1  | world13     | 4
1  | world14     | 4
1  | world15     | 1

有什么办法解决这个问题吗?我已经尝试了好几天,但没有成功:( 下面的代码是我能得到的最接近的代码.这解决了我问题的第一部分.我不知道该如何结合第二个条件:

Any idea how to solve this? I have been trying for days, but no success :( The code below is the closest I could get. This solves the first section of my problem. I don't know how to combine this with the second condition:

如果是,请检查所有这些方案的描述是否具有状态4, 5或6.

If so, check if the descriptions of all these scenarios have status 4, 5 or 6.

filtered_product_descriptions = (product_descriptions.groupby('id').filter(lambda x: (pd.Series([(pd.Series(y).isin(x['description']).all()) for y in scenario_descriptions])).any()))

推荐答案

使用:

#create dictionary by scenaries
d = {'scenario{}'.format(k):v for k, v in enumerate(scenario_descriptions, 1)}

#unique id for reindex
uniq_id = df['id'].unique()

def f(x):
    #check if all description
    c = set(x['description']) >= set(v)
    #check if 4,5 or 6 value
    d = x['status'].isin([4,5,6]).all()
    return (c & d)

d1 = {}
for k, v in d.items():
     #filter df by scenary first for remove not relevant rows
     a = df[df['description'].isin(v)]
     #call groupby with custom function 
     b = a.groupby('id').apply(f)
     #add missing ids and fill by False
     #output to dictionary 
     d1[k] = b.reindex(uniq_id, fill_value=False)


print (d1)
{'scenario1': id
1    False
2    False
dtype: bool, 'scenario4': id
1    False
2    False
dtype: bool, 'scenario5': id
1     True
2    False
dtype: bool, 'scenario3': id
1     True
2    False
dtype: bool, 'scenario2': id
1     True
2    False
dtype: bool}

#reduce dict to DataFrame and check at least one True per row
m = pd.concat(d1, axis=1).any(axis=1)
print (m)
id
1     True
2    False


#last filtering
df = df[df['id'].isin(m.index[m])]
print (df)
    id description  status
0    1      world1       1
1    1      world2       4
2    1      world3       1
3    1      world4       4
4    1      world5       4
5    1      world6       4
6    1      world7       1
7    1      world8       4
8    1      world9       4
9    1     world10       4
10   1     world11       4
11   1     world12       4
12   1     world13       4
13   1     world14       4
14   1     world15       1

这篇关于根据描述场景和Python Pandas中的状态过滤产品的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆