根据描述场景和Python Pandas中的状态过滤产品 [英] Filtering products based on description scenarios and status in Python Pandas

查看：83 发布时间：2020/5/24 3:38:51 python pandas dataframe filter

本文介绍了根据描述场景和Python Pandas中的状态过滤产品的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

比方说，我在Pandas DataFrame中具有以下产品描述.我希望保留满足以下条件的产品的所有产品描述:

Let's say I have the following product descriptions in a Pandas DataFrame. I would like to keep all product descriptions of products that satisfy the following condition:

对于product_descriptions中的每个id，请检查其中是否包含所有descriptions scenario_descriptions中至少有1种情况.如果是这样，请检查所有说明这些情况有status 4、5或6.

For every id in product_descriptions, check if it contains all descriptions from at least 1 scenario in scenario_descriptions. If so, check if the descriptions of all these scenarios have status 4, 5 or 6.

输入

scenario_descriptions = [
    ['world1', 'world2', 'world3'],  #scenario1
    ['world4', 'world2'],            #scenario2
    ['world5'],                      #scenario3
    ['world6', 'world7'],            #scenario4
    ['world6', 'world2']             #scenario5
]

product_descriptions = 

id | description | status
-------------------------
1  | world1      | 1
1  | world2      | 4
1  | world3      | 1
1  | world4      | 4
1  | world5      | 4
1  | world6      | 4
1  | world7      | 1
1  | world8      | 4
1  | world9      | 4
1  | world10     | 4
1  | world11     | 4
1  | world12     | 4
1  | world13     | 4
1  | world14     | 4
1  | world15     | 1
2  | world1      | 1
2  | world2      | 1
2  | world3      | 5
2  | world15     | 6
2  | world8      | 6
2  | world4      | 5
2  | world7      | 5

ID == 1的产品不被排除，原因是:

Product with id == 1 is not excluded because:

它具有world2 and world4`(场景2)，其状态为4
具有状态为4的world5(方案3)
具有状态为4的world6和world2(方案5)

It has world2 andworld4` (scenario2) and their statuses are 4
It has world5 (scenario3) with status 4
It has world6 and world2 (scenario5) with status 4

id == 2的产品被排除在外是因为:

Product with id == 2 is excluded because:

world1(方案1)的状态为4、5或6
world2(方案2)的状态为4、5或6
它没有world5(场景3)
它没有world6(场景4)
它没有world6(场景5)，并且world2的状态不是4、5或6

world1 (scenario1) doesn't have status 4, 5 or 6
world2 (scenario2) doesn't have status 4, 5 or 6
It doesn't have world5 (scenario3)
It doesn't have world6 (scenario4)
It doesn't have world6 (scenario5) and status of world2 is not 4, 5 or 6

输出

filtered_product_descriptions = 

id | description | status
-------------------------
1  | world1      | 1
1  | world2      | 4
1  | world3      | 1
1  | world4      | 4
1  | world5      | 4
1  | world6      | 4
1  | world7      | 1
1  | world8      | 4
1  | world9      | 4
1  | world10     | 4
1  | world11     | 4
1  | world12     | 4
1  | world13     | 4
1  | world14     | 4
1  | world15     | 1

有什么办法解决这个问题吗?我已经尝试了好几天，但没有成功:( 下面的代码是我能得到的最接近的代码.这解决了我问题的第一部分.我不知道该如何结合第二个条件:

Any idea how to solve this? I have been trying for days, but no success :( The code below is the closest I could get. This solves the first section of my problem. I don't know how to combine this with the second condition:

如果是，请检查所有这些方案的描述是否具有状态4， 5或6.

If so, check if the descriptions of all these scenarios have status 4, 5 or 6.

filtered_product_descriptions = (product_descriptions.groupby('id').filter(lambda x: (pd.Series([(pd.Series(y).isin(x['description']).all()) for y in scenario_descriptions])).any()))

推荐答案

使用:

#create dictionary by scenaries
d = {'scenario{}'.format(k):v for k, v in enumerate(scenario_descriptions, 1)}

#unique id for reindex
uniq_id = df['id'].unique()

def f(x):
    #check if all description
    c = set(x['description']) >= set(v)
    #check if 4,5 or 6 value
    d = x['status'].isin([4,5,6]).all()
    return (c & d)

d1 = {}
for k, v in d.items():
     #filter df by scenary first for remove not relevant rows
     a = df[df['description'].isin(v)]
     #call groupby with custom function 
     b = a.groupby('id').apply(f)
     #add missing ids and fill by False
     #output to dictionary 
     d1[k] = b.reindex(uniq_id, fill_value=False)

print (d1)
{'scenario1': id
1    False
2    False
dtype: bool, 'scenario4': id
1    False
2    False
dtype: bool, 'scenario5': id
1     True
2    False
dtype: bool, 'scenario3': id
1     True
2    False
dtype: bool, 'scenario2': id
1     True
2    False
dtype: bool}

#reduce dict to DataFrame and check at least one True per row
m = pd.concat(d1, axis=1).any(axis=1)
print (m)
id
1     True
2    False

#last filtering
df = df[df['id'].isin(m.index[m])]
print (df)
    id description  status
0    1      world1       1
1    1      world2       4
2    1      world3       1
3    1      world4       4
4    1      world5       4
5    1      world6       4
6    1      world7       1
7    1      world8       4
8    1      world9       4
9    1     world10       4
10   1     world11       4
11   1     world12       4
12   1     world13       4
13   1     world14       4
14   1     world15       1

这篇关于根据描述场景和Python Pandas中的状态过滤产品的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

根据描述场景和Python Pandas中的状态过滤产品 [英] Filtering products based on description scenarios and status in Python Pandas

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

根据描述场景和Python Pandas中的状态过滤产品 [英] Filtering products based on description scenarios and status in Python Pandas

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭