根据描述场景和Python Pandas中的状态过滤产品 [英] Filtering products based on description scenarios and status in Python Pandas
问题描述
比方说,我在Pandas DataFrame中具有以下产品描述.我希望保留满足以下条件的产品的所有产品描述:
Let's say I have the following product descriptions in a Pandas DataFrame. I would like to keep all product descriptions of products that satisfy the following condition:
对于
product_descriptions
中的每个id
,请检查其中是否包含所有descriptions
scenario_descriptions
中至少有1种情况.如果是这样,请检查所有说明 这些情况有status
4、5或6.
For every
id
inproduct_descriptions
, check if it contains alldescriptions
from at least 1 scenario inscenario_descriptions
. If so, check if the descriptions of all these scenarios havestatus
4, 5 or 6.
输入
scenario_descriptions = [
['world1', 'world2', 'world3'], #scenario1
['world4', 'world2'], #scenario2
['world5'], #scenario3
['world6', 'world7'], #scenario4
['world6', 'world2'] #scenario5
]
product_descriptions =
id | description | status
-------------------------
1 | world1 | 1
1 | world2 | 4
1 | world3 | 1
1 | world4 | 4
1 | world5 | 4
1 | world6 | 4
1 | world7 | 1
1 | world8 | 4
1 | world9 | 4
1 | world10 | 4
1 | world11 | 4
1 | world12 | 4
1 | world13 | 4
1 | world14 | 4
1 | world15 | 1
2 | world1 | 1
2 | world2 | 1
2 | world3 | 5
2 | world15 | 6
2 | world8 | 6
2 | world4 | 5
2 | world7 | 5
ID == 1的产品不被排除,原因是:
Product with id == 1 is not excluded because:
- 它具有
world2 and
world4`(场景2),其状态为4 - 具有状态为4的
world5
(方案3) - 具有状态为4的
world6
和world2
(方案5)
- It has
world2 and
world4` (scenario2) and their statuses are 4 - It has
world5
(scenario3) with status 4 - It has
world6
andworld2
(scenario5) with status 4
id == 2的产品被排除在外是因为:
Product with id == 2 is excluded because:
-
world1
(方案1)的状态为4、5或6 -
world2
(方案2)的状态为4、5或6 - 它没有
world5
(场景3) - 它没有
world6
(场景4) - 它没有
world6
(场景5),并且world2
的状态不是4、5或6
world1
(scenario1) doesn't have status 4, 5 or 6world2
(scenario2) doesn't have status 4, 5 or 6- It doesn't have
world5
(scenario3) - It doesn't have
world6
(scenario4) - It doesn't have
world6
(scenario5) and status ofworld2
is not 4, 5 or 6
输出
filtered_product_descriptions =
id | description | status
-------------------------
1 | world1 | 1
1 | world2 | 4
1 | world3 | 1
1 | world4 | 4
1 | world5 | 4
1 | world6 | 4
1 | world7 | 1
1 | world8 | 4
1 | world9 | 4
1 | world10 | 4
1 | world11 | 4
1 | world12 | 4
1 | world13 | 4
1 | world14 | 4
1 | world15 | 1
有什么办法解决这个问题吗?我已经尝试了好几天,但没有成功:( 下面的代码是我能得到的最接近的代码.这解决了我问题的第一部分.我不知道该如何结合第二个条件:
Any idea how to solve this? I have been trying for days, but no success :( The code below is the closest I could get. This solves the first section of my problem. I don't know how to combine this with the second condition:
如果是,请检查所有这些方案的描述是否具有状态4, 5或6.
If so, check if the descriptions of all these scenarios have status 4, 5 or 6.
filtered_product_descriptions = (product_descriptions.groupby('id').filter(lambda x: (pd.Series([(pd.Series(y).isin(x['description']).all()) for y in scenario_descriptions])).any()))
推荐答案
使用:
#create dictionary by scenaries
d = {'scenario{}'.format(k):v for k, v in enumerate(scenario_descriptions, 1)}
#unique id for reindex
uniq_id = df['id'].unique()
def f(x):
#check if all description
c = set(x['description']) >= set(v)
#check if 4,5 or 6 value
d = x['status'].isin([4,5,6]).all()
return (c & d)
d1 = {}
for k, v in d.items():
#filter df by scenary first for remove not relevant rows
a = df[df['description'].isin(v)]
#call groupby with custom function
b = a.groupby('id').apply(f)
#add missing ids and fill by False
#output to dictionary
d1[k] = b.reindex(uniq_id, fill_value=False)
print (d1)
{'scenario1': id
1 False
2 False
dtype: bool, 'scenario4': id
1 False
2 False
dtype: bool, 'scenario5': id
1 True
2 False
dtype: bool, 'scenario3': id
1 True
2 False
dtype: bool, 'scenario2': id
1 True
2 False
dtype: bool}
#reduce dict to DataFrame and check at least one True per row
m = pd.concat(d1, axis=1).any(axis=1)
print (m)
id
1 True
2 False
#last filtering
df = df[df['id'].isin(m.index[m])]
print (df)
id description status
0 1 world1 1
1 1 world2 4
2 1 world3 1
3 1 world4 4
4 1 world5 4
5 1 world6 4
6 1 world7 1
7 1 world8 4
8 1 world9 4
9 1 world10 4
10 1 world11 4
11 1 world12 4
12 1 world13 4
13 1 world14 4
14 1 world15 1
这篇关于根据描述场景和Python Pandas中的状态过滤产品的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!