Aggregate previous rows into lists, depending on a separate column
Problem description
I asked a similar but simpler question previously, but later realized that it does not solve my problem. The edits that question would need are too substantial for a simple edit, especially since it already has two valid answers, so I will leave it up and ask a new one instead:
I have the following DataFrame of interactions (products viewed and/or bought) by different customers, at different times:
import pandas as pd
rng = list(pd.date_range('2019-02-24', periods=5, freq='min')) + list(pd.date_range('2019-03-13', periods=2, freq='min')) + list(pd.date_range('2019-02-27', periods=1, freq='min'))
customers = ["c12987"]*5 + ["c89563"]*2 + ["c56733"]
articles = ["a8473", "a7631", "a1264", "a8473", "a5641", "a9813", "a7631", "a1132"]
action_type = ["viewed", "purchased", "viewed", "purchased", "viewed", "viewed", "purchased", "viewed"]
interaction_history = pd.DataFrame({'Customer_no': customers, 'Date': rng, 'Article_no': articles, "Interaction": action_type})
interaction_history
Output:
  Customer_no                Date Article_no Interaction
0      c12987 2019-02-24 00:00:00      a8473      viewed
1      c12987 2019-02-24 00:01:00      a7631   purchased
2      c12987 2019-02-24 00:02:00      a1264      viewed
3      c12987 2019-02-24 00:03:00      a8473   purchased
4      c12987 2019-02-24 00:04:00      a5641      viewed
5      c89563 2019-03-13 00:00:00      a9813      viewed
6      c89563 2019-03-13 00:01:00      a7631   purchased
7      c56733 2019-02-27 00:00:00      a1132      viewed
For each customer and row, I would like to get the articles previously viewed as well as the articles previously bought.
Expected output:
  Customer_no                Date Article_no Interaction    Prev_viewed Prev_purchased
0      c12987 2019-02-24 00:00:00      a8473      viewed             []             []
1      c12987 2019-02-24 00:01:00      a7631   purchased        [a8473]             []
2      c12987 2019-02-24 00:02:00      a1264      viewed        [a8473]        [a7631]
3      c12987 2019-02-24 00:03:00      a8473   purchased [a8473, a1264]        [a7631]
4      c12987 2019-02-24 00:04:00      a5641      viewed [a8473, a1264] [a7631, a8473]
5      c89563 2019-03-13 00:00:00      a9813      viewed             []             []
6      c89563 2019-03-13 00:01:00      a7631   purchased        [a9813]             []
7      c56733 2019-02-27 00:00:00      a1132      viewed             []             []
I realize that I could iterate over each row with a custom function like interaction_history.apply(lambda x: my_custom_function(x), axis=1), where my_custom_function(x) would, for each row, filter the entire interaction_history to find the matching Customer_no, Interaction and earlier dates. I also realize that this solution would be highly inefficient and rather complex, so I am hoping that someone has a better idea!
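For reference, the row-wise approach described above could look roughly like this. It is a hypothetical sketch: the question never shows my_custom_function, so its extra df and interaction parameters are assumptions, and "previous" is interpreted as "strictly earlier Date for the same customer". Every row rescans the whole frame once per interaction type, which is the quadratic cost the question wants to avoid, but it can still serve as a slow correctness check for a faster solution.

```python
import pandas as pd

# Sample data from the question ('min' is the minute frequency alias).
rng = (list(pd.date_range('2019-02-24', periods=5, freq='min'))
       + list(pd.date_range('2019-03-13', periods=2, freq='min'))
       + list(pd.date_range('2019-02-27', periods=1, freq='min')))
interaction_history = pd.DataFrame({
    'Customer_no': ["c12987"] * 5 + ["c89563"] * 2 + ["c56733"],
    'Date': rng,
    'Article_no': ["a8473", "a7631", "a1264", "a8473", "a5641",
                   "a9813", "a7631", "a1132"],
    'Interaction': ["viewed", "purchased", "viewed", "purchased", "viewed",
                    "viewed", "purchased", "viewed"],
})


def my_custom_function(row, df, interaction):
    """Hypothetical helper: articles of the given interaction type that the
    same customer had strictly before this row's Date, oldest first."""
    earlier = df[(df['Customer_no'] == row['Customer_no'])
                 & (df['Interaction'] == interaction)
                 & (df['Date'] < row['Date'])]
    return earlier.sort_values('Date')['Article_no'].tolist()


# Every row triggers a full scan of the frame, once per interaction type.
for kind in ('viewed', 'purchased'):
    interaction_history['Prev_' + kind] = interaction_history.apply(
        lambda row: my_custom_function(row, interaction_history, kind), axis=1)

print(interaction_history[['Customer_no', 'Article_no', 'Prev_viewed', 'Prev_purchased']])
```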
Recommended answer
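You can write a function that creates the required new column for a given interaction type (viewed or purchased). The key here is to put the Article_no values into list format, so that you can use cumsum to cumulatively concatenate the articles into lists, separately for viewed and purchased rows. A shift followed by a forward fill within each customer then makes every row see only the articles from earlier rows: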
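The two pandas behaviours this approach relies on can be tried in isolation on customer c12987 alone, so no groupby is needed here. This is an illustrative sketch, not part of the original answer: cumsum on an object column of lists concatenates them, and shifting the running lists down one row before forward-filling gives each row only what came strictly before it. The final replacement of the leftover NaN with [] uses a plain isinstance check instead of the answer's mask call.

```python
import pandas as pd

# Customer c12987's articles, each wrapped in a one-element list
# (.str.split() on a single-word string returns a one-element list).
articles = pd.Series(['a8473', 'a7631', 'a1264', 'a8473', 'a5641']).str.split()

# 1) On object dtype, cumsum applies `+` cumulatively; `+` on lists concatenates.
running = articles.cumsum()

# 2) Keep running 'viewed' lists only on the 'viewed' rows (0, 2, 4),
#    shift them one row down so each row sees only earlier rows, then
#    forward-fill over the gaps left by the 'purchased' rows (1, 3).
is_viewed = pd.Series([True, False, True, False, True])
viewed_running = articles[is_viewed].cumsum()      # index 0, 2, 4 only
aligned = viewed_running.reindex(articles.index)   # NaN on rows 1 and 3
prev_viewed = aligned.shift().ffill()

# 3) The first row has nothing before it and is still NaN; swap in [].
prev_viewed = prev_viewed.apply(lambda x: x if isinstance(x, list) else [])
print(prev_viewed.tolist())
```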
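def previous(df, string):
    # Wrap each article in a one-element list so that cumsum concatenates lists.
    df['Article_no'] = df['Article_no'].str.split()
    col = 'Prev_' + string
    # Running list per customer, computed only on rows of the requested type;
    # group_keys=False keeps the original index so the result aligns with df.
    df[col] = (df[df['Interaction'].eq(string)]
               .groupby('Customer_no', group_keys=False)['Article_no']
               .apply(lambda x: x.cumsum()))
    # Shift down one row so each row only sees strictly earlier rows ...
    df[col] = df.groupby('Customer_no')[col].shift()
    # ... then carry the latest list forward over the rows of the other type.
    df[col] = df.groupby('Customer_no')[col].ffill()
    # Rows with nothing before them are still NaN; replace them with empty lists.
    df[col] = df[col].mask(df[col].isnull(), df[col].apply(lambda x: []))
    # Turn the one-element lists back into plain strings.
    df['Article_no'] = df['Article_no'].str.join('')

previous(interaction_history, 'viewed')
previous(interaction_history, 'purchased')
interaction_history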
Out[1]:
  Customer_no                Date Article_no Interaction    Prev_viewed Prev_purchased
0      c12987 2019-02-24 00:00:00      a8473      viewed             []             []
1      c12987 2019-02-24 00:01:00      a7631   purchased        [a8473]             []
2      c12987 2019-02-24 00:02:00      a1264      viewed        [a8473]        [a7631]
3      c12987 2019-02-24 00:03:00      a8473   purchased [a8473, a1264]        [a7631]
4      c12987 2019-02-24 00:04:00      a5641      viewed [a8473, a1264] [a7631, a8473]
5      c89563 2019-03-13 00:00:00      a9813      viewed             []             []
6      c89563 2019-03-13 00:01:00      a7631   purchased        [a9813]             []
7      c56733 2019-02-27 00:00:00      a1132      viewed             []             []
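If the string split/join round trip and the in-place mutation of interaction_history are unwanted, a single plain-Python pass over the rows is another option. This is a sketch, not part of the original answer: add_previous_lists is a hypothetical name, and like the answer it assumes rows are already in chronological order within each customer (sort by Date first if they are not). The Date column is left out of the sample frame because the sketch only relies on row order.

```python
import pandas as pd


def add_previous_lists(df, kinds=('viewed', 'purchased')):
    """Hypothetical helper: return a copy of df with a Prev_<kind> column per kind.

    Assumes rows are already in chronological order within each customer.
    """
    history = {}                               # customer -> {kind: articles so far}
    columns = {kind: [] for kind in kinds}     # kind -> one list per row
    for customer, article, kind in zip(df['Customer_no'], df['Article_no'],
                                       df['Interaction']):
        so_far = history.setdefault(customer, {k: [] for k in kinds})
        for k in kinds:
            columns[k].append(list(so_far[k]))  # snapshot taken before this row
        if kind in so_far:
            so_far[kind].append(article)
    out = df.copy()
    for k in kinds:
        out['Prev_' + k] = columns[k]
    return out


interaction_history = pd.DataFrame({
    'Customer_no': ["c12987"] * 5 + ["c89563"] * 2 + ["c56733"],
    'Article_no': ["a8473", "a7631", "a1264", "a8473", "a5641",
                   "a9813", "a7631", "a1132"],
    'Interaction': ["viewed", "purchased", "viewed", "purchased", "viewed",
                    "viewed", "purchased", "viewed"],
})
result = add_previous_lists(interaction_history)
print(result[['Customer_no', 'Prev_viewed', 'Prev_purchased']])
```

This runs in one linear pass and leaves the input frame untouched, at the cost of a Python-level loop instead of vectorized groupby operations.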