Aggregate previous rows into lists, depending on a separate column

Problem description

I previously asked a similar but simpler question, and later realized that it does not solve my problem. The changes needed are too extensive for a simple edit, especially since that question already has two valid answers, so I am leaving it as is and asking a new one:

I have the following DataFrame of interactions (products viewed and/or purchased) by different customers at different times:

import pandas as pd
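# 'min' is the minute frequency alias; the older alias 'T' is deprecated in recent pandas releases
rng = list(pd.date_range('2019-02-24', periods=5, freq='min')) + list(pd.date_range('2019-03-13', periods=2, freq='min')) + list(pd.date_range('2019-02-27', periods=1, freq='min'))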
customers = ["c12987"]*5 + ["c89563"]*2 + ["c56733"]
articles = ["a8473", "a7631", "a1264", "a8473", "a5641", "a9813", "a7631", "a1132"]
action_type = ["viewed", "purchased", "viewed", "purchased", "viewed", "viewed", "purchased", "viewed"]

interaction_history = pd.DataFrame({'Customer_no': customers, 'Date': rng, 'Article_no': articles, "Interaction": action_type}) 
interaction_history

Output:

    Customer_no Date                Article_no  Interaction
0   c12987      2019-02-24 00:00:00 a8473       viewed
1   c12987      2019-02-24 00:01:00 a7631       purchased
2   c12987      2019-02-24 00:02:00 a1264       viewed
3   c12987      2019-02-24 00:03:00 a8473       purchased
4   c12987      2019-02-24 00:04:00 a5641       viewed
5   c89563      2019-03-13 00:00:00 a9813       viewed
6   c89563      2019-03-13 00:01:00 a7631       purchased
7   c56733      2019-02-27 00:00:00 a1132       viewed

For each row, I would like to get the articles that the same customer previously viewed, as well as the articles they previously purchased.

Expected output:

    Customer_no Date                Article_no  Interaction Prev_viewed     Prev_purchased
0   c12987      2019-02-24 00:00:00 a8473       viewed      []              []
1   c12987      2019-02-24 00:01:00 a7631       purchased   [a8473]         []
2   c12987      2019-02-24 00:02:00 a1264       viewed      [a8473]         [a7631]
3   c12987      2019-02-24 00:03:00 a8473       purchased   [a8473, a1264]  [a7631]
4   c12987      2019-02-24 00:04:00 a5641       viewed      [a8473, a1264]  [a7631, a8473]
5   c89563      2019-03-13 00:00:00 a9813       viewed      []              []
6   c89563      2019-03-13 00:01:00 a7631       purchased   [a9813]         []
7   c56733      2019-02-27 00:00:00 a1132       viewed      []              []

I realize that I could iterate over each row with a custom function, e.g. interaction_history.apply(lambda x: my_custom_function(x), axis=1), where my_custom_function(x) would filter the entire interaction_history for each row to find the rows with a matching Customer_no and Interaction and an earlier date. I also realize this solution would be highly inefficient and overly complex, so I am hoping someone has another idea!
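For reference, the row-by-row approach described above might look roughly like the sketch below. The helper name previous_articles and the small demo frame are hypothetical stand-ins (not from the question), and the sketch assumes that "previous" means a strictly earlier Date for the same customer. Every row scans the whole frame again, which is where the quadratic cost comes from.

```python
import pandas as pd

def previous_articles(row, history, interaction):
    """Hypothetical helper: articles of one interaction type that the
    same customer touched strictly before this row's Date."""
    mask = (
        (history["Customer_no"] == row["Customer_no"])
        & (history["Interaction"] == interaction)
        & (history["Date"] < row["Date"])
    )
    return history.loc[mask, "Article_no"].tolist()

# Small made-up frame, used only to exercise the helper.
demo = pd.DataFrame({
    "Customer_no": ["c1", "c1", "c1", "c2"],
    "Date": pd.to_datetime([
        "2019-02-24 00:00", "2019-02-24 00:01",
        "2019-02-24 00:02", "2019-03-13 00:00",
    ]),
    "Article_no": ["a1", "a2", "a3", "a4"],
    "Interaction": ["viewed", "purchased", "viewed", "viewed"],
})

# One full scan of the frame per row and per interaction type.
demo["Prev_viewed"] = demo.apply(
    lambda row: previous_articles(row, demo, "viewed"), axis=1)
demo["Prev_purchased"] = demo.apply(
    lambda row: previous_articles(row, demo, "purchased"), axis=1)
```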

Recommended answer

You can write a function that builds the new column for a given interaction type (viewed or purchased). The key is to turn each Article_no value into a one-element list, so that cumsum can cumulatively concatenate the articles per customer for that interaction type:

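def previous(df, string):
    # Temporarily turn each article into a one-element list so cumsum concatenates
    df['Article_no'] = df['Article_no'].str.split()
    col = 'Prev_' + string
    # Running list per customer, computed only over rows of this interaction type.
    # group_keys=False keeps the original index so the result aligns on assignment.
    df[col] = (df[df['Interaction'].eq(string)]
               .groupby('Customer_no', group_keys=False)
               ['Article_no'].apply(lambda x: x.cumsum()))
    # Shift so a row does not see itself, then carry the last list forward
    df[col] = df.groupby('Customer_no')[col].shift()
    df[col] = df.groupby('Customer_no')[col].ffill()
    # Rows with no earlier interaction of this type get an empty list
    df[col] = df[col].mask(df[col].isnull(), df[col].apply(lambda x: []))
    # Restore Article_no to plain strings
    df['Article_no'] = df['Article_no'].str.join('')


previous(interaction_history, 'viewed')
previous(interaction_history, 'purchased')
interaction_history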
Out[1]: 
    Customer_no Date                Article_no  Interaction Prev_viewed     Prev_purchased
0   c12987      2019-02-24 00:00:00 a8473       viewed      []              []
1   c12987      2019-02-24 00:01:00 a7631       purchased   [a8473]         []
2   c12987      2019-02-24 00:02:00 a1264       viewed      [a8473]         [a7631]
3   c12987      2019-02-24 00:03:00 a8473       purchased   [a8473, a1264]  [a7631]
4   c12987      2019-02-24 00:04:00 a5641       viewed      [a8473, a1264]  [a7631, a8473]
5   c89563      2019-03-13 00:00:00 a9813       viewed      []              []
6   c89563      2019-03-13 00:01:00 a7631       purchased   [a9813]         []
7   c56733      2019-02-27 00:00:00 a1132       viewed      []              []
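
As an alternative to the list-cumsum approach, the same columns can be built with one plain Python pass over the rows. This is only a sketch and not the answer author's method: the function name add_previous_lists and the variable names history and result are made up for illustration. It assumes each customer's timestamps are unique, and it returns a new frame instead of modifying the input.

```python
from collections import defaultdict

import pandas as pd

def add_previous_lists(df, kinds=("viewed", "purchased")):
    """Return a copy of df with one Prev_<kind> column per interaction type."""
    out = df.sort_values(["Customer_no", "Date"])  # new frame, input untouched
    seen = defaultdict(list)  # (customer, kind) -> articles seen so far
    columns = {kind: [] for kind in kinds}
    for customer, article, interaction in zip(
            out["Customer_no"], out["Article_no"], out["Interaction"]):
        for kind in kinds:
            # Copy the running list before the current row is added to it
            columns[kind].append(list(seen[(customer, kind)]))
        seen[(customer, interaction)].append(article)
    for kind in kinds:
        out["Prev_" + kind] = columns[kind]
    return out.sort_index()  # back to the original row order

# Same data as in the question, rebuilt so the sketch is self-contained.
rng = (list(pd.date_range("2019-02-24", periods=5, freq="min"))
       + list(pd.date_range("2019-03-13", periods=2, freq="min"))
       + list(pd.date_range("2019-02-27", periods=1, freq="min")))
history = pd.DataFrame({
    "Customer_no": ["c12987"] * 5 + ["c89563"] * 2 + ["c56733"],
    "Date": rng,
    "Article_no": ["a8473", "a7631", "a1264", "a8473",
                   "a5641", "a9813", "a7631", "a1132"],
    "Interaction": ["viewed", "purchased", "viewed", "purchased",
                    "viewed", "viewed", "purchased", "viewed"],
})
result = add_previous_lists(history)
```

Each row is visited once, so the cost of the loop grows linearly with the number of rows, plus the cost of copying the running lists.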
