聚合先前(及时)行的有效方法 [英] Efficient way of aggregating previous(in time) rows

查看:34
本文介绍了聚合先前(及时)行的有效方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下不同客户在不同时间下订单的数据框:

I have the following dataframe of orders placed by different customers, at different times:

rng = list(pd.date_range('2019-02-24', periods=5, freq='T')) + list(pd.date_range('2019-03-
13', periods=2, freq='T')) + list(pd.date_range('2019-02-27', periods=1, freq='T'))
customers = ["12987"]*5 + ["89563"]*2 + ["56733"]
articles = ["8473", "7631", "1264", "8473", "5641", "9813", "7631", "1132"]

order_history = pd.DataFrame({'Customer_no': customers, 'Date': rng, 'Article_no': articles}) 
order_history

输出:

    Customer_no Date                    Article_no
0   12987       2019-02-24 00:00:00     8473
1   12987       2019-02-24 00:01:00     7631
2   12987       2019-02-24 00:02:00     1264
3   12987       2019-02-24 00:03:00     8473
4   12987       2019-02-24 00:04:00     5641
5   89563       2019-03-13 00:00:00     9813
6   89563       2019-03-13 00:01:00     7631
7   56733       2019-02-27 00:00:00     1132

我想为每个客户和行购买之前购买的文章.
预期输出:

I would like to, for each customer and row, get the previous articles bought.
Expected output:

    Customer_no Date                    Article_no  Previous_articles
0   12987       2019-02-24 00:00:00     8473        []
1   12987       2019-02-24 00:01:00     7631        [8473]
2   12987       2019-02-24 00:02:00     1264        [8473, 7631]
3   12987       2019-02-24 00:03:00     8473        [8473, 7631, 1264]
4   12987       2019-02-24 00:04:00     5641        [8473, 7631, 1264, 8473]
5   89563       2019-03-13 00:00:00     9813        []
6   89563       2019-03-13 00:01:00     7631        [9813]
7   56733       2019-02-27 00:00:00     1132        []

我意识到我可以使用像 order_history.apply(lambda x: my_custom_function(x), axis=1) 这样的自定义函数迭代每一行其中 my_custom_function(x) 将针对每一行过滤整个 order_history 以找到匹配的 Customer_no 和适当的日期.我也意识到这种解决方案效率非常低,因此希望有人有其他想法!

I realize that I could iterate over each row with a custom function like order_history.apply(lambda x: my_custom_function(x), axis=1) where my_custom_function(x) would, for each row, filter through the entire order_history to find the matching Customer_no and appropriate dates. I also realize this solution would be highly inefficient, thus hoping that someone has any other ideas!

推荐答案

输入代码:

import pandas as pd
import numpy as np

rng = list(pd.date_range('2019-02-24', periods=5, freq='T')) + list(pd.date_range('2019-03-13', periods=2, freq='T')) + list(pd.date_range('2019-02-27', periods=1, freq='T'))
customers = ["12987"]*5 + ["89563"]*2 + ["56733"]
articles = ["8473", "7631", "1264", "8473", "5641", "9813", "7631", "1132"]

order_history = pd.DataFrame({'Customer_no': customers, 'Date': rng, 'Article_no': articles}) 
order_history

获取Previous_articles的代码:

Code to get Previous_articles:

a = order_history['Article_no'].apply(lambda x: [] if pd.isnull(x) else [int(x)])
order_history['Previous_articles'] = a.groupby(order_history['Customer_no']).apply(lambda x: x.cumsum())

order_history["Previous_articles"] = order_history["Previous_articles"].apply(lambda x:x[:-1])
order_history

输出:

  Customer_no                Date Article_no         Previous_articles
0       12987 2019-02-24 00:00:00       8473                        []
1       12987 2019-02-24 00:01:00       7631                    [8473]
2       12987 2019-02-24 00:02:00       1264              [8473, 7631]
3       12987 2019-02-24 00:03:00       8473        [8473, 7631, 1264]
4       12987 2019-02-24 00:04:00       5641  [8473, 7631, 1264, 8473]
5       89563 2019-03-13 00:00:00       9813                        []
6       89563 2019-03-13 00:01:00       7631                    [9813]
7       56733 2019-02-27 00:00:00       1132                        []

这篇关于聚合先前(及时)行的有效方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆