聚合先前(及时)行的有效方法 [英] Efficient way of aggregating previous(in time) rows
问题描述
我有以下不同客户在不同时间下订单的数据框:
I have the following dataframe of orders placed by different customers, at different times:
rng = list(pd.date_range('2019-02-24', periods=5, freq='T')) + list(pd.date_range('2019-03-
13', periods=2, freq='T')) + list(pd.date_range('2019-02-27', periods=1, freq='T'))
customers = ["12987"]*5 + ["89563"]*2 + ["56733"]
articles = ["8473", "7631", "1264", "8473", "5641", "9813", "7631", "1132"]
order_history = pd.DataFrame({'Customer_no': customers, 'Date': rng, 'Article_no': articles})
order_history
输出:
Customer_no Date Article_no
0 12987 2019-02-24 00:00:00 8473
1 12987 2019-02-24 00:01:00 7631
2 12987 2019-02-24 00:02:00 1264
3 12987 2019-02-24 00:03:00 8473
4 12987 2019-02-24 00:04:00 5641
5 89563 2019-03-13 00:00:00 9813
6 89563 2019-03-13 00:01:00 7631
7 56733 2019-02-27 00:00:00 1132
我想为每个客户和行购买之前购买的文章.
预期输出:
I would like to, for each customer and row, get the previous articles bought.
Expected output:
Customer_no Date Article_no Previous_articles
0 12987 2019-02-24 00:00:00 8473 []
1 12987 2019-02-24 00:01:00 7631 [8473]
2 12987 2019-02-24 00:02:00 1264 [8473, 7631]
3 12987 2019-02-24 00:03:00 8473 [8473, 7631, 1264]
4 12987 2019-02-24 00:04:00 5641 [8473, 7631, 1264, 8473]
5 89563 2019-03-13 00:00:00 9813 []
6 89563 2019-03-13 00:01:00 7631 [9813]
7 56733 2019-02-27 00:00:00 1132 []
我意识到我可以使用像 order_history.apply(lambda x: my_custom_function(x), axis=1)
这样的自定义函数迭代每一行其中 my_custom_function(x)
将针对每一行过滤整个 order_history
以找到匹配的 Customer_no
和适当的日期.我也意识到这种解决方案效率非常低,因此希望有人有其他想法!
I realize that I could iterate over each row with a custom function like order_history.apply(lambda x: my_custom_function(x), axis=1)
where my_custom_function(x)
would, for each row, filter through the entire order_history
to find the matching Customer_no
and appropriate dates. I also realize this solution would be highly inefficient, thus hoping that someone has any other ideas!
推荐答案
输入代码:
import pandas as pd
import numpy as np
rng = list(pd.date_range('2019-02-24', periods=5, freq='T')) + list(pd.date_range('2019-03-13', periods=2, freq='T')) + list(pd.date_range('2019-02-27', periods=1, freq='T'))
customers = ["12987"]*5 + ["89563"]*2 + ["56733"]
articles = ["8473", "7631", "1264", "8473", "5641", "9813", "7631", "1132"]
order_history = pd.DataFrame({'Customer_no': customers, 'Date': rng, 'Article_no': articles})
order_history
获取Previous_articles
的代码:
Code to get Previous_articles
:
a = order_history['Article_no'].apply(lambda x: [] if pd.isnull(x) else [int(x)])
order_history['Previous_articles'] = a.groupby(order_history['Customer_no']).apply(lambda x: x.cumsum())
order_history["Previous_articles"] = order_history["Previous_articles"].apply(lambda x:x[:-1])
order_history
输出:
Customer_no Date Article_no Previous_articles
0 12987 2019-02-24 00:00:00 8473 []
1 12987 2019-02-24 00:01:00 7631 [8473]
2 12987 2019-02-24 00:02:00 1264 [8473, 7631]
3 12987 2019-02-24 00:03:00 8473 [8473, 7631, 1264]
4 12987 2019-02-24 00:04:00 5641 [8473, 7631, 1264, 8473]
5 89563 2019-03-13 00:00:00 9813 []
6 89563 2019-03-13 00:01:00 7631 [9813]
7 56733 2019-02-27 00:00:00 1132 []
这篇关于聚合先前(及时)行的有效方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!