How to add the values for specific days in a Python table for a given range?
Problem description
I have a dataset (Product_ID, date_time, Sold) of products sold on various dates. The dates span 9 months, with 13 or more arbitrary days from each month. For each product, I need to segregate the data by the length of each streak of consecutive days on which it was sold: how many units were sold in streaks of 1-3 days, 4-7 days, 8-15 days, and 16 or more days. How can I code this in Python using pandas and other packages?
PRODUCT_ID DATE_LOCATION Sold
0E4234 01-08-16 0:00 2
0E4234 02-08-16 0:00 7
0E4234 07-08-16 0:00 3
0E4234 08-08-16 0:00 1
0E4234 09-08-16 0:00 2
0E4234 10-08-16 0:00 1
.
.
.
0G2342 22-08-16 0:00 1
0G2342 23-08-16 0:00 2
0G2342 26-08-16 0:00 1
0G2342 28-08-16 0:00 1
0G2342 29-08-16 0:00 3
0G2342 30-08-16 0:00 3
.
.
. (continues for 64 products, each with 9 months of data)
.
I don't even know how to code this in Python. The output needed is:
PRODUCT_ID  Days  Sold
0E4234      1-3   9     # dates (1,2): the streak length falls in the 1-3 range
            4-7   7     # dates (7,8,9,10): the streak length falls in the 4-7 range
            8-15  0
            >=16  0
0G2342      1-3   11    # dates (22,23), (26), (28,29,30): each streak length falls in the 1-3 range
            4-7   0
            8-15  0
            >=16  0
.
.(for 64 products)
.
I would be glad if someone could at least post a link showing where to start. I tried:
df["DATE_LOCATION"] = pd.to_datetime(df.DATE_LOCATION)
df["DAY"] = df.DATE_LOCATION.dt.day
def flag(x):
    if 1<=x<=3:
        return '1-3'
    elif 4<=x<=7:
        return '4-7'
    elif 8<=x<=15:
        return '8-15'
    else:
        return '>=16'
df["Days"] = df.DAY.apply(flag)
df.groupby(["PRODUCT_ID","Days"]).Sold.sum()
This gave me the number of products sold between these days of each month. But I need the sum of the products sold for each range, where the range refers to the length of the streak of consecutive days on which the product was sold.
Recommended answer
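Mark the start of each run of consecutive dates with diff and cumsum, use transform to get a Series of run lengths with the same size as the original DataFrame, bin those lengths with cut, and aggregate with sum: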
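import pandas as pd

df['DATE_LOCATION'] = pd.to_datetime(df['DATE_LOCATION'], format='%d-%m-%y %H:%M')
# sort and diff per product so that one product's dates never extend another product's streak
df = df.sort_values(['PRODUCT_ID', 'DATE_LOCATION'])
# a new streak starts wherever the gap to the product's previous date is not exactly 1 day
s = df.groupby('PRODUCT_ID')['DATE_LOCATION'].diff().dt.days.ne(1).cumsum()
count = s.groupby(s).transform('size')
print(count)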
0 2
1 2
2 4
3 4
4 4
5 4
6 2
7 2
8 1
9 3
10 3
11 3
Name: DATE_LOCATION, dtype: int32
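To make the run-length step easier to follow, here is a minimal sketch that walks through it for the six dates of product 0E4234 from the question. The variable names (gap_days, new_streak, streak_id, streak_len) are illustrative only, and the sketch assumes that a streak breaks whenever the gap between two sale dates is anything other than exactly one day, which is why it uses `.ne(1)` so that the first row also opens a streak.

```python
import pandas as pd

# Sale dates of product 0E4234, copied from the question (day-month-year).
dates = pd.Series(pd.to_datetime(
    ["01-08-16", "02-08-16", "07-08-16", "08-08-16", "09-08-16", "10-08-16"],
    format="%d-%m-%y",
))

# Gap in days to the previous sale date; the first row has no predecessor (NaN).
gap_days = dates.diff().dt.days

# True where a run of consecutive days begins (NaN also compares as "not 1").
new_streak = gap_days.ne(1)

# A running count of streak starts gives every row the id of its streak.
streak_id = new_streak.cumsum()

# Length of the streak each row belongs to, aligned with the original rows.
streak_len = streak_id.groupby(streak_id).transform("size")

steps = pd.DataFrame({
    "date": dates,
    "gap_days": gap_days,
    "new_streak": new_streak,
    "streak_id": streak_id,
    "streak_len": streak_len,
})
print(steps)
```

The first two dates form a streak of length 2 and the last four a streak of length 4, which matches the first six values of count printed above.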
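# float('inf') as the last edge keeps streaks longer than 31 days in the '>=16' bin
bins = pd.cut(count, bins=[0, 3, 7, 15, float('inf')], labels=['1-3', '4-7', '8-15', '>=16'])
# observed=True returns only the bins that actually occur in the data
df = df.groupby(['PRODUCT_ID', bins], observed=True)['Sold'].sum().reset_index()
print(df)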
PRODUCT_ID DATE_LOCATION Sold
0 0E4234 1-3 9
1 0E4234 4-7 7
2 0G2342 1-3 11
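The output above lists only the ranges that actually occur, while the question asks for a row with 0 for every empty range. The following is one possible self-contained sketch, using only the twelve sample rows shown in the question. It assumes streaks are computed per product and that the last range means 16 or more days; the Days column name and the helper variables (gap, streak_id, streak_len, full_index, totals) are choices made for this example, not part of the original answer.

```python
import pandas as pd

# Only the rows shown in the question; the real data has 64 products.
df = pd.DataFrame({
    "PRODUCT_ID": ["0E4234"] * 6 + ["0G2342"] * 6,
    "DATE_LOCATION": [
        "01-08-16 0:00", "02-08-16 0:00", "07-08-16 0:00",
        "08-08-16 0:00", "09-08-16 0:00", "10-08-16 0:00",
        "22-08-16 0:00", "23-08-16 0:00", "26-08-16 0:00",
        "28-08-16 0:00", "29-08-16 0:00", "30-08-16 0:00",
    ],
    "Sold": [2, 7, 3, 1, 2, 1, 1, 2, 1, 1, 3, 3],
})
df["DATE_LOCATION"] = pd.to_datetime(df["DATE_LOCATION"], format="%d-%m-%y %H:%M")
df = df.sort_values(["PRODUCT_ID", "DATE_LOCATION"]).reset_index(drop=True)

# Gap to the previous sale date of the same product (NaN on each product's first row).
gap = df.groupby("PRODUCT_ID")["DATE_LOCATION"].diff().dt.days
# A streak starts wherever the gap is not exactly one day.
streak_id = gap.ne(1).cumsum()
streak_len = streak_id.groupby(streak_id).transform("size")

# Bin the streak lengths; the open-ended last edge covers any long streak.
labels = ["1-3", "4-7", "8-15", ">=16"]
df["Days"] = pd.cut(
    streak_len, bins=[0, 3, 7, 15, float("inf")], labels=labels
).astype(str)

# Sum per product and range, then add the missing combinations as 0.
sums = df.groupby(["PRODUCT_ID", "Days"])["Sold"].sum()
full_index = pd.MultiIndex.from_product(
    [df["PRODUCT_ID"].unique(), labels], names=["PRODUCT_ID", "Days"]
)
totals = sums.reindex(full_index, fill_value=0).reset_index()
print(totals)
```

Reindexing against the full product-by-range index is just one way to get the zero rows; it avoids depending on how groupby treats unobserved categories, which has differed between pandas versions.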