Adding specific days in a Python table
Question
I have a dataset (Product_ID, date_time, Sold) of products sold on various dates. The dates are not consistent: the data covers 9 months, with 13 or more arbitrary days given per month. I have to segregate the data so that, for each product, I get how many units were sold on given days 1-3, given days 4-7, given days 8-15 and given days >16. How can I code this in Python using pandas and other packages?
PRODUCT_ID DATE_LOCATION Sold
0E4234 01-08-16 0:00 2
0E4234 02-08-16 0:00 7
0E4234 04-08-16 0:00 3
0E4234 08-08-16 0:00 1
0E4234 09-08-16 0:00 2
.
. (same product for 9 months sold data)
.
0G2342 02-08-16 0:00 1
0G2342 03-08-16 0:00 2
0G2342 06-08-16 0:00 1
0G2342 09-08-16 0:00 1
0G2342 11-08-16 0:00 3
0G2342 15-08-16 0:00 3
.
.
.(goes for 64 products each with 9 months of data)
.
I don't even know how to start coding this in Python. The output I need is:
PRODUCT_ID  Days   Sold
0E4234      1-3    9
            4-7    3
            8-15   16
            >16    (remaining values sum)
0G2342      1-3    3
            4-7    1
            8-15   7
            >16    (remaining values sum)
.
.(for 64 products)
.
I would be happy if someone at least posted a link showing where to start.
Recommended answer
You can first convert the dates to datetimes and get the day of the month with dt.day:
df['DATE_LOCATION'] = pd.to_datetime(df['DATE_LOCATION'], dayfirst=True)
days = df['DATE_LOCATION'].dt.day
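For a reproducible starting point, the sample rows from the question can be typed into a DataFrame by hand. This is only a sketch: it uses just the eleven rows shown (the elided rows are not reproduced), the name `rows` is an illustration, and the explicit `format` string is one possible alternative to `dayfirst=True`, assuming every value looks like `01-08-16 0:00`.

```python
import pandas as pd

# The eleven sample rows shown in the question; the elided rows are not reproduced.
rows = [
    ("0E4234", "01-08-16 0:00", 2),
    ("0E4234", "02-08-16 0:00", 7),
    ("0E4234", "04-08-16 0:00", 3),
    ("0E4234", "08-08-16 0:00", 1),
    ("0E4234", "09-08-16 0:00", 2),
    ("0G2342", "02-08-16 0:00", 1),
    ("0G2342", "03-08-16 0:00", 2),
    ("0G2342", "06-08-16 0:00", 1),
    ("0G2342", "09-08-16 0:00", 1),
    ("0G2342", "11-08-16 0:00", 3),
    ("0G2342", "15-08-16 0:00", 3),
]
df = pd.DataFrame(rows, columns=["PRODUCT_ID", "DATE_LOCATION", "Sold"])

# Assumption: all strings follow day-month-year hour:minute, so an explicit
# format removes any day/month ambiguity in values such as "01-08-16".
df["DATE_LOCATION"] = pd.to_datetime(df["DATE_LOCATION"], format="%d-%m-%y %H:%M")

# Day of the month for each row.
days = df["DATE_LOCATION"].dt.day
print(days.tolist())
```

If the real file contains other date layouts, the `format` string would need to change accordingly.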
Then bin the days with cut:
rng = pd.cut(days, bins=[0,3,7,15,31], labels=['1-3', '4-7','8-15', '>=16'])
print (rng)
0 1-3
1 1-3
2 4-7
3 8-15
4 8-15
5 1-3
6 1-3
7 4-7
8 8-15
9 8-15
10 8-15
Name: DATE_LOCATION, dtype: category
Categories (4, object): [1-3 < 4-7 < 8-15 < >=16]
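It may help to see how the bin edges map to the labels. By default `pd.cut` builds right-closed intervals, so `bins=[0,3,7,15,31]` stands for (0, 3], (3, 7], (7, 15] and (15, 31]; day 16 itself therefore lands in the last bin, which is why its label reads `>=16`. The small check below uses a made-up Series of boundary days (`boundary_days` is an illustrative name, not part of the original data).

```python
import pandas as pd

# Made-up boundary values: the first and last day of each bin.
boundary_days = pd.Series([1, 3, 4, 7, 8, 15, 16, 31])
labels = ["1-3", "4-7", "8-15", ">=16"]

# Right-closed bins: (0, 3], (3, 7], (7, 15], (15, 31].
binned = pd.cut(boundary_days, bins=[0, 3, 7, 15, 31], labels=labels)
print(binned.tolist())
```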
And aggregate the sum by product and by the binned Series:
# store the result under a new name so that df stays a DataFrame
df1 = df.groupby(["PRODUCT_ID",rng])['Sold'].sum()
print (df1)
PRODUCT_ID DATE_LOCATION
0E4234 1-3 9
4-7 3
8-15 3
0G2342 1-3 3
4-7 1
8-15 7
Name: Sold, dtype: int64
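The output above has no `>=16` rows because none of the sample dates fall after the 15th. If every product should list all four bins, as in the layout the question asks for, one option is to pass `observed=False` to `groupby` so that empty categories are kept with a sum of 0. The sketch below rebuilds the sample rows with ISO dates for brevity; renaming the binned Series to `Days` and the variable `out` are assumptions made to mirror the requested column names.

```python
import pandas as pd

# Sample rows from the question, with the dates written in ISO form for brevity.
df = pd.DataFrame({
    "PRODUCT_ID": ["0E4234"] * 5 + ["0G2342"] * 6,
    "DATE_LOCATION": pd.to_datetime(
        ["2016-08-01", "2016-08-02", "2016-08-04", "2016-08-08", "2016-08-09",
         "2016-08-02", "2016-08-03", "2016-08-06", "2016-08-09", "2016-08-11",
         "2016-08-15"]),
    "Sold": [2, 7, 3, 1, 2, 1, 2, 1, 1, 3, 3],
})

# Bin the day of the month; the Series name becomes the column name later.
rng = pd.cut(df["DATE_LOCATION"].dt.day, bins=[0, 3, 7, 15, 31],
             labels=["1-3", "4-7", "8-15", ">=16"]).rename("Days")

# observed=False keeps every bin for every product, so empty bins sum to 0.
out = (df.groupby(["PRODUCT_ID", rng], observed=False)["Sold"]
         .sum()
         .reset_index())
print(out)
```

With `observed=True` the empty bins would be dropped instead, which matches the shorter output shown in the answer.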
If you also need the sums per year:
df2 = df.groupby([df['DATE_LOCATION'].dt.year.rename('YEAR'), "PRODUCT_ID",rng])['Sold'].sum()
print (df2)
YEAR PRODUCT_ID DATE_LOCATION
2016 0E4234 1-3 9
4-7 3
8-15 3
0G2342 1-3 3
4-7 1
8-15 7
Name: Sold, dtype: int64
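Since the sample only covers 2016, the extra YEAR level is easier to see on data that spans two years. The rows below are made up purely for illustration (product IDs `A` and `B`, the dates and the quantities are not from the question), and `observed=True` is an assumed choice that drops combinations which never occur.

```python
import pandas as pd

# Made-up rows spanning two years, purely to show the extra YEAR level.
df = pd.DataFrame({
    "PRODUCT_ID": ["A", "A", "A", "B"],
    "DATE_LOCATION": pd.to_datetime(
        ["2016-08-01", "2016-08-20", "2017-01-02", "2017-01-05"]),
    "Sold": [2, 5, 4, 1],
})

rng = pd.cut(df["DATE_LOCATION"].dt.day, bins=[0, 3, 7, 15, 31],
             labels=["1-3", "4-7", "8-15", ">=16"])
year = df["DATE_LOCATION"].dt.year.rename("YEAR")

# observed=True drops (year, product, bin) combinations that never occur.
per_year = df.groupby([year, "PRODUCT_ID", rng], observed=True)["Sold"].sum()
print(per_year)
```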