Python Pandas Sum Values in Columns If date between 2 dates
Question
I have a dataframe df, which can be created with this:
import datetime
import pandas as pd

data = {'id': [1, 1, 1, 1, 2, 2, 2, 2],
        'date1': [datetime.date(2016, 1, 1), datetime.date(2016, 1, 2), datetime.date(2016, 1, 3), datetime.date(2016, 1, 4),
                  datetime.date(2016, 1, 2), datetime.date(2016, 1, 4), datetime.date(2016, 1, 3), datetime.date(2016, 1, 1)],
        'date2': [datetime.date(2016, 1, 5), datetime.date(2016, 1, 3), datetime.date(2016, 1, 5), datetime.date(2016, 1, 5),
                  datetime.date(2016, 1, 4), datetime.date(2016, 1, 5), datetime.date(2016, 1, 4), datetime.date(2016, 1, 1)],
        'score1': [5, 7, 3, 2, 9, 3, 8, 3],
        'score2': [1, 3, 0, 5, 2, 20, 7, 7]}
df = pd.DataFrame.from_dict(data)
And looks like this:
id date1 date2 score1 score2
0 1 2016-01-01 2016-01-05 5 1
1 1 2016-01-02 2016-01-03 7 3
2 1 2016-01-03 2016-01-05 3 0
3 1 2016-01-04 2016-01-05 2 5
4 2 2016-01-02 2016-01-04 9 2
5 2 2016-01-04 2016-01-05 3 20
6 2 2016-01-03 2016-01-04 8 7
7 2 2016-01-01 2016-01-01 3 7
What I need to do is create two new columns, one for each of score1 and score2, which sum the values of score1 and score2 respectively over the rows where usedate is between date1 and date2. usedate is created by getting all dates between and including the date1 minimum and the date2 maximum. I used this to create the date range:
drange=pd.date_range(df.date1.min(),df.date2.max())
The resulting dataframe newdf should look like:
usedate score1sum score2sum
0 2016-01-01 8 8
1 2016-01-02 21 6
2 2016-01-03 32 13
3 2016-01-04 30 35
4 2016-01-05 13 26
For clarification, on usedate 2016-01-01, score1sum is 8, which is calculated by looking at the rows in df where 2016-01-01 is between and including date1 and date2, which sums row 0 (5) and row 7 (3). On usedate 2016-01-04, score2sum is 35, which is calculated by looking at the rows in df where 2016-01-04 is between and including date1 and date2, which sums row 0 (1), row 2 (0), row 3 (5), row 4 (2), row 5 (20) and row 6 (7).
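The inclusive check described above can be reproduced for a single date with a boolean mask. This is only a minimal sketch: it rebuilds the sample data from date strings instead of datetime.date objects (an assumption made for brevity), and the names usedate and mask are illustrative.

```python
import pandas as pd

# Same sample data as in the question, written with date strings for brevity.
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2],
    'date1': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
                             '2016-01-02', '2016-01-04', '2016-01-03', '2016-01-01']),
    'date2': pd.to_datetime(['2016-01-05', '2016-01-03', '2016-01-05', '2016-01-05',
                             '2016-01-04', '2016-01-05', '2016-01-04', '2016-01-01']),
    'score1': [5, 7, 3, 2, 9, 3, 8, 3],
    'score2': [1, 3, 0, 5, 2, 20, 7, 7],
})

usedate = pd.Timestamp('2016-01-04')

# True for rows whose [date1, date2] interval contains usedate (both ends inclusive).
mask = (df['date1'] <= usedate) & (usedate <= df['date2'])

print(df.index[mask].tolist())       # 0-based index labels of the matching rows: [0, 2, 3, 4, 5, 6]
print(df.loc[mask, 'score2'].sum())  # 35
```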
Maybe some kind of groupby, or melt then groupby?
Recommended answer
You can use apply with a lambda function:
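# Convert the date columns to datetime64 so they compare against the Timestamp index
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])
# One row per calendar day from the earliest date1 to the latest date2
df1 = pd.DataFrame(index=pd.date_range(df.date1.min(), df.date2.max()),
                   columns=['score1sum', 'score2sum'])
# x.name is the row's index label (the day); sum the scores of the rows whose interval contains it
df1[['score1sum', 'score2sum']] = df1.apply(
    lambda x: df.loc[(df.date1 <= x.name) & (x.name <= df.date2),
                     ['score1', 'score2']].sum(),
    axis=1)
newdf = df1.rename_axis('usedate').reset_index()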
Output:
usedate score1sum score2sum
0 2016-01-01 8 8
1 2016-01-02 21 6
2 2016-01-03 32 13
3 2016-01-04 30 35
4 2016-01-05 13 26
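For larger frames, a row-wise apply can become slow. A possible alternative, not part of the original answer, is to compare every day against every interval at once with NumPy broadcasting. The sketch below assumes the same sample data (rebuilt from date strings for brevity); the names days, inside and sums are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, written with date strings for brevity.
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2],
    'date1': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
                             '2016-01-02', '2016-01-04', '2016-01-03', '2016-01-01']),
    'date2': pd.to_datetime(['2016-01-05', '2016-01-03', '2016-01-05', '2016-01-05',
                             '2016-01-04', '2016-01-05', '2016-01-04', '2016-01-01']),
    'score1': [5, 7, 3, 2, 9, 3, 8, 3],
    'score2': [1, 3, 0, 5, 2, 20, 7, 7],
})

drange = pd.date_range(df['date1'].min(), df['date2'].max())

# Column vector of days, so comparisons broadcast to shape (n_days, n_rows).
days = drange.values[:, None]

# inside[i, j] is True when drange[i] lies in [date1[j], date2[j]], ends inclusive.
inside = (df['date1'].values <= days) & (days <= df['date2'].values)

# Matrix product of the 0/1 membership matrix with the scores gives the per-day sums.
sums = inside.astype(np.int64) @ df[['score1', 'score2']].to_numpy()

newdf = pd.DataFrame(sums, columns=['score1sum', 'score2sum'])
newdf.insert(0, 'usedate', drange)
print(newdf)
```

This builds a boolean matrix with one entry per (day, row) pair, so it trades memory for speed and is best suited to date ranges and frames of moderate size.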