Python Pandas Sum Values in Columns If date between 2 dates

Problem description

I have a dataframe df which can be created with this:

import datetime
import pandas as pd

data={'id':[1,1,1,1,2,2,2,2],
      'date1':[datetime.date(2016,1,1),datetime.date(2016,1,2),datetime.date(2016,1,3),datetime.date(2016,1,4),
               datetime.date(2016,1,2),datetime.date(2016,1,4),datetime.date(2016,1,3),datetime.date(2016,1,1)],
      'date2':[datetime.date(2016,1,5),datetime.date(2016,1,3),datetime.date(2016,1,5),datetime.date(2016,1,5),
               datetime.date(2016,1,4),datetime.date(2016,1,5),datetime.date(2016,1,4),datetime.date(2016,1,1)],
      'score1':[5,7,3,2,9,3,8,3],
      'score2':[1,3,0,5,2,20,7,7]}
df=pd.DataFrame.from_dict(data)

It looks like this:
   id       date1       date2  score1  score2
0   1  2016-01-01  2016-01-05       5       1
1   1  2016-01-02  2016-01-03       7       3
2   1  2016-01-03  2016-01-05       3       0
3   1  2016-01-04  2016-01-05       2       5
4   2  2016-01-02  2016-01-04       9       2
5   2  2016-01-04  2016-01-05       3      20
6   2  2016-01-03  2016-01-04       8       7
7   2  2016-01-01  2016-01-01       3       7

What I need is a new dataframe with two columns, score1sum and score2sum, holding the sums of score1 and score2 over every row of df where usedate falls between date1 and date2, inclusive. usedate takes every date from the date1 minimum to the date2 maximum, inclusive. I used this to create the date range:

drange=pd.date_range(df.date1.min(),df.date2.max())    

The resulting dataframe newdf should look like:

     usedate  score1sum  score2sum
0 2016-01-01          8          8
1 2016-01-02         21          6
2 2016-01-03         32         13
3 2016-01-04         30         35
4 2016-01-05         13         26

For clarification, on usedate 2016-01-01, score1sum is 8. This comes from the rows in df where 2016-01-01 lies between date1 and date2 (inclusive), which are row0(5) and row7(3). On usedate 2016-01-04, score2sum is 35. This comes from the rows in df where 2016-01-04 lies between date1 and date2 (inclusive), which are row0(1), row2(0), row3(5), row4(2), row5(20), row6(7).
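The selection rule for a single usedate can be checked directly with a boolean mask. This is a small sketch for illustration only: the names usedate, mask and matching_rows are chosen here, the id column is left out because it does not affect the sums, and the dates are built with pd.to_datetime rather than datetime.date for brevity.

```python
import pandas as pd

# Same rows as df above, minus the id column (it plays no part in the sums).
df = pd.DataFrame({
    "date1": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
                             "2016-01-02", "2016-01-04", "2016-01-03", "2016-01-01"]),
    "date2": pd.to_datetime(["2016-01-05", "2016-01-03", "2016-01-05", "2016-01-05",
                             "2016-01-04", "2016-01-05", "2016-01-04", "2016-01-01"]),
    "score1": [5, 7, 3, 2, 9, 3, 8, 3],
    "score2": [1, 3, 0, 5, 2, 20, 7, 7],
})

usedate = pd.Timestamp("2016-01-04")

# True for rows whose [date1, date2] interval contains usedate (both ends inclusive).
mask = (df["date1"] <= usedate) & (usedate <= df["date2"])

matching_rows = df.index[mask].tolist()
score1sum = int(df.loc[mask, "score1"].sum())
score2sum = int(df.loc[mask, "score2"].sum())

print(matching_rows)         # [0, 2, 3, 4, 5, 6]
print(score1sum, score2sum)  # 30 35
```

Repeating this for every date in drange gives the full newdf.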

Maybe some kind of groupby, or melt then groupby?

Recommended answer

You can use apply with a lambda function:

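# Convert the datetime.date objects to pandas datetimes so they compare against Timestamps
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

# One row per calendar day from the earliest date1 to the latest date2
df1 = pd.DataFrame(index=pd.date_range(df.date1.min(), df.date2.max()), columns = ['score1sum', 'score2sum'])

# For each day (x.name), sum the scores of the rows whose date1..date2 interval contains it
df1[['score1sum','score2sum']] = df1.apply(lambda x: df.loc[(df.date1 <= x.name) &
                                                            (x.name <= df.date2),
                                                            ['score1','score2']].sum(), axis=1)

# Turn the date index into a regular usedate column
newdf = df1.rename_axis('usedate').reset_index()
print(newdf)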

Output:

     usedate  score1sum  score2sum
0 2016-01-01          8          8
1 2016-01-02         21          6
2 2016-01-03         32         13
3 2016-01-04         30         35
4 2016-01-05         13         26
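
For larger frames, the row-wise apply could be replaced by a NumPy broadcasting comparison. The following is a minimal sketch of that alternative, not part of the original answer. The names days, covers and sums are illustrative, the id column is omitted because it does not affect the sums, and the dates are built with pd.to_datetime for brevity.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date1": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
                             "2016-01-02", "2016-01-04", "2016-01-03", "2016-01-01"]),
    "date2": pd.to_datetime(["2016-01-05", "2016-01-03", "2016-01-05", "2016-01-05",
                             "2016-01-04", "2016-01-05", "2016-01-04", "2016-01-01"]),
    "score1": [5, 7, 3, 2, 9, 3, 8, 3],
    "score2": [1, 3, 0, 5, 2, 20, 7, 7],
})

# Every calendar day from the earliest date1 to the latest date2, inclusive.
days = pd.date_range(df["date1"].min(), df["date2"].max())

# Column vector of days, compared against the row vectors of interval bounds.
# covers[i, j] is True when day i lies inside row j's [date1, date2] interval.
day_col = days.to_numpy()[:, None]
covers = (df["date1"].to_numpy() <= day_col) & (day_col <= df["date2"].to_numpy())

# Matrix product: for each day, add up the scores of the rows that cover it.
sums = covers.astype(np.int64) @ df[["score1", "score2"]].to_numpy()

newdf = pd.DataFrame(sums, columns=["score1sum", "score2sum"], index=days)
newdf = newdf.rename_axis("usedate").reset_index()
print(newdf)
```

The intermediate covers array has one entry per (day, row) pair, so memory grows with the number of days times the number of rows. For very long date ranges the apply version, or a chunked variant of this sketch, may be the safer choice.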
