How to get count of values based on datetime in Python

Problem description

I have written the following code, which works with two dataframes, nq and cmnt.
nq contains the UserId and the corresponding badge attainment time, date.
cmnt contains OwnerUserId and CreationDate, the time at which that user made a comment.
For each user, I want to count the comments made on each day from one week before to one week after badge attainment, so that I can create a time-series line plot from the counts.

The following code attempts to do this but raises a KeyError. Please provide code that performs this operation for all users.

nq

 UserId |   date 
     1      2009-10-17 17:38:32.590
     2      2009-10-19 00:37:23.067
     3      2009-10-20 08:37:14.143
     4      2009-10-21 18:07:51.247
     5      2009-10-22 21:25:24.483

cmnt

OwnerUserId | CreationDate
1             2009-10-16 17:38:32.590
1             2009-10-18 17:38:32.590
2             2009-10-18 00:37:23.067
2             2009-10-17 00:37:23.067
2             2009-10-20 00:37:23.067
3             2009-10-19 08:37:14.143
4             2009-10-20 18:07:51.247
5             2009-10-21 21:25:24.483
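To make the snippets on this page runnable, here is a minimal sketch that rebuilds the two sample tables above as pandas DataFrames. The column names and values come from the tables; parsing the timestamp strings with `pd.to_datetime` mirrors the first step of the question's code.

```python
import pandas as pd

# Badge attainment time per user (table nq above)
nq = pd.DataFrame({
    "UserId": [1, 2, 3, 4, 5],
    "date": pd.to_datetime([
        "2009-10-17 17:38:32.590",
        "2009-10-19 00:37:23.067",
        "2009-10-20 08:37:14.143",
        "2009-10-21 18:07:51.247",
        "2009-10-22 21:25:24.483",
    ]),
})

# One row per comment (table cmnt above)
cmnt = pd.DataFrame({
    "OwnerUserId": [1, 1, 2, 2, 2, 3, 4, 5],
    "CreationDate": pd.to_datetime([
        "2009-10-16 17:38:32.590",
        "2009-10-18 17:38:32.590",
        "2009-10-18 00:37:23.067",
        "2009-10-17 00:37:23.067",
        "2009-10-20 00:37:23.067",
        "2009-10-19 08:37:14.143",
        "2009-10-20 18:07:51.247",
        "2009-10-21 21:25:24.483",
    ]),
})
```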

Code

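import pandas as pd
from datetime import timedelta

nq.date = pd.to_datetime(nq.date)
cmnt.CreationDate = pd.to_datetime(cmnt.CreationDate)

count = []

for j in range(len(nq)):
    for i in range(-7, 8):
        # Badge time shifted by i days; the == test below only matches
        # comments made at exactly the same time of day
        check_date = nq.date.iloc[j] + timedelta(days=i)

        count = cmnt.loc[(cmnt.OwnerUserId == nq.UserId.iloc[j]) & (cmnt.CreationDate == check_date)].shape[0]
        # Raises KeyError: nq has no column named i (e.g. -7)
        nq.iloc[j].append({nq[i]: count})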
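The KeyError comes from `nq[i]`: `nq` has no column named `-7`, `-6`, and so on. Two further problems would surface after that: `Series.append` returns a new object rather than modifying `nq` in place (and it was removed in pandas 2.0), and the exact `==` timestamp comparison misses any comment not made at the badge's exact time of day. A minimal sketch of the same loop idea that avoids all three, assuming the column names from the question, collects one dict per user and builds the result table at the end. It uses the first two sample users for brevity.

```python
import pandas as pd

nq = pd.DataFrame({"UserId": [1, 2],
                   "date": pd.to_datetime(["2009-10-17 17:38:32.590",
                                           "2009-10-19 00:37:23.067"])})
cmnt = pd.DataFrame({"OwnerUserId": [1, 1, 2, 2, 2],
                     "CreationDate": pd.to_datetime([
                         "2009-10-16 17:38:32.590", "2009-10-18 17:38:32.590",
                         "2009-10-18 00:37:23.067", "2009-10-17 00:37:23.067",
                         "2009-10-20 00:37:23.067"])})

rows = []
for user_id, badge_time in zip(nq["UserId"], nq["date"]):
    # Comment times for this user only
    user_comments = cmnt.loc[cmnt["OwnerUserId"] == user_id, "CreationDate"]
    # Whole days between each comment and the badge (floored, so a comment
    # a few hours before the badge falls on day -1)
    offsets = (user_comments - badge_time).dt.days
    row = {"UserId": user_id, "date": badge_time}
    for i in range(-7, 8):
        # Store each count under its own key instead of indexing nq[i]
        row[i] = int((offsets == i).sum())
    rows.append(row)

# Build the table once, after the loop
res = pd.DataFrame(rows)
print(res)
```

Building the DataFrame once from a list of dicts avoids growing `nq` row by row, which pandas does not support in place.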

Expected output

UserId     |   date                 |-7|-6|-5|-4|-3|-2|-1|0 |1 |2 |3 |4 |5 |6 |7
     1      2009-10-17 17:38:32.590 |0 |0 |0 |0 |0 |0 |1 |0 |1 |0 |0 |0 |0 |0 |0  
     2      2009-10-19 00:37:23.067 |0 |0 |0 |0 |0 |1 |1 |0 |1 |0 |0 |0 |0 |0 |0    
     3      2009-10-20 08:37:14.143 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 
     4      2009-10-21 18:07:51.247 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 
     5      2009-10-22 21:25:24.483 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 

Here, column -1 means a comment made 1 day before badge attainment, column 1 means a comment made 1 day after, and so on.
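The day offsets in these columns can be computed as whole days between the comment time and the badge time. One detail worth checking is how partial days are binned: `Series.dt.days` (like `Timedelta.days`) floors toward negative infinity, so a comment one hour before the badge lands in column -1, while one made one hour after lands in column 0. The timestamps below are made up around user 1's badge time for illustration.

```python
import pandas as pd

badge = pd.Timestamp("2009-10-17 17:38:32.590")
comments = pd.Series(pd.to_datetime([
    "2009-10-16 17:38:32.590",  # exactly 24 h before the badge
    "2009-10-17 16:38:32.590",  # 1 h before
    "2009-10-17 18:38:32.590",  # 1 h after
    "2009-10-18 17:38:32.590",  # exactly 24 h after
]))

# Whole-day offset of each comment relative to the badge time
offsets = (comments - badge).dt.days
print(offsets.tolist())  # [-1, -1, 0, 1]
```

If calendar-date differences are wanted instead, normalizing both sides with `Series.dt.normalize()` and `Timestamp.normalize()` before subtracting is one option.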

Note: a completely different approach is also welcome. My main objective is to draw a time-series line plot showing the number of comments users made before and after attaining the badge.
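Since the end goal is the line plot, one possible sketch skips the wide per-user table and counts comments per day offset across all users directly. It assumes the date columns are already parsed as datetimes (built inline here from the first three sample users) and that matplotlib is installed; `plot_totals` is a hypothetical helper name.

```python
import pandas as pd

nq = pd.DataFrame({"UserId": [1, 2, 3],
                   "date": pd.to_datetime(["2009-10-17 17:38:32.590",
                                           "2009-10-19 00:37:23.067",
                                           "2009-10-20 08:37:14.143"])})
cmnt = pd.DataFrame({"OwnerUserId": [1, 1, 2, 2, 2, 3],
                     "CreationDate": pd.to_datetime([
                         "2009-10-16 17:38:32.590", "2009-10-18 17:38:32.590",
                         "2009-10-18 00:37:23.067", "2009-10-17 00:37:23.067",
                         "2009-10-20 00:37:23.067", "2009-10-19 08:37:14.143"])})

# Pair every comment with its author's badge time
t = nq.merge(cmnt, left_on="UserId", right_on="OwnerUserId")
t["days_diff"] = (t["CreationDate"] - t["date"]).dt.days

# Total comments across all users per day offset; days with no comments are 0
totals = (t.loc[t["days_diff"].between(-7, 7), "days_diff"]
           .value_counts()
           .reindex(range(-7, 8), fill_value=0))


def plot_totals(totals):
    """Draw the before/after line plot (hypothetical helper; needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = totals.plot(marker="o")
    ax.axvline(0, linestyle="--", color="grey")  # badge attainment day
    ax.set_xlabel("Days relative to badge attainment")
    ax.set_ylabel("Number of comments")
    plt.show()
```

Calling `plot_totals(totals)` draws the plot. Summing over users gives one line; plotting the per-user table row by row instead would give one line per user.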

Recommended answer

Here is one way to do it:

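import itertools

import pandas as pd

# nq.date and cmnt.CreationDate are assumed to be parsed with pd.to_datetime already

# Pair every comment with its author's badge time
t = pd.merge(nq, cmnt, left_on="UserId", right_on="OwnerUserId")
# Whole days between comment and badge (negative = before the badge)
t["days_diff"] = (t["CreationDate"] - t["date"]).dt.days
# Number of comments per (user, day offset), repeated on each matching row
t["count"] = t.groupby(["UserId", "days_diff"]).OwnerUserId.transform("count")

# Every (user, day) combination from -7 to 7, so empty days still get a column
all_days = pd.DataFrame(list(itertools.product(t.UserId.unique(), range(-7, 8))))
all_days.columns = ["UserId", "day"]

# Right join keeps all (user, day) pairs and drops offsets outside -7..7
t = pd.merge(t, all_days, left_on=["UserId", "days_diff"], right_on=["UserId", "day"], how="right")
# One row per user, one column per day; the default aggfunc (mean) equals the count here,
# and dropna=False keeps all-NaN day columns
t = pd.pivot_table(t, index="UserId", columns="day", values="count", dropna=False)

res = pd.merge(nq, t, left_on="UserId", right_index=True)

print(res)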

The output is:

   UserId                    date  -7  -6  -5  -4  -3   -2   -1   0    1   2   3   4   5   6   7
0       1 2009-10-17 17:38:32.590 NaN NaN NaN NaN NaN  NaN  1.0 NaN  1.0 NaN NaN NaN NaN NaN NaN
1       2 2009-10-19 00:37:23.067 NaN NaN NaN NaN NaN  1.0  1.0 NaN  1.0 NaN NaN NaN NaN NaN NaN
2       3 2009-10-20 08:37:14.143 NaN NaN NaN NaN NaN  NaN  1.0 NaN  NaN NaN NaN NaN NaN NaN NaN
3       4 2009-10-21 18:07:51.247 NaN NaN NaN NaN NaN  NaN  1.0 NaN  NaN NaN NaN NaN NaN NaN NaN
4       5 2009-10-22 21:25:24.483 NaN NaN NaN NaN NaN  NaN  1.0 NaN  NaN NaN NaN NaN NaN NaN NaN
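The output above shows NaN where the expected output has 0. As an alternative technique (not the answer's method), `pd.crosstab` counts (user, day offset) pairs directly, and `reindex(..., fill_value=0)` adds the missing users and days as integer zeros. This is a sketch assuming the question's column names, with the sample data built inline.

```python
import pandas as pd

nq = pd.DataFrame({
    "UserId": [1, 2, 3, 4, 5],
    "date": pd.to_datetime([
        "2009-10-17 17:38:32.590", "2009-10-19 00:37:23.067",
        "2009-10-20 08:37:14.143", "2009-10-21 18:07:51.247",
        "2009-10-22 21:25:24.483",
    ]),
})
cmnt = pd.DataFrame({
    "OwnerUserId": [1, 1, 2, 2, 2, 3, 4, 5],
    "CreationDate": pd.to_datetime([
        "2009-10-16 17:38:32.590", "2009-10-18 17:38:32.590",
        "2009-10-18 00:37:23.067", "2009-10-17 00:37:23.067",
        "2009-10-20 00:37:23.067", "2009-10-19 08:37:14.143",
        "2009-10-20 18:07:51.247", "2009-10-21 21:25:24.483",
    ]),
})

# One row per (badge, comment) pair for the same user
t = nq.merge(cmnt, left_on="UserId", right_on="OwnerUserId")
t["days_diff"] = (t["CreationDate"] - t["date"]).dt.days

# Keep only the two-week window around the badge
t = t[t["days_diff"].between(-7, 7)]

# Count comments per (user, day offset); missing users and days become 0
counts = (pd.crosstab(t["UserId"], t["days_diff"])
            .reindex(index=nq["UserId"].tolist(), columns=range(-7, 8), fill_value=0))

res = nq.merge(counts, left_on="UserId", right_index=True)
print(res)
```

A smaller change to the answer's own code is to append `.fillna(0).astype(int)` to the `pivot_table` result.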