Merging 2 DataFrames based on a DateTime feature


Problem description

I have the following 2 DataFrames, question_posts and nice_question. question_posts has multiple entries for every user, whereas nice_question has unique UserId values.

Part of question_posts

OwnerUserId | CreationDate            | Score
981         | 2009-09-28 16:11:38.533 |    50
483         | 2009-10-18 15:11:20.533 |   700
698         | 2010-09-28 16:11:35.533 |     0
10          | 2009-01-28 10:12:38.7   |   200

Part of nice_question

UserId | Date
981    | 2009-10-17 17:38:32.59
10     | 2009-10-20 08:37:14.143
290    | 2009-10-20 18:07:51.247
699    | 2009-10-20 21:25:24.483
    

I want to create a new feature in the nice_question DataFrame called Quality Factor, based on the average score (total score / total posts) obtained from the question_posts DataFrame. The average score should be computed only over the posts a user made before the date given for that user in nice_question.

I have tried the following code, but I get an error.

Code

nice_question['Quality Factor'] = (question_posts.loc[question_posts['CreationDate']<nice_question['Date']]).sum()

Error

Can only compare identically-labeled Series objects
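
The error arises because the two date columns are Series of different lengths with different index labels, and pandas compares Series element-wise by label rather than matching rows per user. A minimal sketch that reproduces the message, using two small made-up Series (illustrative data, not taken from the question):

```python
import pandas as pd

# Two Series of different lengths, so their index labels differ (illustrative data)
creation = pd.Series(pd.to_datetime(["2009-09-28", "2009-10-18", "2010-09-28"]))
nice_date = pd.Series(pd.to_datetime(["2009-10-17", "2009-10-20"]))

try:
    creation < nice_date  # element-wise comparison needs identical labels
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

The comparison only becomes meaningful once each post sits on the same row as the date it should be compared against, which is what a merge on the user ID provides.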

Expected output

>> nice_question.head(4)

>> UserId | Date                    | Quality Factor
   981    | 2009-10-17 17:38:32.59  | 5
   10     | 2009-10-20 08:37:14.143 | 16
   290    | 2009-10-20 18:07:51.247 | 66
   699    | 2009-10-20 21:25:24.483 | 9


Recommended answer

Here's a solution (in a few steps, for clarity):

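import pandas as pd

# Pair each nice_question row with all of that user's posts
t = pd.merge(nice_question, question_posts, left_on="UserId", right_on="OwnerUserId", how="left")
# Keep only the posts created before the user's nice_question date
t = t[t.CreationDate < t.Date]
# Average score per user; users with no earlier posts are dropped by the final (inner) merge
avg_scores = t.groupby("UserId")[["Score"]].mean()
avg_scores = avg_scores.rename(columns={"Score": "avg_score"})
res = pd.merge(nice_question, avg_scores, left_on="UserId", right_index=True)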

For the sample data, the result is:

   UserId                     Date  avg_score
0     981   2009-10-17 17:38:32.59       50.0
1      10  2009-10-20 08:37:14.143      200.0
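
Only two users appear because the last merge is an inner join, so users with no posts before their nice_question date drop out. The following is a self-contained sketch, not part of the original answer, that rebuilds the sample tables from the question and keeps every user by making the final merge a left join. It assumes the date columns should be parsed with pd.to_datetime, and the names merged, earlier and res_all are introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question, with dates parsed to datetime64
question_posts = pd.DataFrame({
    "OwnerUserId": [981, 483, 698, 10],
    "CreationDate": pd.to_datetime([
        "2009-09-28 16:11:38.533", "2009-10-18 15:11:20.533",
        "2010-09-28 16:11:35.533", "2009-01-28 10:12:38.7",
    ]),
    "Score": [50, 700, 0, 200],
})
nice_question = pd.DataFrame({
    "UserId": [981, 10, 290, 699],
    "Date": pd.to_datetime([
        "2009-10-17 17:38:32.59", "2009-10-20 08:37:14.143",
        "2009-10-20 18:07:51.247", "2009-10-20 21:25:24.483",
    ]),
})

# Put each post on the same row as its user's nice_question date
merged = nice_question.merge(
    question_posts, left_on="UserId", right_on="OwnerUserId", how="left"
)
# Keep only posts created before that date (rows with no post compare as False)
earlier = merged[merged["CreationDate"] < merged["Date"]]

# Mean score per user, as a two-column frame: UserId, avg_score
avg_score = (
    earlier.groupby("UserId", as_index=False)["Score"]
    .mean()
    .rename(columns={"Score": "avg_score"})
)

# Left join keeps users without earlier posts; their avg_score is NaN
res_all = nice_question.merge(avg_score, on="UserId", how="left")
print(res_all)
```

Users 290 and 699 have no rows in the sample question_posts, so their avg_score stays missing; whether to leave it as NaN or fill it with a default such as 0 depends on how Quality Factor is meant to be used.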
