Merging 2 DataFrames based on DateTime feature
Question
I have the following 2 DataFrames, question_posts and nice_question. question_posts has multiple entries for every user, whereas nice_question has unique UserId values.
Part of question_posts
OwnerUserId | CreationDate | Score
981 | 2009-09-28 16:11:38.533 | 50
483 | 2009-10-18 15:11:20.533 | 700
698 | 2010-09-28 16:11:35.533 | 0
10 | 2009-01-28 10:12:38.7 | 200
Part of nice_question
UserId | Date
981 | 2009-10-17 17:38:32.59
10 | 2009-10-20 08:37:14.143
290 | 2009-10-20 18:07:51.247
699 | 2009-10-20 21:25:24.483
I want to create a new feature in the nice_question DataFrame called Quality Factor, based on the average score (total score / total posts) obtained from the question_posts DataFrame. The average score should be computed only over posts the user made before the Date given in nice_question.
I tried the following code, but I get an error.
Code
nice_question['Quality Factor'] = (question_posts.loc[question_posts['CreationDate']<nice_question['date']]).sum()
Error
Can only compare identically-labeled Series objects
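For context, pandas raises this error when two Series with different indexes are compared element by element, which is what happens when a column of question_posts is compared directly with a column of nice_question. A minimal sketch that reproduces it, using made-up dates and hypothetical variable names:

```python
import pandas as pd

# Two Series of different lengths (hypothetical values), so their
# default integer indexes are not identical.
creation = pd.Series(pd.to_datetime(["2009-09-28", "2009-10-18", "2010-09-28"]))
cutoff = pd.Series(pd.to_datetime(["2009-10-17", "2009-10-20"]))

try:
    creation < cutoff  # element-wise comparison needs matching labels
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

The comparison only works once both columns sit in the same DataFrame, row for row, which is why a merge on the user id is needed first.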
Expected output
>>nice_question.head(4)
>> UserId | Date | Quality Factor
981 | 2009-10-17 17:38:32.59 | 5
10 | 2009-10-20 08:37:14.143 | 16
290 | 2009-10-20 18:07:51.247 | 66
699 | 2009-10-20 21:25:24.483 | 9
Recommended answer
Here's a solution (in a few steps, for clarity):
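import pandas as pd
# Attach every post to its user's row in nice_question.
t = pd.merge(nice_question, question_posts, left_on="UserId", right_on="OwnerUserId", how="left")
# Keep only posts created before the user's Date.
t = t[t.CreationDate < t.Date]
# Average the remaining scores per user.
avg_scores = t.groupby("UserId")[["Score"]].mean()
avg_scores = avg_scores.rename(columns={"Score": "avg_score"})
# Join the averages back; the default inner join drops users with no earlier posts.
res = pd.merge(nice_question, avg_scores, left_on="UserId", right_index=True)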
For the sample data, the result is:
UserId Date avg_score
0 981 2009-10-17 17:38:32.59 50.0
1 10 2009-10-20 08:37:14.143 200.0
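To make the steps easy to try, here is a self-contained sketch that rebuilds the sample frames from the question and runs the same merge, filter, and group-by. It differs from the answer in two labeled ways, both assumptions on my part: the new column is named Quality Factor as the question asks, and the final merge uses how="left" so that users with no earlier posts stay in the result with NaN instead of being dropped. It also assumes both date columns are parsed with pd.to_datetime, so the comparison is chronological rather than string-based.

```python
import pandas as pd

# Sample data copied from the question; dates parsed to real datetimes.
question_posts = pd.DataFrame({
    "OwnerUserId": [981, 483, 698, 10],
    "CreationDate": pd.to_datetime([
        "2009-09-28 16:11:38.533", "2009-10-18 15:11:20.533",
        "2010-09-28 16:11:35.533", "2009-01-28 10:12:38.7",
    ]),
    "Score": [50, 700, 0, 200],
})
nice_question = pd.DataFrame({
    "UserId": [981, 10, 290, 699],
    "Date": pd.to_datetime([
        "2009-10-17 17:38:32.59", "2009-10-20 08:37:14.143",
        "2009-10-20 18:07:51.247", "2009-10-20 21:25:24.483",
    ]),
})

# 1. Put each user's posts next to that user's Date.
merged = nice_question.merge(
    question_posts, left_on="UserId", right_on="OwnerUserId", how="left"
)
# 2. Keep only posts created before the Date (rows without posts drop out here).
earlier = merged[merged["CreationDate"] < merged["Date"]]
# 3. Mean score per user = total score / number of posts.
quality = earlier.groupby("UserId")["Score"].mean().rename("Quality Factor")
# 4. Join back; how="left" (my choice) keeps users with no earlier posts as NaN.
result = nice_question.merge(quality, left_on="UserId", right_index=True, how="left")

print(result)
```

With the sample data, users 981 and 10 get 50.0 and 200.0, matching the result above, while 290 and 699 have no posts and come out as NaN. If those rows should show 0 instead, calling fillna(0) on the new column is one option.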