Python dataframe interaction
Question
I have the following dataframe:
topic student level week
1 a 1 1
1 b 2 1
1 a 3 1
2 a 1 2
2 b 2 2
2 a 3 2
2 b 4 2
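For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch; the variable name `df` is an assumption, chosen to match the name used in the answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2, 2, 2],
    "student": ["a", "b", "a", "a", "b", "a", "b"],
    "level":   [1, 2, 3, 1, 2, 3, 4],
    "week":    [1, 1, 1, 2, 2, 2, 2],
})
print(df)
```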
The new dataframe should represent the interaction between students through a topic. It should contain four columns: "student source", "student destination", "week" and "reply count".
Student destination is a student with whom the source student shared the topic.
Reply count is the number of times the student destination "directly" replied to the student source.
The new dataframe should look like this:
st_source st_dest week reply_count
a b 1 1
a b 2 2
b a 1 1
b a 2 1
Reply count is easier to explain with an example.
Suppose a thread is started by student A (by sending a message at level 1), B replies to A (sending a message at level 2), and C replies to B (sending a message at level 3). Then B "directly" replied to A, and C "directly" replied to B, but C's reply to A is not direct, so we don't count it.
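To make the rule concrete, here is a small sketch in plain Python. It assumes that a message at level n is a direct reply to the message at level n - 1 in the same thread; the `thread` list and the `direct_replies` helper are made up for illustration.

```python
from collections import Counter

def direct_replies(thread):
    """Count (source, destination) pairs for a single thread.

    `thread` is a list of (student, level) tuples. A message at level n is
    treated as a direct reply to the message at level n - 1 (an assumption).
    """
    by_level = {level: student for student, level in thread}
    counts = Counter()
    for student, level in thread:
        parent = by_level.get(level - 1)
        if parent is not None:
            # `student` (destination) directly replied to `parent` (source).
            counts[(parent, student)] += 1
    return counts

thread = [("A", 1), ("B", 2), ("C", 3)]
counts = direct_replies(thread)
print(counts)
```

Only the pairs (A, B) and (B, C) are counted here; there is no entry for (A, C), because C never replied to A directly.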
Does anyone have an idea?
Thanks in advance!
Recommended answer
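# For each week, pair every row's student (st_dest, the replier) with the
# student on the previous row (st_source) and count each pair. The first row
# of a week has no predecessor; its NaN key is dropped by groupby.
# 'level' is counted instead of the grouping column 'week', which newer pandas
# versions may not pass to the applied function.
result = (df.groupby('week').apply(
    lambda g: g.groupby([g.student.shift(), g.student])['level']
               .agg(reply_count='count')  # named aggregation; dict renaming was removed in pandas 1.0
               .rename_axis(["st_source", "st_dest"])
).reset_index())
result[['st_source', 'st_dest', 'week', 'reply_count']].sort_values(['st_source', 'st_dest'])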
# st_source st_dest week reply_count
#0 a b 1 1
#2 a b 2 2
#1 b a 1 1
#3 b a 2 1
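As an alternative sketch, the same counts can probably be obtained without `apply`, by shifting `student` within each week and counting the resulting pairs. Like the answer above, it assumes the rows are already ordered so that each message directly follows the one it replies to. `df` is rebuilt here from the question's data so the snippet runs on its own, and the intermediate name `pairs` is just illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2, 2, 2],
    "student": ["a", "b", "a", "a", "b", "a", "b"],
    "level":   [1, 2, 3, 1, 2, 3, 4],
    "week":    [1, 1, 1, 2, 2, 2, 2],
})

pairs = pd.DataFrame({
    # Previous poster within the same week (NaN for the first message).
    "st_source": df.groupby("week")["student"].shift(),
    # The student who posted the reply.
    "st_dest": df["student"],
    "week": df["week"],
})

result = (
    pairs.dropna(subset=["st_source"])            # thread starters reply to nobody
         .groupby(["st_source", "st_dest", "week"])
         .size()                                  # number of rows per pair and week
         .reset_index(name="reply_count")
)
print(result)
```

On the sample data this should give the same four rows as the expected table in the question. Since the grouping keys are ordinary columns, no final column reordering or sorting is needed.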