Python dataframe interaction


Problem description

I have the following dataframe:

topic  student level week
1      a       1     1
1      b       2     1
1      a       3     1
2      a       1     2
2      b       2     2
2      a       3     2
2      b       4     2
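
To make the sample reproducible, the table above can be built as a pandas DataFrame. This is only a minimal sketch: the variable name `df` is chosen to match the one used in the answer further down, and the integer column types are an assumption.

```python
import pandas as pd

# Sample data from the question; one row per message posted in a topic.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2, 2, 2],
    "student": ["a", "b", "a", "a", "b", "a", "b"],
    "level":   [1, 2, 3, 1, 2, 3, 4],
    "week":    [1, 1, 1, 2, 2, 2, 2],
})

print(df)
```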

The new dataframe should represent the interaction between students through a topic. It should contain four columns: "student source", "student destination", "week" and "reply count".

Student destination is a student with whom the student source shared a topic.

Reply count is the number of times the student destination "directly" replied to the student source.

The new dataframe should look like this:

st_source st_dest  week  reply_count
    a        b       1        1
    a        b       2        2
    b        a       1        1
    b        a       2        1

Reply count is easier to explain with an example.

Suppose a thread is started by student A (who sends a message at level 1), B replies to A (sending a message at level 2), and C replies to B (sending a message at level 3). Then B "directly" replied to A, and C "directly" replied to B, but C's reply to A is not direct (and so we don't count it).
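
The rule in this example can be sketched in plain Python. The sketch assumes a linear thread with exactly one message per level, as in the sample data, so a message at level n is a direct reply to the message at level n - 1. The names `thread` and `direct_replies` are hypothetical and used only for illustration.

```python
from collections import Counter

# The thread from the example, as (student, level) pairs in posting order.
thread = [("A", 1), ("B", 2), ("C", 3)]

def direct_replies(posts):
    """Count (source, destination) pairs where destination directly replied to source.

    Assumes one message per level within the thread.
    """
    author_at_level = {level: student for student, level in posts}
    counts = Counter()
    for student, level in posts:
        parent = author_at_level.get(level - 1)  # None for the thread starter
        if parent is not None:
            counts[(parent, student)] += 1
    return counts

replies = direct_replies(thread)
print(replies)
```

Only the pairs (A, B) and (B, C) are counted. C's message sits two levels below A's, so no (A, C) pair is recorded.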

Does anyone have an idea?

Thanks in advance!

Recommended answer

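# Within each week, pair every post with the previous poster (shift)
# and count how often each (source, destination) pair occurs.
result = (df.groupby('week').apply(
        lambda g: g.groupby([g.student.shift(), g.student])
        .size().to_frame('reply_count')
        .rename_axis(["st_source", "st_dest"])
    ).reset_index())

# Reorder the columns and sort by student pair.
result[['st_source', 'st_dest', 'week', 'reply_count']].sort_values(['st_source', 'st_dest'])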

# st_source     st_dest   week  reply_count
#0        a         b        1          1
#2        a         b        2          2
#1        b         a        1          1
#3        b         a        2          1
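
As a hedged alternative, the same counts can be obtained without `groupby(...).apply`, by shifting the `student` column within each week and then counting the resulting pairs. Like the answer above, this sketch assumes that rows are ordered by level within each week and that each week holds a single topic, so the previous row is the message being replied to. The name `pairs` is illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2, 2, 2],
    "student": ["a", "b", "a", "a", "b", "a", "b"],
    "level":   [1, 2, 3, 1, 2, 3, 4],
    "week":    [1, 1, 1, 2, 2, 2, 2],
})

# Pair each post with the previous poster in the same week.
# The first post of each week has no predecessor and is dropped.
pairs = pd.DataFrame({
    "st_source": df.groupby("week")["student"].shift(),
    "st_dest": df["student"],
    "week": df["week"],
}).dropna(subset=["st_source"])

# Count how often each (source, destination) pair occurs per week.
result = (
    pairs.groupby(["st_source", "st_dest", "week"])
    .size()
    .reset_index(name="reply_count")
)

print(result)
```

If a week could contain several topics, grouping the shift by `topic` as well would be the natural adjustment, though the question's sample does not exercise that case.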
