Groupby + conditional from another column to create new one


Question

I am trying to capture, in a new column ("2nd_visit_date"), the date of each user's visit where visit_num == 2.

Here's the code (including the new column I want to create):

import pandas as pd

df=pd.DataFrame({'user':[1,1,2,2,2,3,3,3,3,3,4,4],
                  'date':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-15','1995-12-18','1995-12-20'],
                  'visit_num':[1,2,1,2,3,1,2,3,4,5,1,2],
                  '2nd_visit_date':['1995-09-02','1995-09-02','1995-10-04','1995-10-04','1995-10-04','1995-11-08','1995-11-08','1995-11-08','1995-11-08','1995-11-08','1995-12-20','1995-12-20']})

So that I get:

user    date    visit_num   2nd_visit_date
 1   1995-09-01     1        1995-09-02
 1   1995-09-02     2        1995-09-02
 2   1995-10-03     1        1995-10-04
 2   1995-10-04     2        1995-10-04
 2   1995-10-05     3        1995-10-04
 3   1995-11-07     1        1995-11-08
 3   1995-11-08     2        1995-11-08
 3   1995-11-09     3        1995-11-08
 3   1995-11-10     4        1995-11-08
 3   1995-11-15     5        1995-11-08
 4   1995-12-18     1        1995-12-20
 4   1995-12-20     2        1995-12-20

I tried the following code, but it did not work:

df["2nd_visit_date"] = df.groupby("user")["date"].transform(df['visit_num']==2)

Any help would be very much appreciated. Thanks.

Recommended answer

Let's say this is your original df:

df

   user    date    visit_num
0   1   1995-09-01  1
1   1   1995-09-02  2
2   2   1995-10-03  1
3   2   1995-10-04  2
4   2   1995-10-05  3
5   3   1995-11-07  1
6   3   1995-11-08  2
7   3   1995-11-09  3
8   3   1995-11-10  4
9   3   1995-11-15  5
10  4   1995-12-18  1
11  4   1995-12-20  2

You can first create a dataframe of the second visits (and rename the date column):

df_2 = df[df.visit_num==2][['user', 'date']]
df_2.columns = ['user', '2nd_visit_date']
df_2


   user 2nd_visit_date
1   1   1995-09-02
3   2   1995-10-04
6   3   1995-11-08
11  4   1995-12-20

And merge it with your original df:

pd.merge(df, df_2, on='user', how='left')

    user    date    visit_num   2nd_visit_date
0   1   1995-09-01      1         1995-09-02
1   1   1995-09-02      2         1995-09-02
2   2   1995-10-03      1         1995-10-04
3   2   1995-10-04      2         1995-10-04
4   2   1995-10-05      3         1995-10-04
5   3   1995-11-07      1         1995-11-08
6   3   1995-11-08      2         1995-11-08
7   3   1995-11-09      3         1995-11-08
8   3   1995-11-10      4         1995-11-08
9   3   1995-11-15      5         1995-11-08
10  4   1995-12-18      1         1995-12-20
11  4   1995-12-20      2         1995-12-20
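
For reference, here is a minimal sketch of two alternatives that stay closer to the groupby attempt in the question. It is not part of the original answer. The original call most likely failed because transform expects a function or a function name, not a boolean Series. The sketch assumes each user has at most one row with visit_num == 2; the names second_only, lookup and via_map are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "user": [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4],
    "date": ["1995-09-01", "1995-09-02", "1995-10-03", "1995-10-04",
             "1995-10-05", "1995-11-07", "1995-11-08", "1995-11-09",
             "1995-11-10", "1995-11-15", "1995-12-18", "1995-12-20"],
    "visit_num": [1, 2, 1, 2, 3, 1, 2, 3, 4, 5, 1, 2],
})

# Option 1: keep the date only on rows where visit_num == 2 (other rows
# become NaN), then broadcast the first non-null value within each user group.
second_only = df["date"].where(df["visit_num"] == 2)
df["2nd_visit_date"] = second_only.groupby(df["user"]).transform("first")

# Option 2: build a user -> date lookup from the second-visit rows and map it.
# Assumption: at most one second-visit row per user, so the lookup index is unique.
lookup = df.loc[df["visit_num"] == 2].set_index("user")["date"]
via_map = df["user"].map(lookup)

print(df)
```

With either option, a user who has no second visit ends up with NaN in the new column, which matches what the left merge above would produce.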
