Create a new column in pandas based on a groupby of a datetime column at the date level

Problem description

I have a data frame as shown below:

Doctor       Appointment           Booking_ID   
  A          2020-01-18 12:00:00     1 
  A          2020-01-18 12:30:00     2
  A          2020-01-18 13:00:00     3 
  A          2020-01-18 13:00:00     4
  A          2020-01-19 13:00:00     13
  A          2020-01-19 13:30:00     14 
  B          2020-01-18 12:00:00     5 
  B          2020-01-18 12:30:00     6 
  B          2020-01-18 13:00:00     7
  B          2020-01-25 12:30:00     6 
  B          2020-01-25 13:00:00     7
  C          2020-01-19 12:00:00     19 
  C          2020-01-19 12:30:00     20
  C          2020-01-19 13:00:00     21
  C          2020-01-22 12:30:00     20
  C          2020-01-22 13:00:00     21

From the above I would like to create a column called Session as shown below.

Expected output:

Doctor       Appointment           Booking_ID   Session
  A          2020-01-18 12:00:00     1          S1
  A          2020-01-18 12:30:00     2          S1
  A          2020-01-18 13:00:00     3          S1
  A          2020-01-18 13:00:00     4          S1
  A          2020-01-19 13:00:00     13         S2
  A          2020-01-19 13:30:00     14         S2
  B          2020-01-18 12:00:00     5          S3
  B          2020-01-18 12:30:00     6          S3
  B          2020-01-18 13:00:00     7          S3
  B          2020-01-25 12:30:00     6          S4
  B          2020-01-25 13:00:00     7          S4
  C          2020-01-19 12:00:00     19         S5
  C          2020-01-19 12:30:00     20         S5
  C          2020-01-19 13:00:00     21         S5
  C          2020-01-22 12:30:00     20         S6
  C          2020-01-22 13:00:00     21         S6

The session should be different for each combination of doctor and Appointment date (at the day level).

I tried the following:

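import pandas as pd

# parse the strings first so the sort below is chronological
df['Appointment'] = pd.to_datetime(df['Appointment'])
df = df.sort_values(['Doctor', 'Appointment'], ascending=True)

# calendar date of each appointment; the time of day is dropped
dates = df['Appointment'].dt.date

# number the distinct dates 1, 2, 3, ... in order of first appearance
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)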

But this assigns sessions based on the date alone. I would like the doctor to be taken into account as well.
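To see why the attempt falls short, here is a sketch that runs the same factorize() approach on four rows taken from the question's data (a hypothetical reduced sample, chosen for illustration). Because the codes depend only on the date, doctors A and B end up sharing S1 for 2020-01-18.

```python
import pandas as pd

# Hypothetical reduced sample: four rows taken from the question's data
df = pd.DataFrame({
    'Doctor': ['A', 'A', 'B', 'B'],
    'Appointment': pd.to_datetime(['2020-01-18 12:00:00', '2020-01-19 13:00:00',
                                   '2020-01-18 12:00:00', '2020-01-25 12:30:00']),
})
df = df.sort_values(['Doctor', 'Appointment'])

# Same approach as the attempt: factorize() codes depend only on the date
dates = df['Appointment'].dt.date
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)

print(df['Session'].tolist())  # ['S1', 'S2', 'S1', 'S3']
```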

Recommended answer

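You can use sort_values and then start a new session wherever either the date differs from the previous row (its diff is not 0) or the doctor differs from the previous row (compared with shift). A cumulative sum of that flag then numbers the sessions: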

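import pandas as pd

df = df.sort_values(['Doctor', 'Appointment'], ascending=True)
# True where the calendar day or the doctor differs from the previous row
new_day = df['Appointment'].dt.normalize().diff().ne(pd.Timedelta(days=0))
new_doctor = df['Doctor'].ne(df['Doctor'].shift())
df['Session'] = 'S' + (new_day | new_doctor).cumsum().astype(str)
print(df)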
   Doctor         Appointment  Booking_ID Session
0       A 2020-01-18 12:00:00           1      S1
1       A 2020-01-18 12:30:00           2      S1
2       A 2020-01-18 13:00:00           3      S1
3       A 2020-01-18 13:00:00           4      S1
4       A 2020-01-19 13:00:00          13      S2
5       A 2020-01-19 13:30:00          14      S2
6       B 2020-01-18 12:00:00           5      S3
7       B 2020-01-18 12:30:00           6      S3
8       B 2020-01-18 13:00:00           7      S3
9       B 2020-01-25 12:30:00           6      S4
10      B 2020-01-25 13:00:00           7      S4
11      C 2020-01-19 12:00:00          19      S5
12      C 2020-01-19 12:30:00          20      S5
13      C 2020-01-19 13:00:00          21      S5
14      C 2020-01-22 12:30:00          20      S6
15      C 2020-01-22 13:00:00          21      S6
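Since the title mentions groupby, the same numbering can also be obtained by grouping on the doctor and the calendar day and numbering the groups with GroupBy.ngroup(). This is a sketch of an alternative, not the method from the answer above; it rebuilds the question's sample data so that it runs on its own.

```python
import pandas as pd

# Sample data from the question, with Appointment parsed to datetime64
df = pd.DataFrame({
    'Doctor': ['A'] * 6 + ['B'] * 5 + ['C'] * 5,
    'Appointment': pd.to_datetime([
        '2020-01-18 12:00:00', '2020-01-18 12:30:00', '2020-01-18 13:00:00',
        '2020-01-18 13:00:00', '2020-01-19 13:00:00', '2020-01-19 13:30:00',
        '2020-01-18 12:00:00', '2020-01-18 12:30:00', '2020-01-18 13:00:00',
        '2020-01-25 12:30:00', '2020-01-25 13:00:00',
        '2020-01-19 12:00:00', '2020-01-19 12:30:00', '2020-01-19 13:00:00',
        '2020-01-22 12:30:00', '2020-01-22 13:00:00',
    ]),
    'Booking_ID': [1, 2, 3, 4, 13, 14, 5, 6, 7, 6, 7, 19, 20, 21, 20, 21],
})

# One group per (doctor, calendar day); ngroup() gives each row its group's
# number 0, 1, 2, ..., with groups numbered in sorted key order
groups = df.groupby([df['Doctor'], df['Appointment'].dt.date])
df['Session'] = 'S' + (groups.ngroup() + 1).astype(str)
print(df)
```

Because groupby sorts its keys by default, ngroup() numbers the (doctor, day) pairs in sorted order, so the frame does not need to be sorted first and the rows keep their original order.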
