Date range for six-monthly periods in pandas


Problem description

So, this is my data frame:

PatientNumber           QT         Answer   Answerdate  DiagnosisDate 
1        1          transferring     No      2017-03-03 2018-05-03     
2        1          preparing food   No      2017-03-03 2018-05-03     
3        1          medications      Yes     2017-03-03 2018-05-03     
4        2          transferring     No      2011-05-10 2012-05-04       
5        2          preparing food   No      2011-05-10 2012-05-04     
6        2          medications      No      2011-05-10 2012-05-04     
7        2          transferring     Yes     2011-15-03  2012-05-04     
8        2          preparing food   Yes     2011-15-03  2012-05-04     
9        2          medications      No      2011-15-03  2012-05-04     
10       2          transferring     Yes     2010-15-12 2012-05-04     
11       2          preparing food   No      2010-15-12 2012-05-04     
12       2          medications      No      2010-15-12 2012-05-04     
13       2          transferring     Yes     2009-10-10 2012-05-04     
14       2          preparing food   No      2009-10-10 2012-05-04     
15       2          medications      No      2009-10-10 2012-05-04     
16       3          medications      No      2008-10-10 2010-07-04     
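The table above can be loaded roughly as follows. This is only a sketch with three of the rows. The date format is an assumption: values such as 2011-15-03 cannot be year-month-day, and the answer's output further down shows the dates as if they were read year-day-month, so that format is used here.

```python
import pandas as pd

# Three rows copied from the table above; dates kept as the original strings.
data = pd.DataFrame({
    "PatientNumber": [1, 2, 2],
    "QT": ["transferring", "transferring", "transferring"],
    "Answer": ["No", "No", "Yes"],
    "Answerdate": ["2017-03-03", "2011-05-10", "2011-15-03"],
    "DiagnosisDate": ["2018-05-03", "2012-05-04", "2012-05-04"],
})

# Assumption: year-day-month, because "2011-15-03" has no valid month 15.
for col in ["Answerdate", "DiagnosisDate"]:
    data[col] = pd.to_datetime(data[col], format="%Y-%d-%m")

print(data.dtypes)
```

Once both columns are datetime dtype, they can be subtracted from each other directly.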

I found one existing question related to mine, but it did not get a correct answer.

Some explanation: each PatientNumber has exactly one DiagnosisDate, while Answerdate records the several occasions on which the patient filled in a questionnaire.

What I want to do:

My goal is to go back from DiagnosisDate in six-month steps and record, in a new column, which six-month period each record falls into (the first six months before diagnosis, the second, the third, ...).

For example, in this data frame the DiagnosisDate for PatientNumber=1 is 2018-05-03, so we should go back six months from that date. The first six-month boundary is 2017-27-11; the latest Answerdate does not fall inside that window, so it is not marked as the first six months. If an Answerdate did fall inside that window, it would be marked as the first six months.

So here PatientNumber=1 gets 3 in the 6month column, because the Answerdate only falls inside a window on the third six-month step back from the diagnosis date. The output for this data frame should therefore be:

PatientNumber           QT         Answer   Answerdate  DiagnosisDate  6month
1        1          transferring     No      2017-03-03 2018-05-03     3
2        1          preparing food   No      2017-03-03 2018-05-03     3
3        1          medications      Yes     2017-03-03 2018-05-03     3
4        2          transferring     No      2011-05-10 2012-05-04     1 
5        2          preparing food   No      2011-05-10 2012-05-04     1
6        2          medications      No      2011-05-10 2012-05-04     1
7        2          transferring     Yes     2011-15-04  2012-05-04    2
8        2          preparing food   Yes     2011-15-04  2012-05-04    2
9        2          medications      No      2011-15-04  2012-05-04    2
10       2          transferring     Yes     2010-15-12 2012-05-04     3
11       2          preparing food   No      2010-15-12 2012-05-04     3
12       2          medications      No      2010-15-12 2012-05-04     3
13       2          transferring     Yes     2009-10-10 2012-05-04     5
14       2          preparing food   No      2009-10-10 2012-05-04     5
15       2          medications      No      2009-10-10 2012-05-04     5
16       3          medications      No      2008-10-10 2010-07-04     4

For PatientNumber=2, it starts from DiagnosisDate=2012-05-04 and goes back six months, which gives 2011-11-04.
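The step back described here can be sketched with pd.DateOffset, which moves by calendar months rather than by a fixed number of days. The date is read as ISO year-month-day in this snippet, matching the 2011-11-04 result quoted above; the variable names are only illustrative.

```python
import pandas as pd

diagnosis = pd.Timestamp("2012-05-04")  # read as ISO year-month-day here

# Boundaries of the first three six-month windows before the diagnosis.
boundaries = [diagnosis - pd.DateOffset(months=6 * k) for k in range(1, 4)]
for b in boundaries:
    print(b.date())
# 2011-11-04
# 2011-05-04
# 2010-11-04
```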

I tried this:

data['6month'] = pd.date_range(end=data['diagnosisdate'],periods=2, freq='6M',closed='left')

First, it only looks at the month, so the calculation is approximate rather than exact. Second, I could not find a way to store the number of the six-month period, as in the data frame above (in the 6month column I mean 1, 2, ... instead of a date).
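A small sketch of why the attempt only "cares about the month": a six-month month-end frequency snaps every generated date to the end of a month, so the day of the diagnosis is lost. pd.date_range also expects a single end timestamp rather than a whole column. The snippet passes pd.offsets.MonthEnd(6) explicitly, which is the offset the '6M' alias refers to in the pandas versions that accept that alias.

```python
import pandas as pd

end = pd.Timestamp("2012-05-04")

# Month-end anchored range: every generated date is the last day of a month.
rng = pd.date_range(end=end, periods=2, freq=pd.offsets.MonthEnd(6))
print(rng)

# Calendar-exact step back, which keeps the day of the month.
exact = end - pd.DateOffset(months=6)
print(exact.date())  # 2011-11-04
```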

So, depending on the data, the 6month column may contain numbers from 1 to 10 (considering the five years before diagnosis).

Long story. I hope someone can take the time :).

Also, I need to keep all the existing columns in the result as they are.

Recommended answer

It's not exactly what you want, but it is a workaround that gives good enough results. With both columns as datetime dtype, you can compute the time difference between DiagnosisDate and Answerdate and divide it by a six-month timedelta, which expresses the difference in units of six months (this was written with pd.np.timedelta64(6, 'M'); pd.np has since been removed from pandas, so use numpy directly). Then apply ceil to round up to the next integer, such as:

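import numpy as np  # pd.np was removed in pandas 2.0
# Six average-length months (30.436875 days each) stand in for np.timedelta64(6, 'M').
six_months = pd.Timedelta(days=6 * 30.436875)
data['6month'] = np.ceil((data['DiagnosisDate'] - pd.Timedelta(days=1) - data['Answerdate']) / six_months).astype(int)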

To drop the rows with negative values:

data = data[(data['6month'] >= 0)]

With your sample, it gives:

    PatientNumber            QT Answer Answerdate DiagnosisDate  6month
1               1  transferring     No 2017-03-03    2018-03-05       3
2               1     preparing     No 2017-03-03    2018-03-05       3
3               1   medications    Yes 2017-03-03    2018-03-05       3
4               2  transferring     No 2011-10-05    2012-04-05       1
5               2     preparing     No 2011-10-05    2012-04-05       1
6               2   medications     No 2011-10-05    2012-04-05       1
7               2  transferring    Yes 2011-03-15    2012-04-05       3
8               2     preparing    Yes 2011-03-15    2012-04-05       3
9               2   medications     No 2011-03-15    2012-04-05       3
10              2  transferring    Yes 2010-12-15    2012-04-05       3
11              2     preparing     No 2010-12-15    2012-04-05       3
12              2   medications     No 2010-12-15    2012-04-05       3
13              2  transferring    Yes 2009-10-10    2012-04-05       5
14              2     preparing     No 2009-10-10    2012-04-05       5
15              2   medications     No 2009-10-10    2012-04-05       5
16              3   medications     No 2008-10-10    2010-04-07       3
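To check the 6month column above, here is a self-contained sketch using the six distinct date pairs from that table, already parsed as shown there. It assumes a six-month unit of 6 × 30.436875 days (an average month length) and keeps the one-day shift from the answer.

```python
import numpy as np
import pandas as pd

# The six distinct (Answerdate, DiagnosisDate) pairs from the table above.
data = pd.DataFrame({
    "PatientNumber": [1, 2, 2, 2, 2, 3],
    "Answerdate": pd.to_datetime(["2017-03-03", "2011-10-05", "2011-03-15",
                                  "2010-12-15", "2009-10-10", "2008-10-10"]),
    "DiagnosisDate": pd.to_datetime(["2018-03-05", "2012-04-05", "2012-04-05",
                                     "2012-04-05", "2012-04-05", "2010-04-07"]),
})

# Assumption: six months approximated as six average-length months.
six_months = pd.Timedelta(days=6 * 30.436875)

# Difference in six-month units, rounded up to the next whole period.
diff = data["DiagnosisDate"] - pd.Timedelta(days=1) - data["Answerdate"]
data["6month"] = np.ceil(diff / six_months).astype(int)

print(data["6month"].tolist())  # [3, 1, 3, 3, 5, 3]
```

Because the unit is an average, answers that fall very close to a six-month boundary can land in a neighbouring period compared with a calendar-exact count.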

Also, I would not use pd.date_range, as it does not seem to behave the way you want, but I might be wrong.

To remove the cases where DiagnosisDate is before Answerdate, once you have created the 6month column, just do data = data[data['6month'] > 0], as the value is negative or zero in those cases.
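If the approximation is not acceptable, a calendar-based count is one possible alternative. The sketch below is not part of the answer above: six_month_number is a hypothetical helper, and its boundary rule (an answer exactly six calendar months before the diagnosis still counts as period 1) is an assumption chosen to agree with the sample output. Month-end dates such as the 31st may need extra care.

```python
import pandas as pd

def six_month_number(answer: pd.Series, diagnosis: pd.Series) -> pd.Series:
    """Hypothetical helper: 1 = the six calendar months before diagnosis, 2 = the six before that, ..."""
    months = (diagnosis.dt.year - answer.dt.year) * 12 + (diagnosis.dt.month - answer.dt.month)
    # A partly elapsed month counts as a full one.
    months = months + (diagnosis.dt.day > answer.dt.day).astype(int)
    # Integer ceiling of months / 6; results <= 0 mean the answer is on or after the diagnosis.
    return (months + 5) // 6

answers = pd.Series(pd.to_datetime(["2017-03-03", "2011-10-05", "2011-03-15",
                                    "2010-12-15", "2009-10-10", "2008-10-10"]))
diagnoses = pd.Series(pd.to_datetime(["2018-03-05", "2012-04-05", "2012-04-05",
                                      "2012-04-05", "2012-04-05", "2010-04-07"]))

print(six_month_number(answers, diagnoses).tolist())  # [3, 1, 3, 3, 5, 3]
```

The same > 0 filter then drops answers dated on or after the diagnosis.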
