Date range for six-monthly periods in pandas
Problem description
So, this is my data frame.
    PatientNumber  QT              Answer  Answerdate  DiagnosisDate
1   1              transferring    No      2017-03-03  2018-05-03
2   1              preparing food  No      2017-03-03  2018-05-03
3   1              medications     Yes     2017-03-03  2018-05-03
4   2              transferring    No      2011-05-10  2012-05-04
5   2              preparing food  No      2011-05-10  2012-05-04
6   2              medications     No      2011-05-10  2012-05-04
7   2              transferring    Yes     2011-15-03  2012-05-04
8   2              preparing food  Yes     2011-15-03  2012-05-04
9   2              medications     No      2011-15-03  2012-05-04
10  2              transferring    Yes     2010-15-12  2012-05-04
11  2              preparing food  No      2010-15-12  2012-05-04
12  2              medications     No      2010-15-12  2012-05-04
13  2              transferring    Yes     2009-10-10  2012-05-04
14  2              preparing food  No      2009-10-10  2012-05-04
15  2              medications     No      2009-10-10  2012-05-04
16  3              medications     No      2008-10-10  2010-07-04
I found one link related to my question here, but it did not get a correct answer.
Some explanations: for each PatientNumber, DiagnosisDate is unique, and Answerdate holds the several dates on which the patient filled in a questionnaire.
But what do I want to do?
My goal is to go back from DiagnosisDate in six-month steps and record, in a new column, which six-month period each record falls in (the first six months, the second, the third, ...).
For example, in this data frame the DiagnosisDate for PatientNumber=1 is 2018-05-03, so it should go back from that date six months at a time. The first 6 month boundary is 2017-27-11; as the biggest Answerdate does not fall within that period, it won't be marked as the first six months. If the first Answerdate falls within this period, it will be marked as the first 6 months.
So here PatientNumber=1 gets 3 in the 6month column, because when we go back from the diagnosis date six months at a time, the Answerdate falls in the third six-month period.
So the output for this data frame will be:
    PatientNumber  QT              Answer  Answerdate  DiagnosisDate  6month
1   1              transferring    No      2017-03-03  2018-05-03     3
2   1              preparing food  No      2017-03-03  2018-05-03     3
3   1              medications     Yes     2017-03-03  2018-05-03     3
4   2              transferring    No      2011-05-10  2012-05-04     1
5   2              preparing food  No      2011-05-10  2012-05-04     1
6   2              medications     No      2011-05-10  2012-05-04     1
7   2              transferring    Yes     2011-15-04  2012-05-04     2
8   2              preparing food  Yes     2011-15-04  2012-05-04     2
9   2              medications     No      2011-15-04  2012-05-04     2
10  2              transferring    Yes     2010-15-12  2012-05-04     3
11  2              preparing food  No      2010-15-12  2012-05-04     3
12  2              medications     No      2010-15-12  2012-05-04     3
13  2              transferring    Yes     2009-10-10  2012-05-04     5
14  2              preparing food  No      2009-10-10  2012-05-04     5
15  2              medications     No      2009-10-10  2012-05-04     5
16  3              medications     No      2008-10-10  2010-07-04     4
For PatientNumber=2, it will start from DiagnosisDate=2012-05-04 and go back 6 months, which gives 2011-11-04.
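To make the stepping concrete, here is a minimal sketch of how those six-month boundaries could be computed with pd.DateOffset, which moves in calendar months rather than a fixed number of days. The variable names are made up for the illustration and are not part of the original data frame.

```python
import pandas as pd

# Diagnosis date for PatientNumber=2, as quoted in the question.
diagnosis = pd.Timestamp("2012-05-04")

# Step back 6, 12 and 18 calendar months from the diagnosis date.
boundaries = [diagnosis - pd.DateOffset(months=6 * k) for k in range(1, 4)]

for k, boundary in enumerate(boundaries, start=1):
    print(k, boundary.date())
# 1 2011-11-04
# 2 2011-05-04
# 3 2010-11-04
```

An Answerdate between the first boundary and the diagnosis date would then belong to period 1, one between the second and first boundary to period 2, and so on.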
I tried this:
data['6month'] = pd.date_range(end=data['diagnosisdate'],periods=2, freq='6M',closed='left')
Firstly, it only cares about the month, so it calculates approximately rather than exactly. Also, I could not find a way to record the number of the six-month period, as in the data frame above (in the 6month column I mean 1, 2, ... instead of a date).
Therefore, depending on the data, we may see the numbers 1...10 in the 6month column (considering 5 years before diagnosis).
Long story. I hope someone can take the time :).
Also, I need to keep the existing columns in the result as they are.
Recommended answer
It's not exactly what you want, but a workaround that gives good enough results. I think you can do it by calculating the time difference between the DiagnosisDate and Answerdate columns and dividing by np.timedelta64(6, 'M') (to change the unit to 6 months). Then you need the ceil function to round up to the next integer, such as:
import numpy as np  # pd.np was deprecated in pandas 1.0 and later removed
data['6month'] = np.ceil((data['DiagnosisDate'] - pd.Timedelta(days=1) - data['Answerdate'])
                         / np.timedelta64(6, 'M')).astype(int)
To ignore negative values:
data = data[(data['6month'] >= 0)]
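Dividing by a month-based timedelta64 may be rejected by recent pandas releases, because a month is not a fixed length of time. As a hedged alternative, the same approximation can be sketched by dividing the gap in days by an average half-year length. The constant SIX_MONTHS_IN_DAYS and the small sample frame are assumptions made for this illustration; the dates are the ones that appear in the answer's output.

```python
import numpy as np
import pandas as pd

# Small sample, one row per distinct (Answerdate, DiagnosisDate) pair.
data = pd.DataFrame({
    "PatientNumber": [1, 2, 2, 2, 2, 3],
    "Answerdate": pd.to_datetime(
        ["2017-03-03", "2011-10-05", "2011-03-15",
         "2010-12-15", "2009-10-10", "2008-10-10"]),
    "DiagnosisDate": pd.to_datetime(
        ["2018-03-05", "2012-04-05", "2012-04-05",
         "2012-04-05", "2012-04-05", "2010-04-07"]),
})

# Assumption: half of an average Gregorian year, used as "six months".
SIX_MONTHS_IN_DAYS = 365.2425 / 2

# Gap in whole days, shifted by one day as in the expression above.
gap_days = (data["DiagnosisDate"] - pd.Timedelta(days=1) - data["Answerdate"]).dt.days

# Round up so that any gap inside the first half-year counts as period 1.
data["6month"] = np.ceil(gap_days / SIX_MONTHS_IN_DAYS).astype(int)

print(data["6month"].tolist())
# [3, 1, 3, 3, 5, 3]
```

Like the original expression, this is approximate: a gap very close to a six-month boundary can land in the neighbouring period.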
With your sample, it gives:
    PatientNumber  QT            Answer  Answerdate  DiagnosisDate  6month
1   1              transferring  No      2017-03-03  2018-03-05     3
2   1              preparing     No      2017-03-03  2018-03-05     3
3   1              medications   Yes     2017-03-03  2018-03-05     3
4   2              transferring  No      2011-10-05  2012-04-05     1
5   2              preparing     No      2011-10-05  2012-04-05     1
6   2              medications   No      2011-10-05  2012-04-05     1
7   2              transferring  Yes     2011-03-15  2012-04-05     3
8   2              preparing     Yes     2011-03-15  2012-04-05     3
9   2              medications   No      2011-03-15  2012-04-05     3
10  2              transferring  Yes     2010-12-15  2012-04-05     3
11  2              preparing     No      2010-12-15  2012-04-05     3
12  2              medications   No      2010-12-15  2012-04-05     3
13  2              transferring  Yes     2009-10-10  2012-04-05     5
14  2              preparing     No      2009-10-10  2012-04-05     5
15  2              medications   No      2009-10-10  2012-04-05     5
16  3              medications   No      2008-10-10  2010-04-07     3
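The dates in this output are written differently from the question's table, which seems to come from reading the question's strings as year-day-month (values such as 2011-15-03 cannot be year-month-day). A minimal sketch of parsing under that assumption, with a made-up series name:

```python
import pandas as pd

# A few date strings copied from the question's table.
raw = pd.Series(["2017-03-03", "2011-15-03", "2010-15-12"])

# Assumption: the strings are year-day-month, so give the format explicitly.
parsed = pd.to_datetime(raw, format="%Y-%d-%m")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2017-03-03', '2011-03-15', '2010-12-15']
```

Passing an explicit format avoids pandas guessing a different day and month order for ambiguous values.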
Also, I would not use pd.date_range, as it does not seem to behave the way you want, but I might be wrong.
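If calendar-exact periods are needed instead of an average month length, one possible sketch counts whole calendar months between the two dates and groups them in sixes. The helper six_month_period is hypothetical (it is not a pandas function), and the convention that an answer dated exactly six months before diagnosis belongs to period 1 is an assumption chosen to match the workaround above.

```python
import pandas as pd

def six_month_period(diagnosis: pd.Series, answer: pd.Series) -> pd.Series:
    """Hypothetical helper: 1 for the six calendar months before diagnosis, 2 for the six before that, ..."""
    # Difference in calendar months, ignoring the day of the month.
    months = (diagnosis.dt.year - answer.dt.year) * 12 + (diagnosis.dt.month - answer.dt.month)
    # Subtract one month when the diagnosis day is on or before the answer day,
    # so an answer exactly six months back still counts as period 1.
    months = months - (diagnosis.dt.day <= answer.dt.day).astype(int)
    return months // 6 + 1

data = pd.DataFrame({
    "Answerdate": pd.to_datetime(["2011-10-05", "2011-10-04", "2009-10-10", "2012-05-01"]),
    "DiagnosisDate": pd.to_datetime(["2012-04-05"] * 4),
})
data["6month"] = six_month_period(data["DiagnosisDate"], data["Answerdate"])

print(data["6month"].tolist())
# [1, 2, 5, 0]
```

The last row, where the answer comes after the diagnosis, gets a value that is not positive, so filtering on values greater than zero still works. Comparing the day of the month is a simplification, and dates at the end of a month may fall on either side of a boundary.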
To remove the cases where DiagnosisDate is before Answerdate, once you have created the 6month column, just do data = data[data['6month'] > 0], as the value would be negative or zero in that case.