Exclude a specific date based on a condition using pandas
Question
import pandas as pd

df2 = pd.DataFrame({'person_id':[11,11,11,11,11,12,12,13,13,14,14,14,14],
'admit_date':['01/01/2011','01/01/2009','12/31/2013','12/31/2017','04/03/2014','08/04/2016',
'03/05/2014','02/07/2011','08/08/2016','12/31/2017','05/01/2011','05/21/2014','07/12/2016']})
# Reshape to long format: one row per (person_id, variable, dates)
df2 = df2.melt('person_id', value_name='dates')
df2['dates'] = pd.to_datetime(df2['dates'])
What I would like to do is:
a) Exclude/filter out records from the data frame if a subject has both Dec 31st and Jan 1st in its records. Note that the year doesn't matter.
If a subject has only one of Dec 31st or Jan 1st, we leave its records as is.
But if a subject has both Dec 31st and Jan 1st, we remove one of them (either Dec 31st or Jan 1st). Note that a subject could also have multiple entries with the same date, like person_id = 11.
This is all I have managed so far:
df2_new = df2['dates'] != '2017-12-31' #but this excludes if a subject has only `Dec 31st on 2017`. How can I ignore the dates and not consider `year`
df2[df2_new]
My expected output is described below.
For person_id = 11, we drop 12-31 because that person has both 12-31 and 01-01 in their records, whereas for person_id = 14, we don't drop 12-31 because that person has only 12-31 in their records.
We drop 12-31 only when both 12-31 and 01-01 appear in a person's records.
Recommended answer
Use:
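import numpy as np

s = df2['dates'].dt.strftime('%m-%d')  # month-day string, so the year is ignored
m1 = s.eq('01-01').groupby(df2['person_id']).transform('any')  # person has a Jan 1st entry
m2 = s.eq('12-31').groupby(df2['person_id']).transform('any')  # person has a Dec 31st entry
# When both are present, keep only the rows that are not 12-31; otherwise keep every row
m3 = np.select([m1 & m2, m1 | m2], [s.ne('12-31'), True], default=True)
df3 = df2[m3]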
Result:
# print(df3)
    person_id    variable      dates
0          11  admit_date 2011-01-01
1          11  admit_date 2009-01-01
4          11  admit_date 2014-04-03
5          12  admit_date 2016-08-04
6          12  admit_date 2014-03-05
7          13  admit_date 2011-02-07
8          13  admit_date 2016-08-08
9          14  admit_date 2017-12-31
10         14  admit_date 2011-05-01
11         14  admit_date 2014-05-21
12         14  admit_date 2016-07-12
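As a possible simplification (a sketch, not part of the original answer): a row that is itself 12-31 already implies that its person has a 12-31 entry, so the filter can be written as "drop a row when it is 12-31 and the same person also has a 01-01 row". The helper name drop_dec31_if_jan1 and its parameters are hypothetical, introduced only for illustration; the data is rebuilt here so the snippet runs on its own.

```python
import pandas as pd


def drop_dec31_if_jan1(df, id_col="person_id", date_col="dates"):
    """Drop Dec 31st rows only for ids that also have a Jan 1st row.

    Hypothetical helper; the year of each date is ignored.
    """
    month_day = df[date_col].dt.strftime("%m-%d")
    # True on every row of a person who has at least one Jan 1st entry
    has_jan1 = month_day.eq("01-01").groupby(df[id_col]).transform("any")
    is_dec31 = month_day.eq("12-31")
    return df[~(is_dec31 & has_jan1)]


# Same sample data as in the question
df2 = pd.DataFrame({
    "person_id": [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
    "admit_date": ["01/01/2011", "01/01/2009", "12/31/2013", "12/31/2017",
                   "04/03/2014", "08/04/2016", "03/05/2014", "02/07/2011",
                   "08/08/2016", "12/31/2017", "05/01/2011", "05/21/2014",
                   "07/12/2016"],
})
df2 = df2.melt("person_id", value_name="dates")
df2["dates"] = pd.to_datetime(df2["dates"], format="%m/%d/%Y")

df3 = drop_dec31_if_jan1(df2)
print(df3)
```

On this sample data the helper should keep the same rows as the np.select version: the two 12-31 rows of person_id = 11 are dropped, and the 12-31 row of person_id = 14 is kept.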