Subset pandas data frame with datetime columns


Question

Following up on this question, where a pandas data frame is subset by one string variable and one datetime variable using idxmin, how could we subset by two datetime variables? For the example data frame below, how would we select the rows with class == C that have the minimum base_date and the maximum date_2? [The answer would be row 3]:

print(example)
   slot_id class        day   base_date      date_2
0        1     A     Monday  2019-01-21  2019-01-24
1        2     B    Tuesday  2019-01-22  2019-01-23
2        3     C  Wednesday  2019-01-22  2019-01-24
3        4     C  Wednesday  2019-01-22  2019-01-26
4        5     C  Wednesday  2019-01-24  2019-01-25
5        6     C   Thursday  2019-01-24  2019-01-22
6        7     D    Tuesday  2019-01-23  2019-01-24
7        8     E   Thursday  2019-01-24  2019-01-30
8        9     F   Saturday  2019-01-26  2019-01-31

For just class == "C" with the minimum base_date, we can use:

df.iloc[pd.to_datetime(df.loc[df['class'] == 'C', 'base_date']).idxmin()]

However, if we had two or more date variables with conditions like max/min, would the index solution still be practical? Doesn't index subsetting with two or more variables imply nesting df.iloc? Is this the only way to subset on two or more datetime variables?

Data:

print(example.to_dict())
{'slot_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}, 'class': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'C', 5: 'C', 6: 'D', 7: 'E', 8: 'F'}, 'day': {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Wednesday', 4: 'Wednesday', 5: 'Thursday', 6: 'Tuesday', 7: 'Thursday', 8: 'Saturday'}, 'base_date': {0: datetime.date(2019, 1, 21), 1: datetime.date(2019, 1, 22), 2: datetime.date(2019, 1, 22), 3: datetime.date(2019, 1, 22), 4: datetime.date(2019, 1, 24), 5: datetime.date(2019, 1, 24), 6: datetime.date(2019, 1, 23), 7: datetime.date(2019, 1, 24), 8: datetime.date(2019, 1, 26)}, 'date_2': {0: datetime.date(2019, 1, 24), 1: datetime.date(2019, 1, 23), 2: datetime.date(2019, 1, 24), 3: datetime.date(2019, 1, 26), 4: datetime.date(2019, 1, 25), 5: datetime.date(2019, 1, 22), 6: datetime.date(2019, 1, 24), 7: datetime.date(2019, 1, 30), 8: datetime.date(2019, 1, 31)}}

Data preprocessing:

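import pandas as pd

# `example` starts out as the dict printed above
example = pd.DataFrame(example)
# The dates stringify as 'YYYY-MM-DD', so the format must be '%Y-%m-%d' ('%d%m%Y' would not parse)
example['base_date'] = pd.to_datetime(example['base_date'].astype(str), format='%Y-%m-%d')
example['base_date'] = example['base_date'].dt.date
example['date_2'] = pd.to_datetime(example['date_2'].astype(str), format='%Y-%m-%d')
example['date_2'] = example['date_2'].dt.date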

Recommended answer

You can use transform:

yourdf=example[example['base_date']==example.groupby('class')['base_date'].transform('min')]

If you only need class C:

yourdf.loc[yourdf['class']=='C',:]
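The transform filter above covers only the minimum base_date. As a rough sketch of how the second condition (maximum date_2) might be layered on, the boolean filters can be applied one after another instead of nesting index lookups. The names c_rows, earliest, result and top are illustrative and not part of the original answer, and the frame is rebuilt here from the question's data so that the snippet runs on its own.

```python
import pandas as pd

# Rebuild the relevant columns of the question's example frame.
example = pd.DataFrame({
    'slot_id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'class': ['A', 'B', 'C', 'C', 'C', 'C', 'D', 'E', 'F'],
    'base_date': pd.to_datetime([
        '2019-01-21', '2019-01-22', '2019-01-22', '2019-01-22', '2019-01-24',
        '2019-01-24', '2019-01-23', '2019-01-24', '2019-01-26']),
    'date_2': pd.to_datetime([
        '2019-01-24', '2019-01-23', '2019-01-24', '2019-01-26', '2019-01-25',
        '2019-01-22', '2019-01-24', '2019-01-30', '2019-01-31']),
})

# Step 1: keep only the class C rows.
c_rows = example[example['class'] == 'C']

# Step 2: keep every row that shares the minimum base_date.
earliest = c_rows[c_rows['base_date'] == c_rows['base_date'].min()]

# Step 3: among those, keep every row that shares the maximum date_2.
result = earliest[earliest['date_2'] == earliest['date_2'].max()]
print(result)

# Alternative: sort by base_date ascending, then date_2 descending, and take
# the first row. Unlike the filters above, this returns a single row even
# when several rows tie on both dates.
top = c_rows.sort_values(['base_date', 'date_2'], ascending=[True, False]).head(1)
print(top)
```

Both approaches select the row with index 3 (slot_id 4) for this data. The chained filters keep all tied rows, whereas the sort-and-head variant keeps only one.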


Also, idxmin and idxmax return only the first index that meets the min or max condition, so when several rows share the minimum or maximum value, they still return a single index.
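A small illustration of that tie behaviour, using the base_date values of the four class C rows from the question. The variable names are hypothetical.

```python
import pandas as pd

# base_date values of the class C rows, keeping their original index labels.
base_dates = pd.Series(
    pd.to_datetime(['2019-01-22', '2019-01-22', '2019-01-24', '2019-01-24']),
    index=[2, 3, 4, 5],
)

# idxmin returns the label of the first occurrence of the minimum only.
first_only = base_dates.idxmin()

# A boolean comparison against the minimum keeps every tied row.
all_ties = base_dates[base_dates == base_dates.min()].index.tolist()

print(first_only)  # 2
print(all_ties)    # [2, 3]
```

Because idxmin returns an index label rather than a position, it pairs most safely with .loc when the frame's index is not the default 0..n-1 range.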
