Subset pandas data frame with datetime columns

Views: 72 Published: 2020/5/24 3:27:14 Tags: python pandas date dataframe subset

Problem description

Following up on this question, where a pandas data frame is subset by one string variable and one datetime variable using idxmin, how could we subset by two datetime variables? For the example data frame below, how would we select the row with class == C that has the minimum base_date and the maximum date_2? [The answer would be row 3]:

print(example)
   slot_id class        day   base_date      date_2
0        1     A     Monday  2019-01-21  2019-01-24
1        2     B    Tuesday  2019-01-22  2019-01-23
2        3     C  Wednesday  2019-01-22  2019-01-24
3        4     C  Wednesday  2019-01-22  2019-01-26
4        5     C  Wednesday  2019-01-24  2019-01-25
5        6     C   Thursday  2019-01-24  2019-01-22
6        7     D    Tuesday  2019-01-23  2019-01-24
7        8     E   Thursday  2019-01-24  2019-01-30
8        9     F   Saturday  2019-01-26  2019-01-31

For just class == "C" with the minimum base_date we can use:

df.iloc[pd.to_datetime(df.loc[df['class'] == 'C', 'base_date']).idxmin()]
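To make the one-liner above easier to follow, here is a minimal self-contained sketch of the same single-date lookup. It assumes a trimmed copy of the example frame with base_date already converted to datetime64 (the column subset and the names is_c, label and row are illustrative only). Note that idxmin returns an index label, so .loc is the matching accessor; .iloc picks the same row here only because the frame has the default 0..n-1 index.

```python
import pandas as pd

# Trimmed copy of the example frame (assumption: dates parsed up front).
df = pd.DataFrame({
    'slot_id': range(1, 10),
    'class': list('ABCCCCDEF'),
    'base_date': pd.to_datetime([
        '2019-01-21', '2019-01-22', '2019-01-22', '2019-01-22', '2019-01-24',
        '2019-01-24', '2019-01-23', '2019-01-24', '2019-01-26',
    ]),
})

# Boolean mask for the class of interest.
is_c = df['class'] == 'C'

# idxmin returns the index label of the first row holding the minimum date.
label = df.loc[is_c, 'base_date'].idxmin()

# Look the row up by label rather than by position.
row = df.loc[label]
print(row)
```

When several rows tie for the minimum, idxmin reports only the first of them, which is why a second date condition needs an extra step.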

However, if we had two or more date variables with conditions like max/min, would the index solution still be practical? Doesn't index subsetting with two or more variables imply nesting df.iloc? Is this the only way to do the subset with two or more datetime variables?
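One possible way to avoid nesting, offered as a sketch rather than a definitive answer, is to apply the conditions in sequence: filter to the class, keep the rows tied for the minimum base_date, then take idxmax on date_2. This assumes the intended reading is "the latest date_2 among the rows with the earliest base_date" and that both columns are datetime64; the variable names below are illustrative. A sort-based alternative is shown alongside it.

```python
import pandas as pd

# Copy of the example frame (assumption: dates parsed up front).
df = pd.DataFrame({
    'slot_id': range(1, 10),
    'class': list('ABCCCCDEF'),
    'base_date': pd.to_datetime([
        '2019-01-21', '2019-01-22', '2019-01-22', '2019-01-22', '2019-01-24',
        '2019-01-24', '2019-01-23', '2019-01-24', '2019-01-26',
    ]),
    'date_2': pd.to_datetime([
        '2019-01-24', '2019-01-23', '2019-01-24', '2019-01-26', '2019-01-25',
        '2019-01-22', '2019-01-24', '2019-01-30', '2019-01-31',
    ]),
})

# Step 1: restrict to the class of interest.
c_rows = df[df['class'] == 'C']

# Step 2: keep every row tied for the earliest base_date.
earliest = c_rows[c_rows['base_date'] == c_rows['base_date'].min()]

# Step 3: among those, take the label of the latest date_2.
label = earliest['date_2'].idxmax()
result = df.loc[label]

# Alternative: sort by both columns and take the first row.
alt = c_rows.sort_values(['base_date', 'date_2'],
                         ascending=[True, False]).iloc[0]
print(result)
```

The sequential filter extends to further date columns by adding one more step per condition, while the sort-based form only needs another entry in the column and ascending lists.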

Data:

print(example.to_dict())
{'slot_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}, 'class': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'C', 5: 'C', 6: 'D', 7: 'E', 8: 'F'}, 'day': {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Wednesday', 4: 'Wednesday', 5: 'Thursday', 6: 'Tuesday', 7: 'Thursday', 8: 'Saturday'}, 'base_date': {0: datetime.date(2019, 1, 21), 1: datetime.date(2019, 1, 22), 2: datetime.date(2019, 1, 22), 3: datetime.date(2019, 1, 22), 4: datetime.date(2019, 1, 24), 5: datetime.date(2019, 1, 24), 6: datetime.date(2019, 1, 23), 7: datetime.date(2019, 1, 24), 8: datetime.date(2019, 1, 26)}, 'date_2': {0: datetime.date(2019, 1, 24), 1: datetime.date(2019, 1, 23), 2: datetime.date(2019, 1, 24), 3: datetime.date(2019, 1, 26), 4: datetime.date(2019, 1, 25), 5: datetime.date(2019, 1, 22), 6: datetime.date(2019, 1, 24), 7: datetime.date(2019, 1, 30), 8: datetime.date(2019, 1, 31)}}

Data preprocessing:

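import pandas as pd

# Build the frame from the dict printed above.
example = pd.DataFrame(example)
# The dates stringify as 'YYYY-MM-DD', so the format must be '%Y-%m-%d' (not '%d%m%Y').
example['base_date'] = pd.to_datetime(example['base_date'].astype(str), format='%Y-%m-%d')
example['base_date'] = example['base_date'].dt.date
example['date_2'] = pd.to_datetime(example['date_2'].astype(str), format='%Y-%m-%d')
example['date_2'] = example['date_2'].dt.date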
