Subset pandas data frame with datetime columns
Problem description
Following up on this question, where a pandas data frame is subset by one string variable and one datetime variable using idxmin, how could we subset by two datetime variables? For the example data frame below, how would we select the rows with class == C that have the minimum base_date and the maximum date_2? [The answer would be row 3]:
print(example)
slot_id class day base_date date_2
0 1 A Monday 2019-01-21 2019-01-24
1 2 B Tuesday 2019-01-22 2019-01-23
2 3 C Wednesday 2019-01-22 2019-01-24
3 4 C Wednesday 2019-01-22 2019-01-26
4 5 C Wednesday 2019-01-24 2019-01-25
5 6 C Thursday 2019-01-24 2019-01-22
6 7 D Tuesday 2019-01-23 2019-01-24
7 8 E Thursday 2019-01-24 2019-01-30
8 9 F Saturday 2019-01-26 2019-01-31
For just class == "C" with the minimum base_date, we can use:
df.iloc[pd.to_datetime(df.loc[df['class'] == 'C', 'base_date']).idxmin()]
However, if we had two or more date variables with conditions like max/min, would the index solution still be practical? Doesn't index subsetting with two or more variables imply nesting df.iloc? Is this the only way to subset with two or more datetime variables?
Data:
print(example.to_dict())
{'slot_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}, 'class': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'C', 5: 'C', 6: 'D', 7: 'E', 8: 'F'}, 'day': {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Wednesday', 4: 'Wednesday', 5: 'Thursday', 6: 'Tuesday', 7: 'Thursday', 8: 'Saturday'}, 'base_date': {0: datetime.date(2019, 1, 21), 1: datetime.date(2019, 1, 22), 2: datetime.date(2019, 1, 22), 3: datetime.date(2019, 1, 22), 4: datetime.date(2019, 1, 24), 5: datetime.date(2019, 1, 24), 6: datetime.date(2019, 1, 23), 7: datetime.date(2019, 1, 24), 8: datetime.date(2019, 1, 26)}, 'date_2': {0: datetime.date(2019, 1, 24), 1: datetime.date(2019, 1, 23), 2: datetime.date(2019, 1, 24), 3: datetime.date(2019, 1, 26), 4: datetime.date(2019, 1, 25), 5: datetime.date(2019, 1, 22), 6: datetime.date(2019, 1, 24), 7: datetime.date(2019, 1, 30), 8: datetime.date(2019, 1, 31)}}
Data preprocessing:
import pandas as pd
example = pd.DataFrame(example)
# str() of a datetime.date gives 'YYYY-MM-DD', so the format must be '%Y-%m-%d'
example['base_date'] = pd.to_datetime(example['base_date'].astype(str), format='%Y-%m-%d')
example['base_date'] = example['base_date'].dt.date
example['date_2'] = pd.to_datetime(example['date_2'].astype(str), format='%Y-%m-%d')
example['date_2'] = example['date_2'].dt.date
Recommended answer
You can use transform:
yourdf=example[example['base_date']==example.groupby('class')['base_date'].transform('min')]
If you only need class C:
yourdf.loc[yourdf['class']=='C',:]
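The answer above handles only base_date. One way the second date condition from the question might be added is to filter in steps with boolean masks instead of nesting index lookups. The following is a minimal sketch, not part of the original answer. It assumes "maximum date_2" means the maximum among the class C rows that already share the minimum base_date, and it rebuilds the relevant columns of the example frame so it can run on its own.

```python
import pandas as pd

# Rebuild the relevant columns of the question's example frame.
example = pd.DataFrame({
    "slot_id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "class": ["A", "B", "C", "C", "C", "C", "D", "E", "F"],
    "base_date": pd.to_datetime([
        "2019-01-21", "2019-01-22", "2019-01-22", "2019-01-22", "2019-01-24",
        "2019-01-24", "2019-01-23", "2019-01-24", "2019-01-26",
    ]),
    "date_2": pd.to_datetime([
        "2019-01-24", "2019-01-23", "2019-01-24", "2019-01-26", "2019-01-25",
        "2019-01-22", "2019-01-24", "2019-01-30", "2019-01-31",
    ]),
})

# Step 1: keep only the class C rows.
c_rows = example[example["class"] == "C"]

# Step 2: keep the rows whose base_date equals the minimum within class C.
earliest = c_rows[c_rows["base_date"] == c_rows["base_date"].min()]

# Step 3 (assumed interpretation): among those, keep the rows with the latest date_2.
result = earliest[earliest["date_2"] == earliest["date_2"].max()]

print(result)
```

Each step is an ordinary boolean mask, so further date conditions can be chained the same way, and ties are kept rather than silently dropped. On this data the result is the single row with index 3 (slot_id 4), matching the answer expected in the question.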
Also, idxmin and idxmax return only the first index that meets the min or max condition, so when several rows share the minimum or maximum value, they still return just one index.
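The difference between idxmin and a comparison against min() can be seen on a small series with a tied minimum. This is an illustrative sketch; the series below is made up from the class C base_date values and is not code from the original answer.

```python
import pandas as pd

# base_date values for three of the class C rows, keeping their original index labels.
dates = pd.Series(
    pd.to_datetime(["2019-01-22", "2019-01-22", "2019-01-24"]),
    index=[2, 3, 4],
)

# idxmin returns a single label: the first occurrence of the minimum.
first_only = dates.idxmin()

# Comparing against min() keeps every row that ties for the minimum.
all_min = dates[dates == dates.min()].index.tolist()

print(first_only)  # 2
print(all_min)     # [2, 3]
```

When ties matter, as they do here because rows 2 and 3 share the same base_date, the mask-based comparison is usually the safer choice.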