Counting dates in a range set by a pandas DataFrame
Question
I have a pandas DataFrame with two date columns, a start date and an end date, which together define a date range for each row. I'd like to get a total count for every date across all rows, i.e. how many rows' ranges include each date.
For example, the table looks like:
index  start_date    end_date
0      '2015-01-01'  '2015-01-17'
1      '2015-01-03'  '2015-01-12'
And the result would be a per-date aggregate, like:
date count
'2015-01-01' 1
'2015-01-02' 1
'2015-01-03' 2
My current approach works but is extremely slow on a big DataFrame, because I loop over the rows, compute each row's date range, and then loop over every date in that range. I'm hoping to find a better approach.
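What I'm currently doing:
import pandas as pd

# One row per calendar day between the earliest start and the latest end.
all_dates = pd.date_range(min(df.start_date), max(df.end_date))
df2 = pd.DataFrame(index=all_dates)
df2['count'] = 0
for index, row in df.iterrows():
    dates = pd.date_range(row['start_date'], row['end_date'])
    for date in dates:
        # Index by the loop variable, not the string 'date'.
        df2.loc[date, 'count'] += 1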
Recommended answer
After stacking the relevant columns as suggested by @Sam, just use value_counts:
df[['start_date', 'end_date']].stack().value_counts()
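As a small illustration on the two sample rows from the question, here is what stacking and counting produces. The DataFrame is rebuilt by hand, and the column name end_date is an assumption chosen to match the code later in the answer.

```python
import pandas as pd

# Sample data copied from the question; "end_date" is the assumed column name.
df = pd.DataFrame({
    "start_date": ["2015-01-01", "2015-01-03"],
    "end_date": ["2015-01-17", "2015-01-12"],
})

# stack() turns the two columns into one long Series of date strings;
# value_counts() then tallies how often each distinct value occurs.
endpoint_counts = df[["start_date", "end_date"]].stack().value_counts()
print(endpoint_counts)
```

Only the four endpoint dates appear in the result. Dates strictly inside a range, such as 2015-01-02, are not counted, which is what the rest of the answer addresses.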
Edit:
Given that you also want to count the dates between the start and end dates:
start_dates = pd.to_datetime(df.start_date)
end_dates = pd.to_datetime(df.end_date)
# Expand every (start, end) pair into its daily range, flatten, then count each day.
pd.Series(dt.date() for group in
          [pd.date_range(start, end) for start, end in zip(start_dates, end_dates)]
          for dt in group).value_counts()
Out[178]:
2015-01-07 2
2015-01-06 2
2015-01-12 2
2015-01-05 2
2015-01-04 2
2015-01-10 2
2015-01-03 2
2015-01-09 2
2015-01-08 2
2015-01-11 2
2015-01-16 1
2015-01-17 1
2015-01-14 1
2015-01-15 1
2015-01-02 1
2015-01-01 1
2015-01-13 1
dtype: int64
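If expanding every range day by day is still too slow on a large frame, one possible alternative (not part of the original answer) is a difference-and-cumulative-sum sketch. It marks +1 on each start date and -1 on the day after each end date, then takes a running total over the full calendar. The sample frame is rebuilt by hand, and the names opens, closes, delta, and counts are illustrative only.

```python
import pandas as pd

# Sample data copied from the question; "end_date" is the assumed column name.
df = pd.DataFrame({
    "start_date": ["2015-01-01", "2015-01-03"],
    "end_date": ["2015-01-17", "2015-01-12"],
})

start_dates = pd.to_datetime(df["start_date"])
end_dates = pd.to_datetime(df["end_date"])

# +1 on each day a range opens, -1 on the day after it closes.
opens = start_dates.value_counts()
closes = (end_dates + pd.Timedelta(days=1)).value_counts()
delta = opens.sub(closes, fill_value=0)

# Fill in every calendar day, then a running sum gives the number
# of ranges that are open on each day.
all_days = pd.date_range(start_dates.min(), end_dates.max())
counts = delta.reindex(all_days, fill_value=0).cumsum().astype(int)
print(counts)
```

On the sample data this should agree with the value_counts output above (2 for 2015-01-03 through 2015-01-12, 1 on the other days), but sorted by date. The work scales with the number of rows plus the number of calendar days rather than with the total number of days across all ranges.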