Python: Split Start and End Date into All Days Between Start and End Date
Problem description
I've got data called 'Planned Leave' which includes 'Start Date', 'End Date', 'User ID' and 'Leave Type'.
I want to be able to create a new data-frame which shows all days between Start and End Date, per 'User ID'.
So far, I've only been able to create a date_list which supplies a range of dates between start and end date, but I cannot find a way to include this for each 'User ID' and 'Leave Type'.
Here is my current function:
def datesplit(data):
    x = pd.DataFrame(columns=['Date'])
    for i in plannedleave.iterrows():
        start = data['Start Date'][i]
        end = data['End Date'][i]
        date_list = [start + dt.timedelta(days=x) for x in range((end-start).days)]
        x.append(date_list)
    return x
>>> datesplit(plannedleave)
ValueError: Can only tuple-index with a MultiIndex
Here's what the data looks like:
>>> plannedleave.dtypes
Employee ID             int64
First Name             object
Last Name              object
Leave Type             object
Start Date     datetime64[ns]
End Date       datetime64[ns]
dtype: object
I'd be forever grateful if you could find a solution here! :-)
In my opinion, a Date column alone is not enough. Your output DataFrame should also contain at least Employee ID, so that you know which person is on leave on a given date.
To do your task, define the following function:
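import pandas as pd

def datesplit(data):
    parts = []
    for idx, row in data.iterrows():
        # One partial DataFrame per leave period: the index holds every day
        # from Start Date to End Date (inclusive); the scalar ID is broadcast.
        parts.append(pd.DataFrame(row['Employee ID'], columns=['Employee ID'],
            index=pd.date_range(start=row['Start Date'], end=row['End Date'],
                name='Date')))
    # Stack the parts and turn the Date index into a regular column.
    return pd.concat(parts).reset_index()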
This function:
- For each source row, collects a "partial DataFrame" in which:
  - the only column (for now) is Employee ID,
  - the index is the date range between the start and end dates,
  - the given Employee ID (a single value) is broadcast to all rows (one for each day the current employee is on leave).
- After the loop, concatenates them and converts the index (Date) into a "regular" column.
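The question also asked for Leave Type per day. The same broadcasting idea can probably be stretched to carry more than one column; the following is only a sketch, and both the function name `datesplit_keep` and its `keep` parameter are hypothetical names introduced here for illustration. It relies on the fact that a dict of scalars passed to `pd.DataFrame` together with an index is broadcast along that index.

```python
import pandas as pd

def datesplit_keep(data, keep=("Employee ID", "Leave Type")):
    """Expand each leave period into one row per day, keeping the `keep` columns.

    `datesplit_keep` and `keep` are hypothetical names, not part of the answer above.
    """
    parts = []
    for _, row in data.iterrows():
        # Every day from Start Date to End Date, both endpoints included.
        dates = pd.date_range(row["Start Date"], row["End Date"], name="Date")
        # A dict of scalars plus an index: each scalar is broadcast to all rows.
        parts.append(pd.DataFrame({col: row[col] for col in keep}, index=dates))
    # Stack the parts and turn the Date index into a regular column.
    return pd.concat(parts).reset_index()

# Sample data modelled on the two rows used in this answer.
plannedleave = pd.DataFrame({
    "Employee ID": [1001, 1002],
    "Leave Type": ["Xxxx", "Yyyy"],
    "Start Date": pd.to_datetime(["2020-05-10", "2020-05-18"]),
    "End Date": pd.to_datetime(["2020-05-15", "2020-05-22"]),
})
result = datesplit_keep(plannedleave)
```

Looping with `iterrows()` is fine for a few hundred leave records; for much larger frames a vectorized approach would likely be faster, at the cost of being less direct.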
Then call it:
result = datesplit(plannedleave)
To test my code, I used the following source DataFrame (plannedleave):
  Employee ID First Name Last Name Leave Type Start Date   End Date
0        1001       John     Brown       Xxxx 2020-05-10 2020-05-15
1        1002      Betty     Smith       Yyyy 2020-05-18 2020-05-22
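For anyone who wants to reproduce the test, a frame like the one shown could be built roughly as follows. This is a sketch: the values are copied from the table, and the dates are parsed with `pd.to_datetime` so that both date columns have a datetime dtype, as in the question.

```python
import pandas as pd

# Two sample leave records, matching the table above.
plannedleave = pd.DataFrame({
    "Employee ID": [1001, 1002],
    "First Name": ["John", "Betty"],
    "Last Name": ["Brown", "Smith"],
    "Leave Type": ["Xxxx", "Yyyy"],
    # Parse the strings so the columns hold datetimes rather than text.
    "Start Date": pd.to_datetime(["2020-05-10", "2020-05-18"]),
    "End Date": pd.to_datetime(["2020-05-15", "2020-05-22"]),
})
```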
The result, for the above data, is:
         Date  Employee ID
0  2020-05-10         1001
1  2020-05-11         1001
2  2020-05-12         1001
3  2020-05-13         1001
4  2020-05-14         1001
5  2020-05-15         1001
6  2020-05-18         1002
7  2020-05-19         1002
8  2020-05-20         1002
9  2020-05-21         1002
10 2020-05-22         1002
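As for the original attempt: `iterrows()` yields `(index, row)` tuples, so the loop variable `i` there is a tuple, and `data['Start Date'][i]` ends up indexing a Series with a tuple, which is presumably what triggers the error message. Two further details are worth noting. `range((end - start).days)` stops one day short of the end date, whereas `pd.date_range` includes both endpoints by default. And `DataFrame.append` returned a new frame instead of modifying `x` in place (it was removed altogether in pandas 2.0). A small sketch of the first two points, using a one-row frame made up for the purpose:

```python
import datetime as dt
import pandas as pd

# A single made-up leave period, just for this demonstration.
df = pd.DataFrame({
    "Start Date": pd.to_datetime(["2020-05-10"]),
    "End Date": pd.to_datetime(["2020-05-15"]),
})

# iterrows() yields (index, row) pairs, not bare row labels.
first = next(df.iterrows())

# Unpacking the pair in the loop header gives direct access to the row.
for idx, row in df.iterrows():
    start, end = row["Start Date"], row["End Date"]
    # range((end - start).days) omits the end date itself ...
    exclusive = [start + dt.timedelta(days=n) for n in range((end - start).days)]
    # ... whereas pd.date_range includes both endpoints by default.
    inclusive = pd.date_range(start, end)
```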