Python: Split Start and End Date into All Days Between Start and End Date

Problem description

I've got data called 'Planned Leave' which includes 'Start Date', 'End Date', 'User ID' and 'Leave Type'.

I want to be able to create a new data-frame which shows all days between Start and End Date, per 'User ID'.

So far, I've only been able to create a date_list which supplies a range of dates between start and end date, but I cannot find a way to include this for each 'User ID' and 'Leave Type'.

Here is my current function:

def datesplit(data):
    x = pd.DataFrame(columns=['Date'])
    for i in plannedleave.iterrows():
        start = data['Start Date'][i]
        end = data['End Date'][i]
        date_list = [start + dt.timedelta(days=x) for x in range((end-start).days)]
    x.append(date_list)
    return x

>>> datesplit(plannedleave)
ValueError: Can only tuple-index with a MultiIndex

Here's what the data looks like:

>>> plannedleave.dtypes
    Employee ID                      int64
    First Name                      object
    Last Name                       object
    Leave Type                      object
    Start Date              datetime64[ns]
    End Date                datetime64[ns]
dtype: object

I'd be forever grateful if you could find a solution here! :-)

Solution

In my opinion, a Date column alone is not enough. Your output DataFrame should also contain at least Employee ID, so that you know which person is on leave on a given date.

To do your task, define the following function:

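import pandas as pd

def datesplit(data):
    parts = []
    for idx, row in data.iterrows():
        # One partial DataFrame per leave record, indexed by each day of leave.
        parts.append(pd.DataFrame(row['Employee ID'], columns=['Employee ID'],
            index=pd.date_range(start=row['Start Date'], end=row['End Date'],
                name='Date')))
    # Stack the parts and turn the 'Date' index into a regular column.
    return pd.concat(parts).reset_index()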

This function:

  • For each source row, builds a "partial DataFrame" in which:
    • the only column (for now) is Employee ID,
    • the index is the date range between the start and end dates, both included,
    • the given Employee ID (a single value) is broadcast to all rows (one row per day the current employee is on leave).
  • After the loop, concatenates the partial DataFrames and converts the index (Date) into a "regular" column.
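The question also asked for Leave Type in the output. The same approach can probably be extended by passing a dict of scalars instead of a single value. The following is only a sketch: the name datesplit_with_type is hypothetical, and the sample frame is an assumption that mirrors the two leave records used to test this answer.

```python
import pandas as pd

def datesplit_with_type(data):
    """Hypothetical variant of datesplit that also keeps 'Leave Type'."""
    parts = []
    for _, row in data.iterrows():
        # One index entry per day of leave, both endpoints included.
        dates = pd.date_range(start=row['Start Date'], end=row['End Date'],
                              name='Date')
        # Scalar values in the dict are broadcast to every date in the index.
        parts.append(pd.DataFrame(
            {'Employee ID': row['Employee ID'], 'Leave Type': row['Leave Type']},
            index=dates))
    # Stack the parts and turn the 'Date' index into a regular column.
    return pd.concat(parts).reset_index()

# Assumed sample data: two leave records, as in the test for this answer.
plannedleave = pd.DataFrame({
    'Employee ID': [1001, 1002],
    'First Name': ['John', 'Betty'],
    'Last Name': ['Brown', 'Smith'],
    'Leave Type': ['Xxxx', 'Yyyy'],
    'Start Date': pd.to_datetime(['2020-05-10', '2020-05-18']),
    'End Date': pd.to_datetime(['2020-05-15', '2020-05-22']),
})
result = datesplit_with_type(plannedleave)
```

Any other column can be carried along the same way by adding it to the dict. Note that pd.concat raises an error on an empty list, so an empty input frame would need separate handling.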

Then call it:

result = datesplit(plannedleave)

To test my code, I used the following source DataFrame (plannedleave):

   Employee ID First Name Last Name Leave Type Start Date   End Date
0         1001       John     Brown       Xxxx 2020-05-10 2020-05-15
1         1002      Betty     Smith       Yyyy 2020-05-18 2020-05-22

The result, for the above data, is:

         Date  Employee ID
0  2020-05-10         1001
1  2020-05-11         1001
2  2020-05-12         1001
3  2020-05-13         1001
4  2020-05-14         1001
5  2020-05-15         1001
6  2020-05-18         1002
7  2020-05-19         1002
8  2020-05-20         1002
9  2020-05-21         1002
10 2020-05-22         1002
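
If building one small DataFrame per row becomes slow on larger inputs, DataFrame.explode (available since pandas 0.25) may be worth trying instead. This is not part of the answer above, only a sketch under assumptions: datesplit_explode is a hypothetical name, and the sample frame is rebuilt here from the two test records so the block runs on its own.

```python
import pandas as pd

def datesplit_explode(data):
    """Hypothetical alternative: one list of dates per row, then explode."""
    # Keep only the identifying columns; copy so the input is not modified.
    out = data[['Employee ID', 'Leave Type']].copy()
    # One list of daily Timestamps per leave record, both endpoints included.
    out['Date'] = [list(pd.date_range(start, end))
                   for start, end in zip(data['Start Date'], data['End Date'])]
    # explode() gives every list element its own row, repeating the other columns.
    out = out.explode('Date').reset_index(drop=True)
    # The exploded column has object dtype; convert it back to datetime64.
    out['Date'] = pd.to_datetime(out['Date'])
    return out[['Date', 'Employee ID', 'Leave Type']]

# Assumed sample data: the two leave records from the test above.
plannedleave = pd.DataFrame({
    'Employee ID': [1001, 1002],
    'Leave Type': ['Xxxx', 'Yyyy'],
    'Start Date': pd.to_datetime(['2020-05-10', '2020-05-18']),
    'End Date': pd.to_datetime(['2020-05-15', '2020-05-22']),
})
result = datesplit_explode(plannedleave)
```

Both versions include the end date, unlike the range((end-start).days) loop in the question, which stops one day short.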
