创建可用值的分布-Python [英] Create distribution of available values - Python

查看:44
本文介绍了创建可用值的分布-Python的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望创建一个分配,以显示员工可以工作的时间.与此图相似,可在此链接中找到

为实现这一目标,我创建了 staff_availability_df ,其中包含要选择的员工数量,该数量可在 ['Person'] 列中找到.他们可以在 min-max 小时内工作,并获得多少报酬.他们可以工作的可用时间被分成小时['Availability_Hr'],代表他们可以工作的时间,以小时表示.因此,第一人称是'8-18',即 8:00:00 am-18:00:00 pm . ['Availability_15min_Seg'] 基本相同,但小时数分为4个部分.因此,第一个人是'1-41',又是 8:00:00 am-18:00:00 pm .

注意:标准班次在 8:00:00 am-3:30:00 am 之间运行,因此大约需要20个小时.

staff_requirements_df 显示整个班次中的Time 以及我需要的所需People.

 将pandas导入为pd导入matplotlib.pyplot作为plt导入matplotlib.dates作为日期#这是员工的可用性:staff_availability = pd.DataFrame({'Person':['C1','C2','C3','C4','C5','C6','C7','C8','C9','C10','C11'],'MinHours':[5,5,5,5,5,5,5,5,5,5,5,5],'MaxHours':[10,10,10,10,10,10,10,10,10,10,10,10],小时工资":[26,26,26,26,26,26,26,26,26,26,26,26],'Availability_Hr':['8-18','8-18','8-18','9-18','9-18','9-18','12-1','12-1','17-3','17-3','17-3'],'Availability_15min_Seg':['1-41','1-41','1-41','5-41','5-41','5-41','17-69','17-69','37-79','37-79','37-79'],})#以下是人员配备要求:Staffing_requirements = pd.DataFrame({'时间':['0/1/1900 8:00:00','0/1/1900 9:59:00','0/1/1900 10:00:00','0/1/190012:29:00','0/1/1900 12:30:00','0/1/1900 13:00:00','0/1/1900 13:02:00','0/1/1900 13:15:00','0/1/1900 13:20:00','0/1/1900 18:10:00','0/1/1900 18:15:00','0/1/1900 18:20:00','0/1/1900 18:25:00','0/1/1900 18:45:00','0/1/1900 18:50:00','0/1/1900 19:05:00','0/1/1900 19:07:00','0/1/1900 21:57:00','0/1/1900 22:00:00','0/1/1900 22:30:00','0/1/1900 22:35:00','1/1/1900 3:00:00','1/1/1900 3:05:00','1/1/1900 3:20:00','1/1/1900 3:25:00'],人":[1、2、2、3、3、2、2、3、3、4、4、3、3、2、2、3、3、4、4、3、3、2,2,1],}) 

我已使用以下功能在 8:00:00 am-3:30:00 am 之间的15分钟内导出了人员需求.每15分钟分配给 string 'T'.因此 T1 = 8:00:00 am T79 = 3:00:00 am

  staffing_requirements ['Time'] = ['/'.join([str(int(x.split('/')[0])+1)] + x.split('/')[1:])for staffing_requirements ['Time']]中的xstaffing_requirements ['Time'] = pd.to_datetime(staffing_requirements ['Time'],格式='%d/%m/%Y%H:%M:%S')formatter = date.DateFormatter('%Y-%m-%d%H:%M:%S')staffing_requirements = staffing_requirements.groupby(pd.Grouper(freq ='15T',key ='Time'))['People'].max().ffill()Staffing_requirements = staffing_requirements.reset_index(level=['Time'])staffing_requirements.insert(2,'T',range(1,1 + len(staffing_requirements))))staffing_requirements ['T'] ='T'+ staffing_requirements ['T'].astype(str)st_req = staffing_requirements ['People'].tolist()[1.0、1.0、1.0、1.0、1.0、1.0、1.0、1.0、2.0、2.0、2.0、2.0、2.0、2.0、2.0、2.0、2.0、2.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0,3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、4.0、4.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0、4.0、4.0、4.0、3.0、3.0、3.0、3.0、3.0、3.0、3.0,3.0、3.0、3.0、3.0、3.0、3.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0、4.0,4.0、3.0、2.0] 

我希望使用这些功能创建一个线性规划矩阵,该矩阵返回每个员工上班时间的分布.但我希望使用15分钟的时间段和小时.例如注意:此导出将扩展到 3:30 am .因此它将包含79个细分.

注意:要清楚.我希望返回分发时间表,以便将来使用.不只是一个数字.

有少量员工可供使用
您可以看到在最后几个时间段内需要的员工比可用的员工多.该图是由以下人员创建的:

  fig,ax = plt.subplots()staffing_requirements.plot(y ='People',ax = ax,label ='Required',drawstyle ='steps-mid')availability_per_member.groupby(level='Timeslot')['Available'].sum().plot(ax=ax,label ='Available',drawstyle ='steps-mid')plt.legend() 

I'm hoping to create a distribution that displays the times an employee is available to work. Similar to this figure, which is found at this link staff distribution.

To achieve this, I've created the staff_availability_df that contains the number of employees to pick from, which is found in the ['Person'] column. The min - max hours they can work, how much they get paid are labelled as such. The available times they can work are separated into hours ['Availability_Hr'], which represents the times they can work expressed in hours. So the first person is '8-18', which is 8:00:00am - 18:00:00pm. The ['Availability_15min_Seg'] is essentially the same but hours are split up into 4 segments. So the first person is '1-41', which is again 8:00:00am - 18:00:00pm.

Note: The standard shift operates between 8:00:00am - 3:30:00am, so approx 20 hours.

The staff_requirements_df displays the Time throughout the shift and the required People I need.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates

#This is the employee availability:
staff_availability = pd.DataFrame({
    'Person' : ['C1','C2','C3','C4','C5','C6','C7','C8','C9','C10','C11'],                 
    'MinHours' : [5,5,5,5,5,5,5,5,5,5,5],    
    'MaxHours' : [10,10,10,10,10,10,10,10,10,10,10],                 
    'HourlyWage' : [26,26,26,26,26,26,26,26,26,26,26],  
    'Availability_Hr' : ['8-18','8-18','8-18','9-18','9-18','9-18','12-1','12-1','17-3','17-3','17-3'],                              
    'Availability_15min_Seg' : ['1-41','1-41','1-41','5-41','5-41','5-41','17-69','17-69','37-79','37-79','37-79'],                              
    }) 

#These are the staffing requirements:
staffing_requirements = pd.DataFrame({
    'Time' : ['0/1/1900 8:00:00','0/1/1900 9:59:00','0/1/1900 10:00:00','0/1/1900 12:29:00','0/1/1900 12:30:00','0/1/1900 13:00:00','0/1/1900 13:02:00','0/1/1900 13:15:00','0/1/1900 13:20:00','0/1/1900 18:10:00','0/1/1900 18:15:00','0/1/1900 18:20:00','0/1/1900 18:25:00','0/1/1900 18:45:00','0/1/1900 18:50:00','0/1/1900 19:05:00','0/1/1900 19:07:00','0/1/1900 21:57:00','0/1/1900 22:00:00','0/1/1900 22:30:00','0/1/1900 22:35:00','1/1/1900 3:00:00','1/1/1900 3:05:00','1/1/1900 3:20:00','1/1/1900 3:25:00'],                 
    'People' : [1,1,2,2,3,3,2,2,3,3,4,4,3,3,2,2,3,3,4,4,3,3,2,2,1],                      
     })

I have exported the staffing requirements in 15min segments that occur between 8:00:00am - 3:30:00am using the following functions. Each 15min is assigned to the string 'T'. So T1 = 8:00:00am and T79 = 3:00:00am

staffing_requirements['Time'] = ['/'.join([str(int(x.split('/')[0])+1)] + x.split('/')[1:]) for x in staffing_requirements['Time']]
staffing_requirements['Time'] = pd.to_datetime(staffing_requirements['Time'], format='%d/%m/%Y %H:%M:%S')
formatter = dates.DateFormatter('%Y-%m-%d %H:%M:%S') 

staffing_requirements = staffing_requirements.groupby(pd.Grouper(freq='15T',key='Time'))['People'].max().ffill()
staffing_requirements = staffing_requirements.reset_index(level=['Time'])

staffing_requirements.insert(2, 'T', range(1, 1 + len(staffing_requirements)))
staffing_requirements['T'] = 'T' + staffing_requirements['T'].astype(str)

st_req = staffing_requirements['People'].tolist()

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 2.0]

I'm hoping to use these functions to create a linear programming matrix that returns a distribution of the times each employee is available to work. But I'm hoping to use 15min segments as well as hours. e.g. Note: This export will extend to 3:30am. So it will contain 79 segments.

Note: To be clear. I'm hoping to return the distribution schedule so it can be used for a future purpose. Not just a figure.

There are a few staff availability example 1 example 2 approaches using mixed-integer linear programming but they use closed source software. I'm hoping to translate this to Python.

解决方案

This is indeed a great job for integer programming; you could use pulp, which you would first need to install via the command line, e.g. pip install pulp

Data manipulation to set ourselves up for success

Then, first make sure your DataFrames are in optimal shape so we can attack the problem:

# Since timeslots for staffing start counting at 1, also make the
# DataFrame index start counting at 1
staffing_requirements.index = range(1, len(staffing_requirements) + 1) 
print(staffing_requirements.tail())

staff_availability.set_index('Person')

staff_costs = staff_availability.set_index('Person')[['MinHours', 'MaxHours', 'HourlyWage']]
availability = staff_availability.set_index('Person')[['Availability_15min_Seg']]
availability[['first_15min', 'last_15min']] =  availability['Availability_15min_Seg'].str.split('-', expand=True).astype(int)

availability_per_member =  [pd.DataFrame(1, columns=[idx], index=range(row['first_15min'], row['last_15min']+1))
 for idx, row in availability.iterrows()]

availability_per_member = pd.concat(availability_per_member, axis='columns').fillna(0).astype(int).stack()
availability_per_member.index.names = ['Timeslot', 'Person']
availability_per_member = (availability_per_member.to_frame()
                            .join(staff_costs[['HourlyWage']])
                            .rename(columns={0: 'Available'}))

where availability_per_member is now a MultiIndex DataFramethat has one row per person per timeslot, indicating his/her availability and wage:

#                 Available  HourlyWage
#Timeslot Person                       
#1        C1              1          26
#         C2              1          26
#         C3              1          26
#         C4              0          26
#         C5              0          26

Additionally, we change the requisites a bit so the problem is actually solvable; see the appendix for why this is necessary

import numpy as np
np.random.seed(42)
staffing_requirements['People'] = np.random.randint(1, 4, size=len(staffing_requirements))
staff_costs['MinHours'] = 3

Solve the Integer Programming problem with pulp

Now, we can get pulp to work: set up the problem with the goal of minimizing the costs, and adding the constraints you mentioned one-by-one, see the commented code. staffed is now a pulp-dictionary containing whether a person is staffed at a certain time slot (either 0 or 1)

import pulp
prob = pulp.LpProblem('CreateStaffing', pulp.LpMinimize) # Minimize costs

timeslots = staffing_requirements.index
persons = availability_per_member.index.levels[1]

# A member is either staffed or is not at a certain timeslot
staffed = pulp.LpVariable.dicts("staffed",
                                   ((timeslot, staffmember) for timeslot, staffmember 
                                    in availability_per_member.index),
                                     lowBound=0,
                                     cat='Binary')

# Objective = cost (= sum of hourly wages)                              
prob += pulp.lpSum(
    [staffed[timeslot, staffmember] * availability_per_member.loc[(timeslot, staffmember), 'HourlyWage'] 
    for timeslot, staffmember in availability_per_member.index]
)

# Staff the right number of people
for timeslot in timeslots:
    prob += (sum([staffed[(timeslot, person)] for person in persons]) 
    == staffing_requirements.loc[timeslot, 'People'])


# Do not staff unavailable persons
for timeslot in timeslots:
    for person in persons:
        if availability_per_member.loc[(timeslot, person), 'Available'] == 0:
            prob += staffed[timeslot, person] == 0

# Do not underemploy people
for person in persons:
    prob += (sum([staffed[(timeslot, person)] for timeslot in timeslots])
    >= staff_costs.loc[person, 'MinHours']*4) # timeslot is 15 minutes => 4 timeslots = hour

# Do not overemploy people
for person in persons:
    prob += (sum([staffed[(timeslot, person)] for timeslot in timeslots])
    <= staff_costs.loc[person, 'MaxHours']*4) # timeslot is 15 minutes => 4 timeslots = hour

And then, it's a matter of letting pulp solve the case:

prob.solve()
print(pulp.LpStatus[prob.status])

output = []
for timeslot, staffmember in staffed:
    var_output = {
        'Timeslot': timeslot,
        'Staffmember': staffmember,
        'Staffed': staffed[(timeslot, staffmember)].varValue,
    }
    output.append(var_output)
output_df = pd.DataFrame.from_records(output)#.sort_values(['timeslot', 'staffmember'])
output_df.set_index(['Timeslot', 'Staffmember'], inplace=True)
if pulp.LpStatus[prob.status] == 'Optimal':
    print(output_df)

This will now return a DataFrame output_df which per timeslot and person contains whether they are staffed:

#                      Staffed
#Timeslot Staffmember         
#1        C1               1.0
#         C2               1.0
#         C3               1.0
#         C4               0.0
#         C5               0.0
#         C6               0.0
#         C7               0.0
#         C8               0.0
#         C9               0.0
#         C10              0.0
#         C11              0.0
#2        C1               1.0
#         C2               1.0

I've adapted the code of http://benalexkeen.com/linear-programming-with-python-and-pulp-part-5/ which is a great tutorial to pulp and linear programming, so be sure to check it out.

Appendix: your requirements are infeasible.

With your conditions, this will actually return 'Infeasible'. It's easy to see why this is:
You can see there are more staff members required than available in the last few timeslots. This plot was created by:

fig, ax = plt.subplots()
staffing_requirements.plot(y='People', ax=ax, label='Required', drawstyle='steps-mid')
availability_per_member.groupby(level='Timeslot')['Available'].sum().plot(ax=ax, 
                               label='Available', drawstyle='steps-mid')
plt.legend()

这篇关于创建可用值的分布-Python的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆