How to Predict an Employee Task End_Date Through Machine Learning


Question

How can I make the prediction described below, and which algorithm is best suited to it?

Each employee work activity has Start_Date and End_Date columns. The sheet has a few other columns, such as Work_Complexity (High and Low) and the number of sub-tasks for each activity.

How can I predict the Work Activity End_Date for a given Start_Date? Which ML algorithm should be used?

Can this be considered a realistic use case?

Thanks!

Recommended answer

Yes, this is a realistic use case.

If you have labelled data, meaning a sheet where the start date and end date are already known for existing tasks, and you want to predict the end date for any new task, you can use linear regression with multiple variables. For more on linear regression with multiple variables, see: https://www.investopedia.com/terms/m/mlr.asp

Don't get too caught up in the theory, though. In simple terms, linear regression is an approach to modelling the relationship between variables (columns). Linear regression with one variable means you predict the end date using only one variable (column), which in your case is the start date. If you want to predict the end date using more than one variable (column), such as start date, task complexity, and number of sub-tasks, you have to use linear regression with multiple variables. I'll illustrate both with a house price prediction model.

Below is an implementation of linear regression with one variable in Python, where we predict the house price using only one variable:

import pandas as pd  # for loading the dataset
import numpy as np   # for arrays
from sklearn import linear_model  # for fitting the model and predicting

df = pd.read_csv('/content/MLPractical2 - Sheet1.csv')  # path to the file you uploaded
df

Output: the file I uploaded contains the following data

Area || Price

2600 || 555000

3000 || 565000

3200 || 610000

3600 || 680000

4000 || 725000

Let's predict the price of a house with an area of 3601:

reg = linear_model.LinearRegression()
reg.fit(df[['Area']], df.Price)
reg.predict([[3601]])

Output: array([669653.42465753])

We are predicting the price on the basis of only one variable (column), i.e. Area.

As you can see in the file I uploaded, the price of the house with area 3600 is 680000, and the price our model predicts for area 3601 is 669653.42465753, which is close (roughly 10,000 lower).
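For readers who do not have the CSV at hand, the single-variable fit can be reproduced with the five rows typed in directly. This is a minimal sketch: the inline DataFrame is a stand-in for the uploaded file, and the slope and intercept are printed only to show the straight line the model has fitted.

```python
import pandas as pd
from sklearn import linear_model

# Same five rows as the uploaded sheet, typed inline (stand-in for the CSV).
df = pd.DataFrame({
    "Area":  [2600, 3000, 3200, 3600, 4000],
    "Price": [555000, 565000, 610000, 680000, 725000],
})

reg = linear_model.LinearRegression()
reg.fit(df[["Area"]], df["Price"])

# Predict with a DataFrame that uses the same column name as in fit(),
# which avoids scikit-learn's feature-name warning.
new_house = pd.DataFrame({"Area": [3601]})
predicted = reg.predict(new_house)[0]

# The fitted model is the line: Price = slope * Area + intercept
slope = reg.coef_[0]
intercept = reg.intercept_
print(f"slope={slope:.4f}, intercept={intercept:.2f}")
print(f"predicted price for area 3601: {predicted:.2f}")
```

The prediction is just the fitted line evaluated at Area = 3601, so it need not match any single row of the sheet exactly.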

Now let's look at an implementation of linear regression with multiple variables in Python, where we use several variables to predict the house price:

import pandas as pd                  #same as above
import numpy as np
from sklearn import linear_model
df = pd.read_csv('/content/ML_Sheet_2.csv')
df

Output: the file I uploaded in this case contains the following data

Area || Bedroooms || Age || Price

2600 || 3.0 || 20 || 550000

3000 || 4.0 || 15 || 565000

3200 || 3.0 || 18 || 610000

3600 || 3.0 || 30 || 595000

4000 || 5.0 || 8 || 760000

Let's predict the price of a house with an area of 3500, 3 bedrooms, and an age of 10 years:

reg = linear_model.LinearRegression()
reg.fit(df[['Area', 'Bedroooms', 'Age']], df.Price)
reg.predict([[3500, 3, 10]])

Output: array([717775])

We are predicting the house price on the basis of three variables, i.e. Area, number of bedrooms, and age of the house.

As you can see in the file I uploaded, the price of the house with area 3200, 3 bedrooms, and an age of 18 years is 610000, and the price our model predicts for area 3500, 3 bedrooms, and an age of 10 years is 717775. A higher price is understandable here, because the house we are predicting for has a larger area than 3200 and is newer than 18 years (newer houses cost more).
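To check the direction of each effect rather than assume it, the fitted coefficients can be printed. The sketch below retypes the five rows inline as a stand-in for the CSV and keeps the sheet's column spelling 'Bedroooms'. It is an illustration of how to inspect the model, not a claim about what the coefficients will be.

```python
import math

import pandas as pd
from sklearn import linear_model

# Column names copied from the sheet, including the 'Bedroooms' spelling.
features = ["Area", "Bedroooms", "Age"]
df = pd.DataFrame({
    "Area":      [2600, 3000, 3200, 3600, 4000],
    "Bedroooms": [3.0, 4.0, 3.0, 3.0, 5.0],
    "Age":       [20, 15, 18, 30, 8],
    "Price":     [550000, 565000, 610000, 595000, 760000],
})

reg = linear_model.LinearRegression()
reg.fit(df[features], df["Price"])

# One row to predict, with the same column names and order used in fit().
new_house = pd.DataFrame([[3500, 3, 10]], columns=features)
predicted = reg.predict(new_house)[0]

# One coefficient per feature: the change in predicted price per unit of
# that feature while the other features are held fixed.
coefficients = dict(zip(features, reg.coef_))
print(coefficients)
print("intercept:", reg.intercept_)
print("predicted price:", predicted)
```

With only five rows and three features the coefficients are very sensitive to each data point, so a real dataset should be considerably larger before the signs are trusted.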

Similarly, you can prepare an Excel sheet of your existing data, save it in .csv format, and proceed as I did. I am using Google Colab to write my code, and I recommend you use the same:

https://colab.research.google.com/notebooks/intro.ipynb#recent=true
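The house price example does not show how to handle dates, which a regression cannot consume directly. One possible way to carry the approach over to the task sheet (an assumption, not something stated in the answer) is to predict the task duration in days and add it to Start_Date. In the sketch below, only the column names Start_Date, End_Date, and Work_Complexity come from the question. The column name Sub_Tasks and all the rows are hypothetical.

```python
import pandas as pd
from sklearn import linear_model

# Hypothetical rows for illustration; 'Sub_Tasks' is an assumed column name.
df = pd.DataFrame({
    "Start_Date": ["2020-01-06", "2020-01-13", "2020-02-03",
                   "2020-02-10", "2020-03-02", "2020-03-09"],
    "End_Date":   ["2020-01-08", "2020-01-20", "2020-02-06",
                   "2020-02-21", "2020-03-04", "2020-03-18"],
    "Work_Complexity": ["Low", "High", "Low", "High", "Low", "High"],
    "Sub_Tasks": [2, 5, 3, 8, 2, 6],
})
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
df["End_Date"] = pd.to_datetime(df["End_Date"])

# Regression needs numbers: the target becomes the duration in days,
# and Work_Complexity becomes 1 for High and 0 for Low.
df["Duration_Days"] = (df["End_Date"] - df["Start_Date"]).dt.days
df["Complexity_High"] = (df["Work_Complexity"] == "High").astype(int)

features = ["Complexity_High", "Sub_Tasks"]
reg = linear_model.LinearRegression()
reg.fit(df[features], df["Duration_Days"])

# A new task: starts 2020-04-06, High complexity, 7 sub-tasks.
new_start = pd.Timestamp("2020-04-06")
new_task = pd.DataFrame([[1, 7]], columns=features)
predicted_days = float(reg.predict(new_task)[0])

# Predicted End_Date = Start_Date + predicted duration, rounded to whole days.
predicted_end = new_start + pd.Timedelta(days=round(predicted_days))
print(f"predicted duration: {predicted_days:.2f} days")
print(f"predicted End_Date: {predicted_end.date()}")
```

Predicting the duration rather than the raw End_Date keeps the target independent of the calendar. Whether a linear model fits your own data well is something to check on tasks held out from training.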

Hope this helps!
