How to Predict Employee Task End_Date through Machine Learning
Problem description
How can I make the prediction described below, and which algorithm is best suited to it?
Each employee work activity has a Start_Date and an End_Date (columns). The sheet has a few other columns, such as Work_Complexity (High and Low) and the number of sub-tasks for each activity.
How can I predict the work activity End_Date for a given Start_Date? Which ML algorithm should be used?
Can this be considered a realistic use case?
Thanks!
Recommended answer
Yes, this is a realistic use case.
If you have labelled data, meaning a sheet where the start date and end date are already known for existing tasks, and you now want to predict the end date for a new task, you can use linear regression with multiple variables. For more information on linear regression with multiple variables, see: https://www.investopedia.com/terms/m/mlr.asp
Don't get too caught up in the theory. In simple terms, linear regression is an approach to modelling the relationship between variables (columns). Linear regression with one variable means you predict the end date using only one variable (column), i.e. the start date in your case. If you want to predict the end date using more than one variable (column), e.g. start date, task complexity and number of sub-tasks, you have to use linear regression with multiple variables. I will use a house price prediction model as the example.
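Applied to the task sheet from the question, the idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the six rows are made-up data, the column name Sub_Tasks and the helper columns Duration_Days and Is_High are hypothetical, and the model predicts the task duration in days (which is then added to the start date) rather than the raw date, since a regression needs numeric inputs and targets.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up example rows; replace with your own sheet (e.g. via pd.read_csv).
tasks = pd.DataFrame({
    "Start_Date": ["2024-01-01", "2024-01-03", "2024-01-08",
                   "2024-01-10", "2024-01-15", "2024-01-20"],
    "End_Date": ["2024-01-05", "2024-01-11", "2024-01-15",
                 "2024-01-16", "2024-01-24", "2024-01-23"],
    "Work_Complexity": ["Low", "High", "Low", "High", "High", "Low"],
    "Sub_Tasks": [2, 3, 5, 1, 4, 1],  # hypothetical column name
})

# Convert the date strings to datetimes and derive a numeric target.
tasks["Start_Date"] = pd.to_datetime(tasks["Start_Date"])
tasks["End_Date"] = pd.to_datetime(tasks["End_Date"])
tasks["Duration_Days"] = (tasks["End_Date"] - tasks["Start_Date"]).dt.days

# Encode the High/Low category as 1/0 so the model can use it.
tasks["Is_High"] = (tasks["Work_Complexity"] == "High").astype(int)

features = ["Is_High", "Sub_Tasks"]
model = LinearRegression().fit(tasks[features], tasks["Duration_Days"])

# Predict the duration of a new High-complexity task with 2 sub-tasks,
# then add it to the known start date to get the predicted end date.
new_task = pd.DataFrame({"Is_High": [1], "Sub_Tasks": [2]})
predicted_days = float(model.predict(new_task)[0])
new_start = pd.Timestamp("2024-02-01")
predicted_end = new_start + pd.Timedelta(days=round(predicted_days))
print(predicted_end.date())
```

Predicting a duration and adding it to the start date keeps the target small and comparable across tasks; whether a linear model fits real task data well is something you would need to check on your own sheet.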
Below is an implementation of linear regression with one variable in Python, where we predict the house price using only one variable:
import pandas as pd  # loads the dataset into a DataFrame
import numpy as np  # array support
from sklearn import linear_model  # provides LinearRegression
df = pd.read_csv('/content/MLPractical2 - Sheet1.csv')  # path to your uploaded CSV file
df  # in a notebook, this displays the DataFrame
Output: the file I uploaded contains the following data:
Area || Price
2600 || 555000
3000 || 565000
3200 || 610000
3600 || 680000
4000 || 725000
Let's predict the price of a house with an area of 3601:
reg = linear_model.LinearRegression()
reg.fit(df[['Area']], df.Price)
reg.predict([[3601]])
Output: array([669653.42465753])
We are predicting the price on the basis of only one variable (column), i.e. Area.
As you can see in the file I uploaded, the house with area 3600 is priced at 680000, and the price our model predicts for area 3601 is 669653.42465753, which is close (within about 2%).
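To see where that number comes from, the fitted line can be reproduced by hand with the ordinary least-squares formulas for a single variable. The sketch below assumes the five rows listed above are the whole dataset; the variable names are only illustrative.

```python
import numpy as np

# The five (Area, Price) rows from the uploaded file.
area = np.array([2600, 3000, 3200, 3600, 4000], dtype=float)
price = np.array([555000, 565000, 610000, 680000, 725000], dtype=float)

# Least-squares slope: covariance of area and price over variance of area.
slope = np.sum((area - area.mean()) * (price - price.mean())) / np.sum(
    (area - area.mean()) ** 2
)
# The fitted line passes through the point of means.
intercept = price.mean() - slope * area.mean()

# Price predicted by the line for an area of 3601.
prediction = intercept + slope * 3601
print(round(slope, 2), round(intercept, 2), round(prediction, 2))
```

The slope comes out at roughly 132.88 per unit of area, so the prediction is simply a point on that straight line rather than a lookup of the nearest row, which is why it does not match the 3600 row exactly.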
Now let's look at an implementation of linear regression with multiple variables in Python, where we use several variables to predict the house price:
import pandas as pd #same as above
import numpy as np
from sklearn import linear_model
df = pd.read_csv('/content/ML_Sheet_2.csv')
df
Output: the file I uploaded in this case contains the following data:
Area || Bedroooms || Age || Price
2600 || 3.0 || 20 || 550000
3000 || 4.0 || 15 || 565000
3200 || 3.0 || 18 || 610000
3600 || 3.0 || 30 || 595000
4000 || 5.0 || 8 || 760000
Let's predict the price of a house with an area of 3500, 3 bedrooms and an age of 10 years:
reg = linear_model.LinearRegression()
reg.fit(df[['Area', 'Bedroooms', 'Age']], df.Price)
reg.predict([[3500, 3, 10]])
Output: array([717775])
We are predicting the house price on the basis of three variables, i.e. area, number of bedrooms and age of the house.
As you can see in the file I uploaded, the house with area 3200, 3 bedrooms and an age of 18 years is priced at 610000, while our model predicts 717775 for a house with area 3500 (more than 3200), 3 bedrooms and an age of 10 years. This is plausible, because the house we are predicting for is both larger and newer (a newer house commands a higher price).
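To check how each variable contributes to that prediction, you can inspect the fitted coefficients. The sketch below rebuilds the same five rows inline instead of reading the CSV (an assumption made so the snippet is self-contained) and keeps the column name Bedroooms exactly as it is spelled in the file.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same five rows as the uploaded file, built inline for a self-contained example.
df = pd.DataFrame({
    "Area": [2600, 3000, 3200, 3600, 4000],
    "Bedroooms": [3.0, 4.0, 3.0, 3.0, 5.0],  # spelling matches the CSV header
    "Age": [20, 15, 18, 30, 8],
    "Price": [550000, 565000, 610000, 595000, 760000],
})

features = ["Area", "Bedroooms", "Age"]
reg = LinearRegression().fit(df[features], df["Price"])

# One coefficient per feature: the change in predicted price per unit change
# in that feature, holding the other features fixed.
for name, coef in zip(features, reg.coef_):
    print(f"{name}: {coef:.2f}")
print(f"intercept: {reg.intercept_:.2f}")

# Passing a DataFrame with the same column names avoids a feature-name warning.
query = pd.DataFrame([[3500, 3, 10]], columns=features)
prediction = reg.predict(query)[0]
print(round(prediction))
```

On these rows the Age coefficient is negative, which matches the intuition that newer houses cost more, but the Bedroooms coefficient also comes out negative. With only five rows and three variables, individual coefficients should not be read too literally; a real dataset would need many more rows.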
Similarly, you can prepare an Excel sheet of your existing data, save it in .csv format, and proceed as I did. I am using Google Colab to write my code, and I recommend you use the same:
https://colab.research.google.com/notebooks/intro.ipynb#recent=true
Hope this helps!