Analytics in a Spark Dataframe
Problem Description
In this problem we have two managers, M1 and M2. Manager M1's team has two employees, e1 and e2, and manager M2's team has two employees, e4 and e5. The manager and employee hierarchy is as follows:
1) M1
a. e1
b. e2
2) M2
a. e4
b. e5
And we have the following employee salary dataframe:
+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1 |1 |66000 |22 |
|e1 |2 |48000 |16 |
|e1 |3 |87000 |29 |
|e2 |1 |75000 |25 |
|e2 |4 |69000 |23 |
|e2 |5 |66000 |22 |
|e4 |1 |90000 |30 |
|e4 |2 |87000 |29 |
|e5 |3 |72000 |24 |
|e5 |1 |57000 |19 |
|e5 |4 |51000 |17 |
|e5 |5 |69000 |23 |
+------+--------+------+---------+
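For reference, the table above can be reproduced as a DataFrame like this (a sketch, assuming a local SparkSession; the names `spark` and `df` are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("analytics").getOrCreate()
import spark.implicits._

// Build the employee salary dataframe shown above
val df = Seq(
  ("e1", 1, 66000, 22), ("e1", 2, 48000, 16), ("e1", 3, 87000, 29),
  ("e2", 1, 75000, 25), ("e2", 4, 69000, 23), ("e2", 5, 66000, 22),
  ("e4", 1, 90000, 30), ("e4", 2, 87000, 29),
  ("e5", 3, 72000, 24), ("e5", 1, 57000, 19), ("e5", 4, 51000, 17), ("e5", 5, 69000, 23)
).toDF("emp_id", "month_id", "salary", "work_days")
```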
Find a new dataframe with the following rules:
Rule 1 – A manager can see the work_days of his team
Rule 2 – An employee can see his own work_days and salary
Recommended Answer
Based on what I understood from your question, here's what I suggest you do.
First, you need to create dataframes of the managers with the employees under them:
manager1
+---+------+
|sn |emp_id|
+---+------+
|a |e1 |
|b |e2 |
+---+------+
manager2
+---+------+
|sn |emp_id|
+---+------+
|a |e4 |
|b |e5 |
+---+------+
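The two small manager dataframes above can be built the same way (a sketch, assuming `spark.implicits._` is in scope for `toDF`):

```scala
// Manager dataframes listing the employees under each manager
val manager1 = Seq(("a", "e1"), ("b", "e2")).toDF("sn", "emp_id")
val manager2 = Seq(("a", "e4"), ("b", "e5")).toDF("sn", "emp_id")
```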
Then you should write a function that returns the list of employees under a manager:
import scala.collection.mutable
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Collect the emp_id column of a manager's dataframe into a Scala List
def getEmployees(df: DataFrame): List[String] = {
  df.select(collect_list("emp_id")).first().getAs[mutable.WrappedArray[String]](0).toList
}
The final step is to write a function that filters the dataframe down to only the employees passed in:
// Keep only the rows whose emp_id appears in the given list
def getEmployeeDetails(df: DataFrame, list: List[String]): DataFrame = {
  df.filter(df("emp_id").isin(list: _*))
}
Now, if you want to see the employees under manager1 (m1), then
getEmployeeDetails(df, getEmployees(m1)).show(false)
will return
+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1 |1 |66000 |22 |
|e1 |2 |48000 |16 |
|e1 |3 |87000 |29 |
|e2 |1 |75000 |25 |
|e2 |4 |69000 |23 |
|e2 |5 |66000 |22 |
+------+--------+------+---------+
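Note that Rule 1 only entitles a manager to work_days, so if the manager's view should hide pay, you could additionally drop the salary column from the filtered result (a sketch, using the `manager1` dataframe shown earlier):

```scala
// Manager view per Rule 1: team rows without the salary column
getEmployeeDetails(df, getEmployees(manager1)).drop("salary").show(false)
```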
You can do the same for the other managers too.
Similarly, you can do it for an individual employee:
getEmployeeDetails(df, List("e1")).show(false)
will return the dataframe of employee1 (e1):
+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1 |1 |66000 |22 |
|e1 |2 |48000 |16 |
|e1 |3 |87000 |29 |
+------+--------+------+---------+
I hope the answer helps.