Analytics in a Spark DataFrame


Problem description

In this problem we have two managers, M1 and M2. Manager M1's team has two employees, e1 and e2, and manager M2's team has two employees, e4 and e5. The manager and employee hierarchy is as follows:


1)  M1
  a.    e1
  b.    e2

2)  M2
  a.    e4
  b.    e5

And we have the following employee salary dataframe:


+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1    |1       |66000 |22       |
|e1    |2       |48000 |16       |
|e1    |3       |87000 |29       |
|e2    |1       |75000 |25       |
|e2    |4       |69000 |23       |
|e2    |5       |66000 |22       |
|e4    |1       |90000 |30       |
|e4    |2       |87000 |29       |
|e5    |3       |72000 |24       |
|e5    |1       |57000 |19       |
|e5    |4       |51000 |17       |
|e5    |5       |69000 |23       |
+------+--------+------+---------+

Find a new dataframe with the following rules:

Rule 1 – A manager can see the work_days of his team.

Rule 2 – An employee can see his own work_days and salary.

Recommended answer

Based on what I understood from your question, here is what I suggest you do.

First you need to create dataframes of the managers with the employees under them:

manager1

+---+------+
|sn |emp_id|
+---+------+
|a  |e1    |
|b  |e2    |
+---+------+

manager2

+---+------+
|sn |emp_id|
+---+------+
|a  |e4    |
|b  |e5    |
+---+------+
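These two dataframes can be created the same way as the salary dataframe. This is a sketch: it assumes `spark.implicits._` is already in scope, and the value names `m1` and `m2` are chosen to match the variables used later in the answer.

```scala
// Hypothetical construction of the two manager dataframes shown above
// (assumes spark.implicits._ is in scope)
val m1 = Seq(("a", "e1"), ("b", "e2")).toDF("sn", "emp_id")
val m2 = Seq(("a", "e4"), ("b", "e5")).toDF("sn", "emp_id")
```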

Then you should write a function that returns the list of employees under a manager:

import scala.collection.mutable
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Collect every emp_id in a manager dataframe into a Scala List
def getEmployees(df: DataFrame): List[String] = {
  df.select(collect_list("emp_id")).first().getAs[mutable.WrappedArray[String]](0).toList
}

The final step is to write a function that filters the salary dataframe down to only the employees in the list passed to it:

// Keep only the rows whose emp_id appears in the given list
def getEmployeeDetails(df: DataFrame, list: List[String]): DataFrame = {
  df.filter(df("emp_id").isin(list: _*))
}

Now, if you want to see the employees under manager1 (m1), then

getEmployeeDetails(df, getEmployees(m1)).show(false)

will return

+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1    |1       |66000 |22       |
|e1    |2       |48000 |16       |
|e1    |3       |87000 |29       |
|e2    |1       |75000 |25       |
|e2    |4       |69000 |23       |
|e2    |5       |66000 |22       |
+------+--------+------+---------+
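One caveat not addressed in the original answer: Rule 1 only grants a manager visibility of work_days, so to follow the rule strictly you could also drop the salary column from the manager-facing result. A sketch using the functions defined above:

```scala
// Rule 1: the manager may see work_days but not salary,
// so remove the salary column from the manager's view
getEmployeeDetails(df, getEmployees(m1)).drop("salary").show(false)
```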

You can do the same for the other managers too.

You can do the same for individual employees as well:

getEmployeeDetails(df, List("e1")).show(false)

which will return the dataframe for employee1 (e1):

+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1    |1       |66000 |22       |
|e1    |2       |48000 |16       |
|e1    |3       |87000 |29       |
+------+--------+------+---------+

I hope the answer is helpful.

