How to create a DataFrame from a List[Map] based on a condition

Question

I have a DataFrame called DF1, shown below.

DF1:

+----------+----------+----------+
|srcColumnZ|srcColumnY|srcColumnR|
+----------+----------+----------+
|John      |Non Hf    |New york  |
|Steav     |Non Hf    |Mumbai    |
|Ram       |HF        |Boston    |
+----------+----------+----------+

I also have a list of maps that describes the source-to-target column mapping:

List(Map(targetColumn -> columnNameX, sourceColumn -> List(srcColumnX, srcColumnY, srcColumnZ, srcColumnP, srcColumnQ, srcColumnR)), Map(targetColumn -> columnNameY, sourceColumn -> List(srcColumnY)), Map(targetColumn -> columnNameZ, selectvalue -> 5))

I want to create a new DataFrame from this list of maps. Its columns should be columnNameX, columnNameY and columnNameZ (the targetColumn values in the list). When a map has a sourceColumn list, such as List(srcColumnX, srcColumnY, srcColumnZ, srcColumnP, srcColumnQ, srcColumnR), the candidates should be checked against DF1 one by one, and the first one that exists in DF1 should supply all the values of the target column. The same applies to each subsequent target column. When a map has a selectvalue instead of a sourceColumn, that value should be hardcoded for the entire column. For example, the target column columnNameZ in the list above has selectvalue 5.

The expected output is:

+-----------+-----------+-----------+
|columnNameX|columnNameY|columnNameZ|
+-----------+-----------+-----------+
|John       |Non Hf     |5          |
|Steav      |Non Hf     |5          |
|Ram        |HF         |5          |
+-----------+-----------+-----------+

Recommended answer

The key step is to build a list of column expressions (the "query") from the given list of maps, one expression per target column. You can do that as follows.

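// Imports (assumes a SparkSession named spark, as provided by spark-shell)
import org.apache.spark.sql.functions.lit
import spark.implicits._

// Input DF
val df = Seq(("John", "Non Hf", "New york"), ("Steav", "Non Hf", "Mumbai"), ("Ram", "HF", "Boston")).toDF("srcColumnZ", "srcColumnY", "srcColumnR")

// Input list: each map has a targetColumn plus either a sourceColumn list or a selectvalue
val mapList: List[Map[String, Any]] = List(
  Map("targetColumn" -> "columnNameX", "sourceColumn" -> List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")),
  Map("targetColumn" -> "columnNameY", "sourceColumn" -> List("srcColumnY")),
  Map("targetColumn" -> "columnNameZ", "selectvalue" -> 5))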

//Get all the columns of df as list

val dfCols=df.columns.toList

//Then generate query list like below

val query = mapList.map { mp =>
  // Every map must name its target column
  val target = mp("targetColumn").toString
  mp.get("sourceColumn") match {
    case Some(candidates: List[_]) =>
      // Use the first candidate, in list order, that actually exists in df
      val srcCol = candidates.map(_.toString).find(dfCols.contains(_)).getOrElse(
        throw new IllegalArgumentException(s"No source column found in df for $target"))
      df.col(srcCol).alias(target)
    case _ =>
      // No sourceColumn: fill the whole column with the constant selectvalue
      lit(mp("selectvalue")).alias(target)
  }
}

// Finally, run the select with the generated columns

df.select(query: _*).show()

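// Sample output. Note that columnNameX is filled from srcColumnY, because srcColumnY
// comes before srcColumnZ in the candidate list and is the first one present in df.
// To get the names (John, Steav, Ram) instead, list srcColumnZ ahead of srcColumnY.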

+-----------+-----------+-----------+
|columnNameX|columnNameY|columnNameZ|
+-----------+-----------+-----------+
|     Non Hf|     Non Hf|          5|
|     Non Hf|     Non Hf|          5|
|         HF|         HF|          5|
+-----------+-----------+-----------+
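
The resolution rule used above (first matching source column, otherwise a constant) can also be separated from Spark so it can be checked without a session. The following is only a minimal sketch under that assumption: the names ColumnPlan, FromSource, Constant and planColumns are hypothetical and are not part of Spark or of the original answer. It is written in script style and can be pasted into the Scala REPL or spark-shell.

```scala
// Hypothetical helper types: a plan says where each target column comes from.
sealed trait ColumnPlan
final case class FromSource(target: String, source: String) extends ColumnPlan
final case class Constant(target: String, value: Any) extends ColumnPlan

// Hypothetical helper: resolve every mapping against the available column names.
def planColumns(mappings: List[Map[String, Any]], dfCols: Seq[String]): List[ColumnPlan] =
  mappings.map { mp =>
    val target = mp("targetColumn").toString
    mp.get("sourceColumn") match {
      case Some(candidates: List[_]) =>
        // First candidate, in list order, that exists among the DataFrame columns
        candidates.map(_.toString).find(c => dfCols.contains(c)) match {
          case Some(src) => FromSource(target, src)
          case None => throw new IllegalArgumentException(
            s"None of $candidates exists in the DataFrame (target: $target)")
        }
      case _ =>
        // No sourceColumn key: the mapping must carry a constant selectvalue
        Constant(target, mp("selectvalue"))
    }
  }

// Same column names and mapping list as in the answer above
val dfCols = Seq("srcColumnZ", "srcColumnY", "srcColumnR")
val mapList: List[Map[String, Any]] = List(
  Map("targetColumn" -> "columnNameX",
      "sourceColumn" -> List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")),
  Map("targetColumn" -> "columnNameY", "sourceColumn" -> List("srcColumnY")),
  Map("targetColumn" -> "columnNameZ", "selectvalue" -> 5)
)

val plans = planColumns(mapList, dfCols)
```

In Spark, each plan would then become a column: df.col(source).alias(target) for a FromSource, and lit(value).alias(target) for a Constant. Keeping the rule in a plain function makes the ordering effect easy to see: with the list as given, columnNameX resolves to srcColumnY, and it would resolve to srcColumnZ only if that name were listed first.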
