How to create a DataFrame from a List[Map] based on a condition
Problem description
I have a dataframe called DF1 like below.
DF1:
+----------+----------+----------+
|srcColumnZ|srcColumnY|srcColumnR|
+----------+----------+----------+
|John      |Non Hf    |New york  |
|Steav     |Non Hf    |Mumbai    |
|Ram       |HF        |Boston    |
+----------+----------+----------+
I also have a list of maps that describes the source-to-target column mapping, like below.
List(Map(targetColumn -> columnNameX, sourceColumn -> List(srcColumnX, srcColumnY, srcColumnZ, srcColumnP, srcColumnQ, srcColumnR)), Map(targetColumn -> columnNameY, sourceColumn -> List(srcColumnY)), Map(targetColumn -> columnNameZ, selectvalue -> 5))
I want to create a dataframe based on the list of maps above. It should have columnNameX, columnNameY, and columnNameZ as its columns (according to the list), and the values of these columns should be determined as follows. If sourceColumn is present, e.g. List(srcColumnX, srcColumnY, srcColumnZ, srcColumnP, srcColumnQ, srcColumnR), the columns are checked one by one against DF1, and all values of the first column that matches are copied into the target column; the same applies to the next target column. If selectvalue is present instead of sourceColumn, that value is hardcoded into the entire column. For example, in the list above the target column columnNameZ has selectvalue 5.
Below is the expected output.
+-----------+-----------+-----------+
|columnNameX|columnNameY|columnNameZ|
+-----------+-----------+-----------+
|John       |Non Hf     |5          |
|Steav      |Non Hf     |5          |
|Ram        |HF         |5          |
+-----------+-----------+-----------+
Recommended answer
The main step here is to generate a query list (a list of column expressions) from the given list of maps, which you can do as shown below.
import org.apache.spark.sql.functions.lit
import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`, as in spark-shell

// Input DF
val df = Seq(("John", "Non Hf", "New york"), ("Steav", "Non Hf", "Mumbai"), ("Ram", "HF", "Boston")).toDF("srcColumnZ", "srcColumnY", "srcColumnR")
// Input list
val mapList: List[Map[String, Any]] = List(Map("targetColumn" -> "columnNameX", "sourceColumn" -> List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")), Map("targetColumn" -> "columnNameY", "sourceColumn" -> List("srcColumnY")), Map("targetColumn" -> "columnNameZ", "selectvalue" -> 5))
// Get all the columns of df as a list
val dfCols = df.columns.toList
// Then generate the query list: one Column expression per target column
val query = mapList.map { mp =>
  val target = mp("targetColumn").toString // fails fast if targetColumn is missing
  mp.get("sourceColumn") match {
    case Some(candidates: List[_]) =>
      // Take the first candidate, in list order, that exists in df
      val srcCol = candidates.map(_.toString).find(c => dfCols.contains(c))
        .getOrElse(sys.error(s"No source column for $target found in df"))
      df.col(srcCol).alias(target)
    case _ =>
      // No sourceColumn: hardcode selectvalue (as a string literal) into the whole column
      lit(mp("selectvalue").toString).alias(target)
  }
}
// Finally, run the query
df.select(query: _*).show
//Sample output:
+-----------+-----------+-----------+
|columnNameX|columnNameY|columnNameZ|
+-----------+-----------+-----------+
| Non Hf| Non Hf| 5|
| Non Hf| Non Hf| 5|
| HF| HF| 5|
+-----------+-----------+-----------+
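
Note that the sample output above gives columnNameX the values of srcColumnY (Non Hf, Non Hf, HF), whereas the expected output in the question shows the values of srcColumnZ (John, Steav, Ram). The difference comes from the order in which the "first match" is searched for: the answer walks the sourceColumn list in order, while the expected output is consistent with walking DF1's columns in order. Below is a minimal plain-Scala sketch (no Spark required) that compares the two readings. The object name FirstMatch and the helpers byListOrder and byDfOrder are hypothetical names made up for illustration, and which reading the asker intended is an assumption.

```scala
object FirstMatch {
  // Column order of DF1, as given in the question
  val dfCols: List[String] = List("srcColumnZ", "srcColumnY", "srcColumnR")

  // Candidate source columns for columnNameX, as given in the question
  val candidates: List[String] =
    List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")

  // Reading 1 (what the answer does): first candidate, in list order, that exists in the DataFrame
  def byListOrder(candidates: List[String], dfCols: List[String]): Option[String] =
    candidates.find(c => dfCols.contains(c))

  // Reading 2 (hypothetical alternative): first DataFrame column, in DataFrame order,
  // that appears among the candidates
  def byDfOrder(candidates: List[String], dfCols: List[String]): Option[String] =
    dfCols.find(c => candidates.contains(c))
}
```

With the data from the question, byListOrder selects srcColumnY and byDfOrder selects srcColumnZ. If the DataFrame-order reading is the intended one, the selection step in the answer's code could presumably use the second form instead, i.e. take the first entry of df.columns that appears in the sourceColumn list. Both helpers return an Option, so a mapping with no matching column can be handled explicitly rather than failing on an empty list.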