Converting row values into a column array in a Spark DataFrame
Question
I am working with Spark DataFrames. I need to group by one column and collect the values of another column from the grouped rows into an array, stored as a new column. For example:
Input:
employee | Address
------------------
Micheal | NY
Micheal | NJ
Output:
employee | Address
------------------
Micheal | (NY,NJ)
Any help is highly appreciated!
Recommended answer
Here is an alternate solution, in which I convert the DataFrame to an RDD for the transformations and then convert it back to a DataFrame using sqlContext.createDataFrame().
Sample.json
{"employee":"Michale","Address":"NY"}
{"employee":"Michale","Address":"NJ"}
{"employee":"Sam","Address":"NY"}
{"employee":"Max","Address":"NJ"}
Spark application
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Assumes an existing sqlContext, as provided by spark-shell
val df = sqlContext.read.json("sample.json")
// Print the original DataFrame
df.show()

// Define the schema for the aggregated DataFrame
val dataSchema = new StructType(
  Array(
    StructField("employee", StringType, nullable = true),
    StructField("Address", ArrayType(StringType, containsNull = true), nullable = true)
  )
)

// Convert the DataFrame to an RDD and group the rows by employee
val aggregatedRdd: RDD[Row] = df.rdd.groupBy(r =>
  r.getAs[String]("employee")
).map(row =>
  // Map each (employee, rows) pair to a new Row holding the array of addresses
  Row(row._1, row._2.map(_.getAs[String]("Address")).toArray)
)

// Create a DataFrame from aggregatedRdd using the schema defined above (dataSchema)
val aggregatedDf = sqlContext.createDataFrame(aggregatedRdd, dataSchema)
// Print the aggregated DataFrame
aggregatedDf.show()
Output:
+-------+--------+---+
|Address|employee|num|
+-------+--------+---+
|     NY| Michale|  1|
|     NJ| Michale|  2|
|     NY|     Sam|  3|
|     NJ|     Max|  4|
+-------+--------+---+
+--------+--------+
|employee| Address|
+--------+--------+
|     Sam|    [NY]|
| Michale|[NY, NJ]|
|     Max|    [NJ]|
+--------+--------+
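For comparison, the same grouping can usually be done without leaving the DataFrame API, using the built-in collect_list aggregate from org.apache.spark.sql.functions. This is a different technique from the RDD round trip above and is not part of the original answer. The sketch below assumes Spark 2.0 or later (so it uses SparkSession rather than sqlContext) and builds the sample rows in memory instead of reading sample.json. The object name CollectListSketch and the helper addressesByEmployee are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

object CollectListSketch {

  // Hypothetical helper: groups by "employee" and gathers each group's
  // "Address" values into a single array column, also named "Address".
  def addressesByEmployee(df: DataFrame): DataFrame =
    df.groupBy(col("employee"))
      .agg(collect_list(col("Address")).as("Address"))

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.0+, running locally for demonstration only.
    val spark = SparkSession.builder()
      .appName("collect-list-sketch") // hypothetical application name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same rows as Sample.json, built in memory so no file is needed.
    val df = Seq(
      ("Michale", "NY"),
      ("Michale", "NJ"),
      ("Sam", "NY"),
      ("Max", "NJ")
    ).toDF("employee", "Address")

    // One row per employee, with the addresses collected into an array.
    addressesByEmployee(df).show()

    spark.stop()
  }
}
```

Note that collect_list makes no guarantee about the order of elements inside each array. If duplicate addresses should be dropped, collect_set from the same package can be used in its place.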