How to get transpose of dynamic dataset for below sample input using Spark and Java


Problem description

I have a dataset, and I want to transpose its columns (a dynamic number of columns) into rows, always using Spark and Java.

Sample input:

+-------+-------+---------+
|titanic|IronMan|Juglebook|
+-------+-------+---------+
|    101|  test1|       10|
|    102|  test2|       20|
|    103|  test3|       30|
+-------+-------+---------+

Sample output:

+---------+-----------------+
|Colname  |colvalue         |
+---------+-----------------+
|titanic  |101,102,103      |
|IronMan  |test1,test2,test3|
|Juglebook|10,20,30         |
+---------+-----------------+

I tried with Spark SQL, but the query ends up hardcoded to a fixed set of columns.

Recommended answer

Considering your request to transpose columns into rows, one issue you might face is that the values must all be strings rather than Int, since they end up in a single array column. So first you need to cast all of your values to string. Assuming that part is done, here is how you can transpose and use struct to get what you want.
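The casting step mentioned above could be sketched like this (a minimal sketch; `castAllToString` is a hypothetical helper name, not from the original answer):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Cast every column to string so that all values can later
// share a single array-of-struct column during the transpose.
def castAllToString(df: DataFrame): DataFrame =
  df.select(df.columns.map(c => col(c).cast("string").alias(c)): _*)
```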

Below is a Scala implementation of it:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

def transpose(transDF: DataFrame): DataFrame = {
  // dtypes returns (columnName, columnType) pairs; keep the names
  val (colNames, _) = transDF.dtypes.unzip
  // Build one struct per column, collect them into an array,
  // and explode that array so each column becomes a row
  val kvs = explode(array(
    colNames.map(c => struct(lit(c).alias("column_name"), col(c).alias("column_value"))): _*
  ))
  transDF.select(kvs.alias("kvs"))
    .select("kvs.column_name", "kvs.column_value")
}

You can call the function from your main; it will return the transposed column/value rows. Then you can just use groupBy and agg to get the data into your desired format.
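The groupBy/agg step described above could look like this (a sketch, assuming `df` is the sample input already cast to strings and that `transpose` yields `column_name`/`column_value` rows; `concat_ws` over `collect_list` produces the comma-separated strings shown in the sample output):

```scala
import org.apache.spark.sql.functions.{collect_list, concat_ws}

val transposed = transpose(df)          // one row per (column name, cell value)
val result = transposed
  .groupBy("column_name")
  .agg(concat_ws(",", collect_list("column_value")).alias("colvalue"))

result.show(false)
```

Note that `collect_list` does not guarantee ordering after a shuffle; if the original row order matters, carry an index column through the transpose and sort before aggregating.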

