Transpose DataFrame Without Aggregation in Spark with Scala


Question

I have looked at a number of different solutions online, but could not find what I am trying to achieve. Please help me with this.

I am using Apache Spark 2.1.0 with Scala. Below is my dataframe:

+-----------+-------+
|COLUMN_NAME| VALUE |
+-----------+-------+
|col1       | val1  |
|col2       | val2  |
|col3       | val3  |
|col4       | val4  |
|col5       | val5  |
+-----------+-------+
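
For reference, a minimal sketch of how such a dataframe could be reproduced for testing; the column names, values, and the variable name df come from the table above, while the SparkSession setup is illustrative (in spark-shell the session is already available as spark):

import org.apache.spark.sql.SparkSession

// illustrative setup; in spark-shell a SparkSession named `spark` already exists
val spark = SparkSession.builder().appName("transpose-example").master("local[*]").getOrCreate()
import spark.implicits._

// reproduce the input dataframe shown in the table above
val df = Seq(
  ("col1", "val1"),
  ("col2", "val2"),
  ("col3", "val3"),
  ("col4", "val4"),
  ("col5", "val5")
).toDF("COLUMN_NAME", "VALUE")

df.show()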

I want this to be transposed as below:

+-----+-------+-----+------+-----+
|col1 | col2  |col3 | col4 |col5 |
+-----+-------+-----+------+-----+
|val1 | val2  |val3 | val4 |val5 |
+-----+-------+-----+------+-----+

Answer

If your dataframe is small enough, as in the question, you can collect COLUMN_NAME to form the schema, collect VALUE to form a single row, and then create a new dataframe:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.Row

// build the schema from the collected COLUMN_NAME entries
val schema = StructType(df.select(collect_list("COLUMN_NAME")).first().getAs[Seq[String]](0).map(x => StructField(x, StringType)))

// build an RDD[Row] holding a single row made from the collected VALUE entries
val values = sc.parallelize(Seq(Row.fromSeq(df.select(collect_list("VALUE")).first().getAs[Seq[String]](0))))

// create and show the transposed dataframe
sqlContext.createDataFrame(values, schema).show(false)

which should give you

+----+----+----+----+----+
|col1|col2|col3|col4|col5|
+----+----+----+----+----+
|val1|val2|val3|val4|val5|
+----+----+----+----+----+
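
On Spark 2.x the same technique can also be written against the SparkSession entry point instead of sc/sqlContext. A sketch under that assumption, reusing the df from the question and assuming a session named spark:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.collect_list
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// collect each column once: names become the schema, values become the single row
val names  = df.select(collect_list("COLUMN_NAME")).first().getAs[Seq[String]](0)
val values = df.select(collect_list("VALUE")).first().getAs[Seq[String]](0)

val schema = StructType(names.map(StructField(_, StringType)))
val rowRDD = spark.sparkContext.parallelize(Seq(Row.fromSeq(values)))

spark.createDataFrame(rowRDD, schema).show(false)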
