Transpose DataFrame Without Aggregation in Spark with Scala
Question
I looked at a number of different solutions online, but could not find what I am trying to achieve. Please help me with this.
I am using Apache Spark 2.1.0 with Scala. Below is my dataframe:
+-----------+-----+
|COLUMN_NAME|VALUE|
+-----------+-----+
|col1       |val1 |
|col2       |val2 |
|col3       |val3 |
|col4       |val4 |
|col5       |val5 |
+-----------+-----+
I want this to be transposed, as below:
+----+----+----+----+----+
|col1|col2|col3|col4|col5|
+----+----+----+----+----+
|val1|val2|val3|val4|val5|
+----+----+----+----+----+
Answer
If your dataframe is small enough, as in the question, then you can collect COLUMN_NAME to form the schema and collect VALUE to form the single row, and then create a new dataframe as
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{StructField, StructType, StringType}
import org.apache.spark.sql.Row
//build the schema from the collected COLUMN_NAME values, one StringType field per name
val schema = StructType(df.select(collect_list("COLUMN_NAME")).first().getAs[Seq[String]](0).map(x => StructField(x, StringType)))
//build a single-row RDD[Row] from the collected VALUE column, in the same order
val values = sc.parallelize(Seq(Row.fromSeq(df.select(collect_list("VALUE")).first().getAs[Seq[String]](0))))
//create the transposed dataframe
sqlContext.createDataFrame(values, schema).show(false)
which should give you
+----+----+----+----+----+
|col1|col2|col3|col4|col5|
+----+----+----+----+----+
|val1|val2|val3|val4|val5|
+----+----+----+----+----+
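Note that this approach collects the whole dataframe to the driver, so it is only safe for small inputs. The core step it relies on is simply pairing each collected column name with the value at the same position; that step can be sketched in plain Scala, independent of Spark (the two sequences below are hypothetical stand-ins for the collected COLUMN_NAME and VALUE lists):

```scala
// Stand-ins for the lists collected via collect_list("COLUMN_NAME") / collect_list("VALUE")
val columnNames = Seq("col1", "col2", "col3", "col4", "col5")
val rowValues   = Seq("val1", "val2", "val3", "val4", "val5")

// Zip names with values in order -- this is exactly what building the
// StructType schema plus a single Row.fromSeq accomplishes in the answer.
val transposedRow: Map[String, String] = columnNames.zip(rowValues).toMap

println(transposedRow("col3")) // prints val3
```

The order-preserving zip is why the answer collects both columns from the same dataframe: if the two collected lists were reordered independently, values would land under the wrong column names.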